More about HKUST
Efficient Frequent Pattern Mining over Probabilistic Databases
PhD Thesis Proposal Defence
Title: "Efficient Frequent Pattern Mining over Probabilistic Databases"
by
Mr. Yongxin TONG
ABSTRACT:
With the broad usage of Internet of Things (IoT) and pervasive computing
techniques in the modern society, a growing number of real-time data
monitoring systems lead to the uncertainty in massive collected data. For
example, data integration of multiply data sources causes uncertainty.
Thus, mining probabilistic data (or called uncertain data in this thesis)
has become a hot research topic in recent years. In particular, since
mining frequent patterns is one of the most fundamental problems in
traditional data mining researches, mining frequent patterns over
probabilistic databases has attracted much attention in the database and
the data mining communities. In the scenario of uncertain data, the
support of an itemset is a discrete random variable rather than the
frequency of this itemset. Hence, unlike the corresponding problem in
deterministic databases where the frequent itemset has a unique
definition, the frequent itemset under uncertain data has different
definitions due to variation of probabilistic semantics, which even
generate inconsistent results in current studies. However, the
relationship of different definitions and the inconsistent result has not
yet been thoroughly identified and explored. Furthermore, like its
counterpart, mining frequent patterns in deterministic data, mining
frequent patterns over uncertain data cannot avoid an exponential number
of frequent itemsets which causes the mining results less useful. In this
thesis, we demonstrate how our solution can clarify the problems in
existing studies and address the challenge of an exponential number of
mining results. Moreover, our solution is also well applied to
constructing effective indexes for query processing over other types of
uncertain data, i.e., querying uncertain graphs. To summarize, our study
covers the following three aspects:
1) We conduct a comprehensive experimental study of existing
representative frequent itemset mining algorithms over probabilistic
databases and clarify several existing inconsistent conclusions;
2) We propose a novel problem of mining probabilistic frequent closed
itemsets in uncertain databases and design an efficient solution, which
includes a series of pruning techniques and an effective sampling
algorithm.
3) We study the problem of efficient probabilistic supergraph containment
query and provide the efficient solution, which integrates probabilistic
frequent pattern mining technique for constructing the index.
We validate our solutions through extensive experiments and discuss
several potential research directions of mining frequent patterns over
probabilistic databases.
Date: Thursday, 11 July 2013
Time: 10:00am - 12:00noon
Venue: Room 3501
lifts 25/26
Committee Members: Dr. Lei Chen (Supervisor)
Prof. Frederick Lochovsky (Chairperson)
Dr. Raymond Wong
Dr. Ke Yi
**** ALL are Welcome ****