Efficient Frequent Pattern Mining over Probabilistic Databases
PhD Thesis Proposal Defence

Title: "Efficient Frequent Pattern Mining over Probabilistic Databases"

by Mr. Yongxin TONG

ABSTRACT:

With the broad usage of Internet of Things (IoT) and pervasive computing techniques in modern society, a growing number of real-time data monitoring systems introduce uncertainty into the massive data they collect. For example, integrating data from multiple sources introduces uncertainty. Thus, mining probabilistic data (called uncertain data in this thesis) has become a hot research topic in recent years. In particular, since mining frequent patterns is one of the most fundamental problems in traditional data mining research, mining frequent patterns over probabilistic databases has attracted much attention in the database and data mining communities.

In the scenario of uncertain data, the support of an itemset is a discrete random variable rather than a single frequency count. Hence, unlike the corresponding problem in deterministic databases, where the frequent itemset has a unique definition, the frequent itemset under uncertain data has different definitions depending on the probabilistic semantics, and these definitions have even led to inconsistent results in current studies. However, the relationship among the different definitions and the inconsistent results has not yet been thoroughly identified and explored. Furthermore, like its counterpart in deterministic data, mining frequent patterns over uncertain data cannot avoid an exponential number of frequent itemsets, which makes the mining results less useful.

In this thesis, we demonstrate how our solution can clarify the problems in existing studies and address the challenge of an exponential number of mining results. Moreover, our solution also applies well to constructing effective indexes for query processing over other types of uncertain data, namely querying uncertain graphs.
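To illustrate why the support of an itemset becomes a random variable, and why different definitions of "frequent" can disagree, the following is a minimal sketch and not the algorithm proposed in the thesis. It assumes that transactions are independent and that the probability of each transaction containing the itemset is already given. The function names and the thresholds `min_sup` and `min_prob` are hypothetical and chosen only for this example.

```python
from typing import List


def support_pmf(probs: List[float]) -> List[float]:
    """Return pmf where pmf[k] = P(support == k).

    probs[i] is the (assumed) probability that transaction i contains
    the itemset; transactions are assumed to be independent.
    """
    pmf = [1.0]  # with zero transactions, the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += mass * p       # transaction contains the itemset
        pmf = nxt
    return pmf


def expected_support(probs: List[float]) -> float:
    """Expected value of the support (linearity of expectation)."""
    return sum(probs)


def frequent_probability(probs: List[float], min_sup: int) -> float:
    """P(support >= min_sup) under the independence assumption."""
    return sum(support_pmf(probs)[max(min_sup, 0):])


# Hypothetical data: four transactions, each containing the itemset
# with probability 0.5.
probs = [0.5, 0.5, 0.5, 0.5]
min_sup, min_prob = 2, 0.9  # illustrative thresholds

# Definition A: frequent if the expected support reaches min_sup.
frequent_by_expectation = expected_support(probs) >= min_sup

# Definition B: frequent if P(support >= min_sup) reaches min_prob.
frequent_by_probability = frequent_probability(probs, min_sup) >= min_prob

print(frequent_by_expectation, frequent_by_probability)  # True False
```

In this toy input the expected support is 2.0, so the itemset is frequent under the expectation-based definition, while the probability that its support reaches 2 is only 0.6875, so it is not frequent under the probability-based definition with a threshold of 0.9.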
To summarize, our study covers the following three aspects:

1) We conduct a comprehensive experimental study of existing representative frequent itemset mining algorithms over probabilistic databases and clarify several existing inconsistent conclusions.
2) We propose a novel problem of mining probabilistic frequent closed itemsets in uncertain databases and design an efficient solution, which includes a series of pruning techniques and an effective sampling algorithm.
3) We study the problem of efficient probabilistic supergraph containment queries and provide an efficient solution, which integrates a probabilistic frequent pattern mining technique to construct the index.

We validate our solutions through extensive experiments and discuss several potential research directions for mining frequent patterns over probabilistic databases.

Date: Thursday, 11 July 2013
Time: 10:00am - 12:00noon
Venue: Room 3501 (lifts 25/26)

Committee Members:
Dr. Lei Chen (Supervisor)
Prof. Frederick Lochovsky (Chairperson)
Dr. Raymond Wong
Dr. Ke Yi

**** ALL are Welcome ****