More about HKUST
Effective Discovery of Order-Preserving Submatrices
The Hong Kong University of Science and Technology Department of Computer Science and Engineering PhD Thesis Defence Title: "Effective Discovery of Order-Preserving Submatrices" By Miss Qiong Fang Abstract As matrix is a common data presentation in many applications, the submatrix mining problem has been extensively studied to reveal interesting correlations among a set of rows and a set of columns. Various submatrix models have been proposed to describe different local correlations, among which, the order-preserving submatrix (OPSM) model is proposed to capture the fact that the entry values of a set of rows follow the same trend under a set of columns. In this thesis, we systematically study the problem of mining OPSM patterns to address various real application needs. In gene expression analysis, as the OPSM model is regarded to be restrictive and not practical when sample contamination inevitably exists, we consider two relaxation strategies to relax the strict OPSM model. Three relaxed OPSM models and the corresponding pattern mining methods have been proposed. Experimental studies on real biological data show that our relaxed OPSM models lead to the discovery of interesting correlations with high biological significance. The experiments also justify the efficiency of our proposed pattern mining methods. Due to the prevalence of uncertain data, we define new probabilistic matrix representations, and formalize a novel probabilistic OPSM (POPSM) model. We propose an efficient POPSM mining framework called ProbApri. We demonstrate the superiority of our approach by two applications of POPSM. Experiments on biological datasets show that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes. Experiments on a RFID trace dataset demonstrate the effectiveness of our POPSM model in capturing the common visiting behavior among users. Data in applications like recommender systems can also be organized as matrices, which however usually have extremely large size and high degree of sparsity. We propose a sparse OPSM (SOPSM) model for sparse matrices, and adopt two criteria to control the number and distribution of patterns mined from large matrices. Accordingly, we formulate a constrained SOPSM mining problem, and develop an effective approach called Quick Probing (QP) and its parallelization scheme. Experiments on two datasets from real recommender systems show that the QP framework achieves high scalability and is efficient for handling matrices having millions of rows and columns. Besides, QP predicts the ordered preference of users over a set of items with high accuracy. Date: Thursday, 15 March 2012 Time: 10:00am – 12:00noon Venue: Room 3401 Lifts 17/18 Chairman: Prof. Elizabeth George (MGMT) Committee Members: Prof. Wilfred Ng (Supervisor) Prof. Lei Chen Prof. Qiang Yang Prof. Weichuan Yu (ECE) Prof. Chi-Ming Kao (Comp. Sci. & Inf. Sys., HKU) **** ALL are Welcome ****