More about HKUST
Effective Discovery of Order-Preserving Submatrices
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Effective Discovery of Order-Preserving Submatrices"
By
Miss Qiong Fang
Abstract
As matrix is a common data presentation in many applications, the
submatrix mining problem has been extensively studied to reveal
interesting correlations among a set of rows and a set of columns. Various
submatrix models have been proposed to describe different local
correlations, among which, the order-preserving submatrix (OPSM) model is
proposed to capture the fact that the entry values of a set of rows follow
the same trend under a set of columns. In this thesis, we systematically
study the problem of mining OPSM patterns to address various real
application needs.
In gene expression analysis, as the OPSM model is regarded to be
restrictive and not practical when sample contamination inevitably exists,
we consider two relaxation strategies to relax the strict OPSM model.
Three relaxed OPSM models and the corresponding pattern mining methods
have been proposed. Experimental studies on real biological data show that
our relaxed OPSM models lead to the discovery of interesting correlations
with high biological significance. The experiments also justify the
efficiency of our proposed pattern mining methods.
Due to the prevalence of uncertain data, we define new probabilistic
matrix representations, and formalize a novel probabilistic OPSM (POPSM)
model. We propose an efficient POPSM mining framework called ProbApri. We
demonstrate the superiority of our approach by two applications of POPSM.
Experiments on biological datasets show that the POPSM model better
captures the characteristics of the expression levels of biologically
correlated genes. Experiments on a RFID trace dataset demonstrate the
effectiveness of our POPSM model in capturing the common visiting behavior
among users.
Data in applications like recommender systems can also be organized as
matrices, which however usually have extremely large size and high degree
of sparsity. We propose a sparse OPSM (SOPSM) model for sparse matrices,
and adopt two criteria to control the number and distribution of patterns
mined from large matrices. Accordingly, we formulate a constrained SOPSM
mining problem, and develop an effective approach called Quick Probing
(QP) and its parallelization scheme. Experiments on two datasets from real
recommender systems show that the QP framework achieves high scalability
and is efficient for handling matrices having millions of rows and
columns. Besides, QP predicts the ordered preference of users over a set
of items with high accuracy.
Date: Thursday, 15 March 2012
Time: 10:00am – 12:00noon
Venue: Room 3401
Lifts 17/18
Chairman: Prof. Elizabeth George (MGMT)
Committee Members: Prof. Wilfred Ng (Supervisor)
Prof. Lei Chen
Prof. Qiang Yang
Prof. Weichuan Yu (ECE)
Prof. Chi-Ming Kao (Comp. Sci. & Inf. Sys., HKU)
**** ALL are Welcome ****