Non-negative Data Representation in Vision and Bioinformatics

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Non-negative Data Representation in Vision and Bioinformatics"

By

Miss Qing LIAO


Abstract

Over the past decades, computer vision and computational biology have become
two active topics in big data analysis. Both domains involve large amounts of
non-negative data. For example, in computer vision, the pixel values of images
and video frames are non-negative. In computational biology, gene expression
data are non-negative. It is therefore necessary to account for the
non-negativity of data in modern big data applications. Because data
representation recovers the latent structure of data, it can significantly
improve subsequent processing and thus plays an important role in data
analysis. Conventional data representation approaches, including principal
component analysis (PCA) and Fisher's linear discriminant analysis (FLDA),
recover the most effective axes according to the probabilistic distribution
of the data. However, these traditional approaches ignore the non-negativity
of data entirely.

Non-negative matrix factorization (NMF) decomposes a given non-negative data
matrix into the product of two lower-rank non-negative factor matrices. The
two learned factors can be regarded as a basis for representing the data and
the corresponding representation coefficients, respectively. The resulting
parts-based representation has been shown to be a powerful tool in many
practical machine learning and data mining applications, because it is
consistent with psychological and physiological evidence of parts-based
representation in the human brain. Owing to its simplicity and high
effectiveness, NMF has been extended to meet the requirements of various
applications in computer vision and computational biology. More recently, deep
learning [105] has been proposed to learn effective representations from
large-scale datasets. Because the learned representations can capture
nonlinear relationships among data points, they are highly effective and have
been widely applied in computer vision and computational biology.
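To make the basic factorization concrete, the following is a minimal sketch of
plain NMF using the well-known Lee-Seung multiplicative update rules, which
minimize the Frobenius loss ||X - WH||_F^2 subject to W >= 0 and H >= 0. It is
not any of the models proposed in this thesis (LDS-NMF, RLC-NMF, LCG-NMF,
GS-NMF, etc.); the function name `nmf`, the one-sample-per-column layout, and
the parameters `k`, `n_iter`, `eps` and `seed` are illustrative assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates (Frobenius loss).

    Hypothetical helper for illustration only.
    X : (m, n) non-negative data matrix, one sample per column (assumed layout).
    Returns W (m, k) basis and H (k, n) coefficients with X ~= W @ H.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps both factors non-negative
    # under the multiplicative updates below.
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Update coefficients, then basis; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative data with an exact rank-3 non-negative structure.
    X = rng.random((20, 3)) @ rng.random((3, 30))
    W, H = nmf(X, k=3)
    err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    print(f"relative reconstruction error: {err:.4f}")
```

The columns of W play the role of the non-negative basis and the columns of H
the representation coefficients of each sample. Regularized variants, such as
those described in this thesis, typically add penalty terms to this objective
and modify the update rules accordingly.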

In the first part, we introduce models and algorithms for non-negative data
representation in computer vision. To meet the requirements of various
applications, we propose advanced NMF models, including Logdet divergence based
sparse NMF (LDS-NMF), robust local coordinate NMF (RLC-NMF), and local
coordinate graph regularized NMF (LCG-NMF), to handle rank deficiency,
sparsity, robustness, and geometric structure preservation in practice.
LDS-NMF learns a stable data representation by incorporating a Logdet
divergence based regularization that constrains the rank of the basis. RLC-NMF
uses the maximum correntropy criterion to measure the residual errors and
incorporates a local coordinate regularization to encourage sparsity of the
learned coefficients. LCG-NMF extends RLC-NMF by additionally preserving the
geometric structure of the data.

In the second part, we focus on non-negative data representation in
computational biology. First, we apply Bi-graph regularized NMF (BIGNMF) to
predict potential drug-target interactions on four biological datasets, and
find that non-negative data representation outperforms state-of-the-art
methods. However, since similarities between drugs and targets are expensive
to obtain and sometimes degrade performance, we apply the matrix completion
analysis (MCA) method to drug-target prediction, and confirm its validity by
finding most of the predicted drug-target pairs in public databases. To take
advantage of both BIGNMF and MCA, we apply the weighted NMF (WNMF) method to
drug-target prediction and show its promise. Second, we propose a novel
Gauss-Seidel based NMF method (GS-NMF) to address the imbalance between the
number of features and the number of tumor samples, and evaluate its
effectiveness on several biological datasets of cancer diseases. Finally, we
propose a multi-task deep learning method (MTDL) that classifies multiple
cancers simultaneously and improves the classification performance for each
cancer by transferring knowledge through shared layers. With the help of this
knowledge transfer, the classification performance for cancers with limited
samples can be significantly improved.


Date:			Wednesday, 24 August 2016

Time:			2:00pm – 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Jeffrey Chasnov (MATH)

Committee Members:	Prof. Qian Zhang (Supervisor)
 			Prof. Qiong Luo
 			Prof. Long Quan
 			Prof. Shaojie Shen (ECE)
 			Prof. Wei Lou (Comp., PolyU)


**** ALL are Welcome ****