Non-negative Data Representation in Vision and Bioinformatics
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Non-negative Data Representation in Vision and Bioinformatics"

By

Miss Qing LIAO

Abstract

Over the past decades, computer vision and computational biology have become two major areas of big data analysis. Both domains involve large amounts of non-negative data. For example, in computer vision, the pixel values of images and of video frames are non-negative. In computational biology, gene expression data are non-negative. It is therefore necessary to take the non-negativity of data into account in modern big data applications.

Since data representation recovers the latent structure of data, it significantly enhances subsequent processing and plays an important role in data analysis. Conventional data representation approaches, including principal component analysis (PCA) and Fisher's linear discriminant analysis (FLDA), recover the most informative axes according to the probabilistic distribution of the data. However, these traditional approaches completely ignore the non-negativity of the data.

Non-negative matrix factorization (NMF) decomposes a given non-negative data matrix into the product of two lower-rank non-negative factor matrices. The two learned matrices can be treated as a basis for representing the data and the corresponding representation coefficients, respectively. The learned parts-based representation has been shown to be a powerful tool for various practical applications in machine learning and data mining because it is consistent with the psychological and physical evidence in the human brain. Due to its simplicity and high effectiveness, NMF has been extended to meet the requirements of various applications in computer vision and computational biology.

Recently, deep learning [105] has been proposed to learn effective representations from large-scale datasets.
Since the learned representation can recover the nonlinear relationships among data points, it is highly effective and has been widely applied in computer vision and computational biology.

In the first part, we introduce models and algorithms for non-negative data representation in computer vision. To meet the requirements of various applications, we propose advanced NMF models, including Logdet divergence based sparse NMF (LDS-NMF), robust local coordinate NMF (RLC-NMF), and local coordinate graph regularized NMF (LCG-NMF), to handle the rank-deficiency, sparsity, robustness, and geometric structure preservation problems that arise in practice. LDS-NMF learns a stable data representation by incorporating Logdet divergence based regularization, which constrains the rank of the basis. RLC-NMF uses the maximum correntropy criterion to measure the residual errors and incorporates local coordinate regularization to encourage sparsity in the learned coefficients. LCG-NMF extends RLC-NMF by also preserving the geometric structure of the data.

In the second part, we focus on non-negative data representation in computational biology. Firstly, we apply Bi-graph regularized NMF (BIGNMF) to predict potential drug-target interactions on four biological datasets, and find that non-negative data representation outperforms the state-of-the-art methods. However, since the similarities between drugs and targets are expensive to obtain and sometimes degrade performance, we apply the matrix completion analysis (MCA) method to drug-target prediction, and confirm the intelligibility of this method by validating most of the predicted drug-target pairs in public databases. To take advantage of both BIGNMF and MCA, we apply the weighted NMF (WNMF) method to drug-target prediction and show its promise.
Secondly, we propose a novel Gauss-Seidel based NMF method (GS-NMF) to overcome the imbalance between features and tumor samples, and evaluate its effectiveness on several biological datasets of cancer diseases. Finally, we propose a multi-task deep learning method (MTDL) to classify multiple cancers simultaneously and to enhance the classification performance for each cancer by leveraging knowledge through shared layers. With the help of knowledge transfer, the classification performance for cancers with limited samples is significantly enhanced.

Date: Wednesday, 24 August 2016
Time: 2:00pm – 4:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Prof. Jeffrey Chasnov (MATH)

Committee Members:
Prof. Qian Zhang (Supervisor)
Prof. Qiong Luo
Prof. Long Quan
Prof. Shaojie Shen (ECE)
Prof. Wei Lou (Comp., PolyU)

**** ALL are Welcome ****