Latent Factor Models for Statistical Relational Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Latent Factor Models for Statistical Relational Learning"

By

Mr. Wujun Li

Abstract

To simplify the modeling procedure, traditional statistical machine learning methods typically assume that the instances are independent and identically distributed (i.i.d.). However, many real-world data sets, such as web pages and research papers, contain relationships (links) between the instances. The instances in such data are correlated (linked) with one another, so the common i.i.d. assumption is unreasonable for relational data. Hence, naively applying traditional statistical learning methods to relational data may lead to misleading conclusions about the data.

Statistical relational learning (SRL), which attempts to perform learning and inference in domains with complex relational structure, has emerged as an active research area because relational data arise in a wide variety of application areas, such as web mining, social network analysis, bioinformatics, economics and marketing. The existing mainstream SRL models extend traditional graphical models, such as Bayesian networks and Markov networks, by eliminating their underlying i.i.d. assumption. Typical examples of such SRL models include relational Bayesian networks, relational Markov networks, and Markov logic networks. Because the dependency structure in relational data is typically very complex, structure learning for these relational graphical models is often very time-consuming. Hence, it may be impractical to apply these models to large-scale relational data sets.

In this thesis, we propose a series of novel SRL models, called relational factor models (RFMs), by extending traditional latent factor models from i.i.d. domains to relational domains. The proposed RFMs provide a toolbox for different learning settings: some are well suited to transductive inference while others can be used for inductive inference; some are parametric while others are nonparametric; some can model data with undirected relationships while others can model data with directed relationships. One promising advantage of our RFMs is that they require no time-consuming structure learning, and the time complexity of most of them is linear in the number of observed links in the data. This implies that our RFMs can be used to model large-scale data sets. Experimental results show that our models can achieve state-of-the-art performance in many real-world applications such as linked-document classification and social network analysis.

Date: Friday, 30 July 2010
Time: 3:00pm – 5:00pm
Venue: Room 3501 (Lifts 25/26)

Chairman: Prof. Andrew Poon (ECE)

Committee Members:
Prof. Dit-Yan Yeung (Supervisor)
Prof. James Kwok
Prof. Nevin Zhang
Prof. Weichuan Yu (ECE)
Prof. Huan Liu (Comp. Sci. & Engg., Arizona State Univ.)

**** ALL are Welcome ****