Latent Factor Models for Statistical Relational Learning

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Latent Factor Models for Statistical Relational Learning"

By

Mr. Wujun Li


Abstract

To simplify the modeling procedure, traditional statistical machine 
learning methods typically assume that the instances are independent and 
identically distributed (i.i.d.). However, it is not uncommon for 
real-world data, such as web pages and research papers, to contain 
relationships (links) between instances. Different instances in such 
data are correlated (linked) with each other, which implies that the 
common i.i.d. assumption is unreasonable for such relational data. Hence, 
naively applying traditional statistical learning methods to relational 
data may lead to misleading conclusions about the data.

Statistical relational learning (SRL), which attempts to perform learning 
and inference in domains with complex relational structure, has emerged 
as an active research area because relational data arise in a wide 
variety of application areas, such as web mining, social network analysis, 
bioinformatics, economics, and marketing. The existing mainstream SRL 
models extend traditional graphical models, such as Bayesian networks and 
Markov networks, by eliminating their underlying i.i.d. assumption. Some 
typical examples of such SRL models include relational Bayesian networks, 
relational Markov networks, and Markov logic networks. Because the 
dependency structure in relational data is typically very complex, 
structure learning for these relational graphical models is often very 
time-consuming. Hence, it can be impractical to apply these models to 
large-scale relational data sets.

In this thesis, we propose a series of novel SRL models, called relational 
factor models (RFMs), by extending traditional latent factor models from 
i.i.d. domains to relational domains. These proposed RFMs provide a 
toolbox for different learning settings: some of them are well suited 
to transductive inference while others can be used for inductive 
inference; some of them are parametric while others are nonparametric; 
some of them can be used to model data with undirected relationships while 
others can be used for data with directed relationships. One key 
advantage of our RFMs is that they require no time-consuming structure 
learning, and the time complexity of most of them is linear in the 
number of observed links in the data. This implies that our RFMs can 
be used to model large-scale data sets. Experimental results show that our 
models can achieve state-of-the-art performance in many real-world 
applications such as linked-document classification and social network 
analysis.


Date:			Friday, 30 July 2010

Time:			3:00pm – 5:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Andrew Poon (ECE)

Committee Members:	Prof. Dit-Yan Yeung (Supervisor)
 			Prof. James Kwok
 			Prof. Nevin Zhang
 			Prof. Weichuan Yu (ECE)
 			Prof. Huan Liu (Comp. Sci. & Engg., Arizona State Univ.)


**** ALL are Welcome ****