The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Latent Factor Models for Statistical Relational Learning"
By
Mr. Wujun Li
Abstract
To simplify the modeling procedure, traditional statistical machine
learning methods typically assume that the instances are independent and
identically distributed (i.i.d.). However, many kinds of real-world data,
such as web pages and research papers, contain relationships (links)
between the instances. Different instances in such data are correlated
(linked) with each other, which means that the common i.i.d. assumption is
unreasonable for such relational data. Hence, naively applying traditional
statistical learning methods to relational data may lead to misleading
conclusions about the data.
Statistical relational learning (SRL), which attempts to perform learning
and inference in domains with complex relational structure, has become an
emerging research area because relational data exist in a wide variety of
application areas, such as web mining, social network analysis,
bioinformatics, economics and marketing. The existing mainstream SRL
models extend traditional graphical models, such as Bayesian networks and
Markov networks, by eliminating their underlying i.i.d. assumption.
Typical examples of such SRL models include relational Bayesian networks,
relational Markov networks, and Markov logic networks. Because the
dependency structure in relational data is typically very complex,
structure learning for these relational graphical models is often very
time-consuming. Hence, it may be impractical to apply these models to
large-scale relational data sets.
In this thesis, we propose a series of novel SRL models, called relational
factor models (RFMs), by extending traditional latent factor models from
i.i.d. domains to relational domains. The proposed RFMs provide a toolbox
for different learning settings: some are well suited to transductive
inference while others can be used for inductive inference; some are
parametric while others are nonparametric; some can model data with
undirected relationships while others handle data with directed
relationships. One key advantage of our RFMs is that no time-consuming
structure learning is needed, and the time complexity of most of them is
linear in the number of observed links in the data. This implies that our
RFMs can be used to model large-scale data sets. Experimental results show
that our models achieve state-of-the-art performance in many real-world
applications such as linked-document classification and social network
analysis.
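To illustrate the general idea (this is a generic sketch, not the thesis's
actual RFM formulation), the following Python snippet fits a simple latent
factor model to link data: each node gets a latent vector, a link (i, j) is
scored by the inner product of the two vectors, and training iterates only
over observed links plus a few sampled non-links, so each pass costs time
linear in the number of observed links. All function and variable names
here are illustrative assumptions.

    # Minimal latent factor model for relational (link) data.
    # Each node i has a latent vector U[i]; P(link i-j) = sigmoid(U[i] . U[j]).
    # SGD visits only the observed (and sampled negative) links, so each
    # epoch is linear in the number of observed links.
    import numpy as np

    def train_latent_factors(num_nodes, links, dim=8, lr=0.05, reg=0.01,
                             epochs=50, seed=0):
        """links: list of (i, j, y) triples, y=1 for a link, y=0 for a non-link."""
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((num_nodes, dim))  # one latent row per node
        for _ in range(epochs):
            for i, j, y in links:
                score = U[i] @ U[j]
                p = 1.0 / (1.0 + np.exp(-score))   # predicted link probability
                g = p - y                          # logistic-loss gradient w.r.t. score
                grad_i = g * U[j] + reg * U[i]
                grad_j = g * U[i] + reg * U[j]
                U[i] -= lr * grad_i
                U[j] -= lr * grad_j
        return U

    # Toy usage: two triangles joined by a single bridge link.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    data = [(i, j, 1) for i, j in edges] + [(0, 5, 0), (1, 4, 0)]
    U = train_latent_factors(6, data)
    print(np.round(U @ U.T, 2))  # within-triangle pairs get higher scores

The learned factors can then feed downstream tasks such as link prediction
or node classification, which is the role latent factors play in the models
described in the abstract.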
Date: Friday, 30 July 2010
Time: 3:00pm – 5:00pm
Venue: Room 3501
Lifts 25/26
Chairman: Prof. Andrew Poon (ECE)
Committee Members: Prof. Dit-Yan Yeung (Supervisor)
Prof. James Kwok
Prof. Nevin Zhang
Prof. Weichuan Yu (ECE)
Prof. Huan Liu (Comp. Sci. & Engg., Arizona State Univ.)
**** ALL are Welcome ****