Relational Factor Modeling: A New Framework for Statistical Relational Learning

PhD Thesis Proposal Defence


Title: "Relational Factor Modeling: A New Framework for Statistical Relational 
Learning"

by

Mr. Wujun Li


Abstract:

To simplify the modeling procedure, traditional statistical machine learning 
methods typically assume that the instances are independent and identically 
distributed (i.i.d.). However, many real-world data sets, such as collections 
of web pages and research papers, contain relations (links) between instances. 
Linked instances are correlated with each other, so the common i.i.d. 
assumption does not hold for such relational data. Hence, naively applying 
traditional statistical learning methods to relational data may lead to 
misleading conclusions about the data.

Because relational data are common in a wide variety of application areas, 
such as web mining, social network analysis, bioinformatics, and marketing, 
many researchers have recently proposed novel methods, called statistical 
relational learning (SRL) methods, to model relational data. The existing 
mainstream SRL models extend traditional graphical models, such as Bayesian 
networks and Markov networks, by removing their underlying i.i.d. assumption. 
Typical examples of such SRL models include relational Bayesian networks, 
relational Markov networks, and Markov logic networks.
Because the dependency structure in relational data is typically very complex, 
structure learning for these relational graphical models is often very 
time-consuming or even infeasible. Hence, it may be impractical to apply 
these models to large-scale relational data sets.

In this proposal, we present a new SRL framework, called relational factor 
modeling, which extends traditional latent variable modeling and factor 
analysis to relational domains. Based on this framework, we develop a series 
of novel SRL models, called relational factor models (RFMs). A key advantage 
of RFMs is that they require no time-consuming structure learning at all, and 
the time complexity of most of them is linear in the number of observed links 
in the data. Hence, RFMs can be applied to large-scale data sets. Experimental 
results show that our models can achieve state-of-the-art performance in many 
real-world applications, such as linked-document classification.


Date:  			Monday, 26 April 2010

Time:           	4:00pm - 6:00pm

Venue:          	Room 5506
 			lifts 25/26

Committee Members:	Prof. Dit-Yan Yeung (Supervisor)
 			Dr. James Kwok (Chairperson)
 			Dr. Lei Chen
 			Dr. Raymond Wong


**** ALL are Welcome ****