Relational Factor Modeling: A New Framework for Statistical Relational Learning
PhD Thesis Proposal Defence

Title: "Relational Factor Modeling: A New Framework for Statistical Relational Learning"

by

Mr. Wujun Li

Abstract:

To simplify the modeling procedure, traditional statistical machine learning methods typically assume that the instances are independent and identically distributed (i.i.d.). However, it is not uncommon for real-world data, such as web pages and research papers, to contain relations (links) between the instances. Different instances in such data are correlated (linked) with each other, which implies that the common i.i.d. assumption is unreasonable for such relational data. Hence, naively applying traditional statistical learning methods to relational data may lead to misleading conclusions about the data.

Because relational data exist in a wide variety of application areas, such as web mining, social network analysis, bioinformatics, and marketing, many researchers have recently started to propose novel methods, called statistical relational learning (SRL) methods, to model relational data. The existing mainstream SRL models extend traditional graphical models, such as Bayesian networks and Markov networks, by eliminating their underlying i.i.d. assumption. Typical examples of such SRL models include relational Bayesian networks, relational Markov networks, and Markov logic networks. Because the dependency structure in relational data is typically very complex, structure learning for these relational graphical models is often very time-consuming or even impossible. Hence, it might be impractical to apply these models to large-scale relational data sets.

In this proposal, we propose a new SRL framework, called relational factor modeling, by extending traditional latent variable modeling and factor analysis to relational domains. Based on our framework, a series of novel SRL models, called relational factor models (RFMs), are proposed.
One promising advantage of our RFMs is that they require no time-consuming structure learning at all, and the time complexity of most of them is linear in the number of observed links in the data. This implies that our RFMs can be used to model large-scale data sets. Experimental results show that our models can achieve state-of-the-art performance in many real-world applications such as linked-document classification.

Date: Monday, 26 April 2010

Time: 4:00pm - 6:00pm

Venue: Room 5506, lifts 25/26

Committee Members: Prof. Dit-Yan Yeung (Supervisor)
Dr. James Kwok (Chairperson)
Dr. Lei Chen
Dr. Raymond Wong

**** ALL are Welcome ****