Search-Based Learning of Latent Tree Models

PhD Thesis Proposal Defence


Title: "Search-Based Learning of Latent Tree Models"

by

Mr. Tao CHEN


Abstract:

A latent variable model is a statistical model that relates a set of
observed variables (aka manifest variables) to a set of unobserved
variables (aka latent variables). Latent variable models originated
in psychometrics (Spearman, 1904), were developed further in the
social sciences, statistics and educational testing theory, and have
recently attracted interest in machine learning. Examples of latent
variable models include hidden Markov models (HMMs), latent class
models and factor analysis.

In this thesis we study a class of latent variable models known as
latent tree (LT) models. LT models are tree-structured Bayesian
networks where the leaf nodes represent manifest variables while
internal nodes represent latent variables. We investigate the
automatic induction of LT models from data, and the use of LT models
in cluster analysis of categorical data.
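To make the definition concrete, here is a minimal sketch of a toy LT model. The structure, variable names (Y, Z, X1 to X4) and all probabilities are hypothetical and chosen only for illustration; this is not the learning algorithm studied in the thesis. Because an LT model is a Bayesian network, the joint distribution factorizes as a product of P(node | parent). The probability of the observed (leaf) values is obtained by summing out the latent (internal) nodes. Each latent variable also induces a soft clustering of the data via its posterior distribution.

```python
from itertools import product

# Hypothetical toy LT model (all variables binary):
# latent root Y has leaf children X1, X2 and a latent child Z;
# Z has leaf children X3, X4. Mapping is node -> parent (None for the root).
PARENT = {"Y": None, "X1": "Y", "X2": "Y", "Z": "Y", "X3": "Z", "X4": "Z"}
LATENT = ["Y", "Z"]                  # internal nodes
MANIFEST = ["X1", "X2", "X3", "X4"]  # leaf nodes

# Made-up parameters. Root: PRIOR[node][state];
# other nodes: CPT[node][parent_state][node_state].
PRIOR = {"Y": [0.6, 0.4]}
CPT = {
    "X1": [[0.9, 0.1], [0.2, 0.8]],
    "X2": [[0.7, 0.3], [0.1, 0.9]],
    "Z":  [[0.8, 0.2], [0.3, 0.7]],
    "X3": [[0.95, 0.05], [0.4, 0.6]],
    "X4": [[0.6, 0.4], [0.05, 0.95]],
}

def joint(assign):
    """P(all variables) as a product of P(node | parent), as in any Bayesian network."""
    p = 1.0
    for node, parent in PARENT.items():
        if parent is None:
            p *= PRIOR[node][assign[node]]
        else:
            p *= CPT[node][assign[parent]][assign[node]]
    return p

def marginal(observed):
    """P(manifest values): sum the joint over every configuration of the latent nodes."""
    total = 0.0
    for states in product([0, 1], repeat=len(LATENT)):
        assign = dict(observed)
        assign.update(zip(LATENT, states))
        total += joint(assign)
    return total

def posterior(var, observed):
    """P(var = k | manifest values): soft cluster membership along one latent dimension."""
    scores = [0.0, 0.0]
    for states in product([0, 1], repeat=len(LATENT)):
        assign = dict(observed)
        assign.update(zip(LATENT, states))
        scores[assign[var]] += joint(assign)
    z = sum(scores)
    return [s / z for s in scores]

record = {"X1": 1, "X2": 0, "X3": 1, "X4": 1}
print(marginal(record))           # probability of this data record
print(posterior("Y", record))     # clustering along dimension Y
print(posterior("Z", record))     # clustering along dimension Z
```

Brute-force enumeration is exponential in the number of latent variables and is used here only for clarity. Practical implementations would typically use message passing on the tree instead. The two latent variables Y and Z illustrate how a single LT model can partition the same data in more than one way.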

Several search-based algorithms for learning LT models have been
developed. However, important issues remain poorly understood. In
this thesis we study three such issues, namely operation granularity,
efficient model evaluation, and the range of model adjustment. The
investigation sheds new light on search-based learning of LT models
and leads to a new algorithm that is conceptually simpler and more
efficient than the state of the art, and yet finds better models.

LT models can be used for latent structure discovery and density
estimation. In this thesis we demonstrate their usefulness in
cluster analysis of categorical data. We compare LT analysis with
latent class (LC) analysis, an earlier method for the same task. It
turns out that LT analysis is superior to LC analysis in terms of
both the quantity and the depth of the discoveries made. In
particular, LT analysis automatically searches for multiple
meaningful dimensions along which to cluster data, while LC analysis
always clusters data in a single way.


Date:                   Friday, 5 September 2008

Time:                   1:30 p.m. - 3:30 p.m.

Venue:                  Room 3402 (Lifts 17-18)

Committee Members:      Dr. Nevin Zhang (Supervisor)
                        Dr. James Kwok (Chairperson)
                        Prof. Mordecai Golin
                        Prof. Dit-Yan Yeung


**** ALL are Welcome ****