Search-Based Learning of Latent Tree Models
PhD Thesis Proposal Defence

Title: "Search-Based Learning of Latent Tree Models"

by Mr. Tao CHEN

Abstract:

A latent variable model is a statistical model that relates a set of observed variables (also known as manifest variables) to a set of unobserved variables (also known as latent variables). Latent variable models originated in psychometrics (Spearman, 1904), were further developed in social science, statistics and educational testing theory, and have recently attracted growing interest in machine learning. Examples of latent variable models include hidden Markov models (HMMs), latent class models and factor analysis.

In this thesis we study a class of latent variable models known as latent tree (LT) models. LT models are tree-structured Bayesian networks in which the leaf nodes represent manifest variables and the internal nodes represent latent variables. We investigate the automatic induction of LT models from data and the use of LT models in cluster analysis of categorical data.

Several search-based algorithms for learning LT models have been developed. However, important issues remain poorly understood. In this thesis we study three of them: operation granularity, efficient model evaluation, and range of model adjustment. The investigation sheds new light on search-based learning of LT models and leads to a new algorithm that is conceptually simpler and more efficient than the state of the art, yet finds better models.

LT models can be used for latent structure discovery and density estimation. In this thesis we demonstrate their usefulness in cluster analysis of categorical data. We compare LT analysis with latent class (LC) analysis, an earlier method for the same task. It turns out that LT analysis is superior to LC analysis in both the quantity and the depth of the discoveries it makes. In particular, LT analysis automatically searches for multiple meaningful dimensions along which to cluster data, whereas LC analysis always clusters data in a single way.
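To make the definition of an LT model concrete, here is a minimal sketch, not the thesis's learning algorithm. It assumes a toy, hypothetical model with two binary latent variables (Y1 at the root, Y2 internal) and four binary manifest leaves (X1 to X4), using made-up conditional probabilities. Because the Bayesian network is a tree, the likelihood of an observed leaf configuration can be computed by summing out the latent states recursively from the leaves up to the root. This is the kind of quantity a search-based learner must evaluate repeatedly when scoring candidate models.

```python
from itertools import product

# Hypothetical toy LT model: name -> (parent, cpt, is_latent).
# cpt[parent_state][own_state] = P(own_state | parent_state);
# the root has no parent, so its prior is stored under the key None.
# All names and numbers are illustrative assumptions.
MODEL = {
    "Y1": (None, {None: [0.6, 0.4]}, True),
    "X1": ("Y1", {0: [0.9, 0.1], 1: [0.2, 0.8]}, False),
    "X2": ("Y1", {0: [0.7, 0.3], 1: [0.1, 0.9]}, False),
    "Y2": ("Y1", {0: [0.8, 0.2], 1: [0.3, 0.7]}, True),
    "X3": ("Y2", {0: [0.95, 0.05], 1: [0.25, 0.75]}, False),
    "X4": ("Y2", {0: [0.6, 0.4], 1: [0.15, 0.85]}, False),
}


def children(model, name):
    """Return the child nodes of `name` in the tree."""
    return [c for c, (parent, _, _) in model.items() if parent == name]


def message(model, name, evidence, parent_state):
    """P(evidence in the subtree rooted at `name` | parent is in parent_state)."""
    _, cpt, latent = model[name]
    total = 0.0
    for state, p in enumerate(cpt[parent_state]):
        if not latent and evidence[name] != state:
            continue  # a manifest leaf contributes only its observed state
        subtree = 1.0
        for child in children(model, name):
            subtree *= message(model, child, evidence, state)
        total += p * subtree  # latent states are summed out
    return total


def likelihood(model, evidence):
    """Marginal probability of the observed manifest values."""
    root = next(n for n, (parent, _, _) in model.items() if parent is None)
    return message(model, root, evidence, None)


if __name__ == "__main__":
    leaves = [n for n, (_, _, latent) in MODEL.items() if not latent]
    total = sum(
        likelihood(MODEL, dict(zip(leaves, xs)))
        for xs in product([0, 1], repeat=len(leaves))
    )
    print(round(total, 10))  # 1.0: the marginals over all leaf configurations sum to one
```

The recursion visits each node once per parent state, so its cost grows linearly with the number of nodes rather than exponentially with the number of latent variables; that is one reason the tree restriction keeps model evaluation tractable.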
Date: Friday, 5 September 2008
Time: 1:30 p.m. - 3:30 p.m.
Venue: Room 3402 (Lifts 17-18)

Committee Members:
Dr. Nevin Zhang (Supervisor)
Dr. James Kwok (Chairperson)
Prof. Mordecai Golin
Prof. Dit-Yan Yeung

**** ALL are Welcome ****