Latent Tree Models: An Application and an Extension

PhD Thesis Proposal Defence


Title: "Latent Tree Models: An Application and an Extension"

by

Mr. Kin-Man Poon


ABSTRACT:

Latent tree models are a class of probabilistic graphical models.  These models 
have a tree structure, in which the internal nodes represent latent variables 
whereas the leaf nodes represent manifest variables.  They allow fast inference 
and have been used for multidimensional clustering, latent structure discovery, 
density estimation, and classification.

This thesis makes two contributions to the research of latent tree models. The 
first contribution is an new application of latent tree models in spectral 
clustering.  In spectral clustering, one defines a similarity matrix for a 
collection of data points, transforms the matrix to get the so-called Laplacian 
matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition 
of the data points using the leading eigenvectors.  The last step is sometimes 
referred to as rounding, where one needs to decide how many leading 
eigenvectors to use, determine the number of clusters, and partition the data 
points.

We propose to use latent tree models for rounding.  The method differs from 
previous rounding methods in three ways. First, we relax the assumption that 
the number of clusters equals the number of eigenvectors used. Second, when 
deciding how many leading eigenvectors to use, we not only rely on information 
contained in the leading eigenvectors themselves, but also make use of the 
subsequent eigenvectors.  Third, our method is model-based and solves all the 
three subproblems of rounding using latent tree models.  We evaluate our method 
on both synthetic and real-world data.  The results show that our method works 
correctly in the ideal case where between-clusters similarity is 0, and 
degrades gracefully as one moves away from the ideal case.

The second contribution is an extension to latent tree models.  While latent 
tree models have been shown to be useful in data analysis, they contain only 
discrete variables and are thus limited to discrete data. One has to resort to 
discretization if analysis on continuous data is needed.  However, this leads 
to loss of information and may make the resulting models harder to interpret.

We propose an extended class of models, called pouch latent tree models, that 
allow the leaf nodes to represent continuous variables.  This extension allows 
the models to work on continuous data directly.  The models also generalize 
Gaussian mixture models, which are commonly used for model-based clustering on 
continuous data. We use pouch latent tree models for facilitating variable 
selection in clustering, and demonstrate on some benchmark data that it is more 
appropriate to facilitate variable selection than to perform variable selection 
traditionally.  We further demonstrate the usefulness of the models by 
performing multidimensional clustering on some real-world basketball data.  Our 
results exhibit multiple meaningful clusterings and interesting relationships 
between the clusterings.


Date:                   Monday, 4 June 2012

Time:                   2:00pm - 4:00pm

Venue:                  Room 3315
                         lifts 17/18

Committee Members:      Prof. Nevin Zhang (Supervisor)
                         Prof. James Kwok (Chairperson)
 			Dr. Brian Mak
 			Prof. Dit-Yan Yeung


**** ALL are Welcome ****