More about HKUST
Latent Tree Models: An Application and an Extension
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Latent Tree Models: An Application and an Extension"
By
Mr. Kin-Man Poon
Abstract
Latent tree models are a class of probabilistic graphical models. These
models have a tree structure, in which the internal nodes represent latent
variables whereas the leaf nodes represent manifest variables. They allow
fast inference and have been used for multidimensional clustering, latent
structure discovery, density estimation, and classification.
This thesis makes two contributions to the research of latent tree models.
The first contribution is a new application of latent tree models in
spectral clustering. In spectral clustering, one defines a similarity
matrix for a collection of data points, transforms the matrix to get the
so-called Laplacian matrix, finds the eigenvectors of the Laplacian
matrix, and obtains a partition of the data points using the leading
eigenvectors. The last step is sometimes referred to as rounding, where
one needs to decide how many leading eigenvectors to use, to determine the
number of clusters, and to partition the data points.
We propose to use latent tree models for rounding. The method differs
from previous rounding methods in three ways. First, we relax the
assumption that the number of clusters equals the number of eigenvectors
used. Second, when deciding how many leading eigenvectors to use, we not
only rely on information contained in the leading eigenvectors themselves,
but also make use of the subsequent eigenvectors. Third, our method is
model-based and solves all the three subproblems of rounding using latent
tree models. We evaluate our method on both synthetic and real-world
data. The results show that our method works correctly in the ideal case
where between-clusters similarity is 0, and degrades gracefully as one
moves away from the ideal case.
The second contribution is an extension to latent tree models. While
latent tree models have been shown to be useful in data analysis, they
contain only discrete variables and are thus limited to discrete data.
One has to resort to discretization if analysis on continuous data is
needed. However, this leads to loss of information and may make the
resulting models harder to interpret.
We propose an extended class of models, called pouch latent tree models,
that allow the leaf nodes to represent continuous variables. This
extension allows the models to work on continuous data directly. These
models also generalize Gaussian mixture models, which are commonly used
for model-based clustering on continuous data. We use pouch latent tree
models for facilitating variable selection in clustering, and demonstrate
on some benchmark data that it is more appropriate to facilitate variable
selection than to perform variable selection traditionally. We further
demonstrate the usefulness of the models by performing multidimensional
clustering on some real-world basketball data. Our results exhibit
multiple meaningful clusterings and interesting relationships between the
clusterings.
Date: Wednesday, 1 August 2012
Time: 2:00pm – 4:00pm
Venue: Room 3501
Lifts 25/26
Chairman: Prof. Andrew Poon (ECE)
Committee Members: Prof. Nevin Zhang (Supervisor)
Prof. Brian Mak
Prof. Dit-Yan Yeung
Prof. Bingyi Jing (MATH)
Prof. Manfred Jaeger (Aalborg Univ.)
**** ALL are Welcome ****