The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "UNIDIMENSIONAL CLUSTERING USING LATENT TREE MODELS"

By

Miss Hua LIU


Abstract

Cluster analysis is an important research topic in machine learning and 
data mining. The objective is to partition data into meaningful clusters, 
where the number of clusters might be known or unknown. A variety of 
approaches have been proposed for the task. Examples include 
distance/similarity-based algorithms such as K-means, kernel K-means and 
spectral clustering, as well as model-based methods such as Gaussian 
mixture models (GMMs). Most previous clustering algorithms are designed 
for continuous data, that is, points in a Euclidean space of some 
dimension. In this thesis, we focus on discrete data, where each 
attribute takes a finite number of possible values.

A commonly used method for clustering discrete data is latent class 
analysis (LCA). LCA is based on a class of finite mixture models known as 
latent class models (LCMs). An LCM consists of a latent variable and a 
number of attributes. It assumes that the attributes are mutually 
independent given the latent variable. Known as local independence, this 
assumption is often too strong in practice, and might lead to spurious 
clusters. Another method for clustering discrete data is latent tree 
analysis (LTA), which is based on latent tree models (LTMs). LTMs allow 
multiple latent variables, and hence the local independence assumption is 
relaxed. Each latent variable represents a soft partition of the data. As 
such, LTA yields multiple partitions of the data and is a 
multidimensional clustering method. While multiple partitions are 
desirable in some applications, there are many situations where one wants 
only a single partition of the data.
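
For a concrete picture of local independence, the following sketch fits 
an LCM to binary data with a standard EM loop. It uses only numpy; the 
function name and defaults are illustrative and not taken from the thesis.

    # Minimal EM sketch for a latent class model (LCM) on binary data.
    # Local independence: attributes are independent given the class Z.
    # Names and defaults are illustrative only.
    import numpy as np

    def lca_em(X, K, n_iter=100, seed=0):
        """Fit an LCM with K classes to binary data X of shape (n, d)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi = np.full(K, 1.0 / K)                 # class priors P(Z = k)
        theta = rng.uniform(0.25, 0.75, (K, d))  # P(X_j = 1 | Z = k)
        for _ in range(n_iter):
            # E-step: responsibilities P(Z = k | x_i); local independence
            # makes log P(x | z) a sum over attributes.
            log_post = (X @ np.log(theta).T
                        + (1 - X) @ np.log(1 - theta).T + np.log(pi))
            log_post -= log_post.max(axis=1, keepdims=True)
            resp = np.exp(log_post)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate priors and conditional probabilities.
            Nk = resp.sum(axis=0)
            pi = Nk / n
            theta = np.clip((resp.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        return pi, theta, resp.argmax(axis=1)    # hard cluster labels

Because P(x | z) factorizes into a product over attributes, any 
within-class correlation among attributes must be explained by 
introducing extra classes, which is one way spurious clusters can arise.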

In this thesis, we develop and investigate a novel method for clustering 
discrete data that yields a single partition of data and does not make the 
local independence assumption. The method learns, from data, a three-level 
LTM where (1) the attributes are at the bottom, (2) there is a unique 
latent variable, the root, at the top level, and (3) there are multiple 
latent variables at the middle level that connect the attributes and the 
root. The root represents the data partition to be obtained. The local 
independence assumption is relaxed by the multiple latent variables at the 
middle level. The method is called UC-LTM, which stands for unidimensional 
clustering based on latent tree models.
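
Likelihood computations under such a three-level structure remain 
tractable because the model is a tree. The sketch below evaluates the 
log-likelihood of one data case by marginalizing out the latent 
variables; the names and CPT shapes are illustrative, and this is not the 
thesis's structure-learning algorithm.

    # Log-likelihood of one discrete data case under a three-level LTM.
    # Z is the root (the data partition); each middle latent Y_m covers
    # a group of attributes. Names and CPT shapes are illustrative.
    import numpy as np

    def three_level_loglik(x, pi_z, cpts_y, cpts_x, groups):
        """x: one data case (array of attribute values).
        pi_z:   P(Z), shape (Kz,)
        cpts_y: per middle latent, P(Y_m | Z) of shape (Kz, Ky_m)
        cpts_x: per middle latent, a list of P(X_j | Y_m) of shape
                (Ky_m, V_j)
        groups: per middle latent, the indices of its attributes
        """
        msg = np.ones_like(pi_z)
        for m, group in enumerate(groups):
            # Message from Y_m's subtree to the root:
            # sum_{y} P(y | z) * prod_{j in group} P(x_j | y)
            leaf = np.ones(cpts_y[m].shape[1])
            for j_local, j in enumerate(group):
                leaf *= cpts_x[m][j_local][:, x[j]]
            msg *= cpts_y[m] @ leaf
        # log P(x) = log sum_z P(z) * prod_m msg_m(z)
        return np.log(pi_z @ msg)

Marginalizing out each middle latent Y_m leaves the attributes within its 
group dependent given the root Z, which is precisely how the local 
independence assumption is relaxed.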

In addition to the general-purpose algorithm UC-LTM, we also propose and 
investigate a special-purpose unidimensional clustering method for solving 
the rounding problem of spectral clustering. Spectral clustering starts 
with a similarity matrix for a collection of data points, transforms the 
matrix to get the so-called Laplacian matrix, finds the eigenvectors of the 
Laplacian matrix, and obtains a partition of the data points using the 
leading eigenvectors. The last step is called rounding and it consists of 
three subproblems: (1) decide how many leading eigenvectors to use, (2) 
determine the number of clusters, and (3) partition the data points. To 
solve the rounding problem, we treat the eigenvectors as features, 
discretize them, and then learn a three-level LTM to cluster the data 
points. Our method differs from previous rounding methods in three ways. 
First, we relax the assumption 
that the number of clusters equals the number of eigenvectors used. 
Second, when deciding how many leading eigenvectors to use, we not only 
rely on information contained in the leading eigenvectors themselves, but 
also make use of the subsequent eigenvectors. Third, our method is 
model-based and solves all three subproblems simultaneously.
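
The following numpy sketch traces the pipeline up to rounding. The 
quantile-based discretization is an illustrative stand-in for the step 
described above; the thesis's actual discretization procedure and LTM 
learning are not reproduced here.

    # Spectral clustering pipeline up to rounding, with an illustrative
    # quantile discretization of the leading eigenvectors.
    import numpy as np

    def leading_eigenvectors(S, m):
        """m leading eigenvectors of the normalized Laplacian of a
        similarity matrix S, i.e., L = I - D^{-1/2} S D^{-1/2}."""
        d = S.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = np.eye(len(S)) - D_inv_sqrt @ S @ D_inv_sqrt
        vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
        return vecs[:, :m]               # smallest-eigenvalue vectors lead

    def discretize(V, bins=3):
        """Bin each eigenvector at its empirical quantiles; the bin count
        here is an arbitrary illustrative choice."""
        cols = []
        for j in range(V.shape[1]):
            edges = np.quantile(V[:, j], np.linspace(0, 1, bins + 1)[1:-1])
            cols.append(np.digitize(V[:, j], edges))
        return np.column_stack(cols)

The discretized eigenvector features can then be clustered with a model 
such as the three-level LTM sketched earlier, which is what allows the 
number of clusters to be decoupled from the number of eigenvectors used.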


Date:			Monday, 24 August 2015

Time:			2:00pm - 4:00pm

Venue:			Room 2132C
 			Lift 19

Chairman:		Prof. Shaohui Zheng (ISOM)

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Prof. Lei Chen
 			Prof. Fangzhen Lin
 			Prof. Raymond Wong (SOSC)
 			Prof. Changhe Yuan (Comp. Sci., City U. of NY)


**** ALL are Welcome ****