Topic Modeling of Chinese Language

======================================================================
                        Joint Seminar
======================================================================
The Hong Kong University of Science & Technology
Department of Computer Science and Engineering
Human Language Technology Center
----------------------------------------------------------------------
Speaker:	Dr. Zengchang QIN
		Beihang University

Title:		"Topic Modeling of Chinese Language"

Date:		Friday, 2 September 2011

Time:		11:00am - 12 noon

Venue:		Room 3311 (via lifts 17/18), HKUST

Abstract:

Topic models are hierarchical Bayesian models for language modeling and
document analysis. They have been widely used and have achieved
considerable success in modeling English documents. However, unlike
English and the majority of alphabetic languages, the basic structural
unit of the Chinese language is the character rather than the word, and
Chinese words are written without spaces between them. Most previous
research on topic models for Chinese documents did not take the Chinese
character-word relation into consideration and simply took the Chinese
word as the basic term of documents. In this talk, we will discuss a novel
model that incorporates the character-word relation into topic modeling by
placing an asymmetric prior on the topic-word distribution of the standard
Latent Dirichlet Allocation (LDA) model. Compared to LDA, this model can
improve performance in document classification, especially when the test
data contain a considerable number of Chinese words that do not appear in
the training data.
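
As a rough illustration of how an asymmetric topic-word prior could carry
character information, the sketch below gives each word a prior weight
based on how often its characters occur in a topic. This is not the
speaker's model: the construction rule, the function names, the `base` and
`strength` parameters, and the toy counts are all assumptions made for
illustration only.

```python
from collections import defaultdict


def char_topic_counts(word_topic_counts):
    """Aggregate per-topic word counts down to the words' characters."""
    counts = defaultdict(lambda: defaultdict(float))
    for topic, word_counts in word_topic_counts.items():
        for word, n in word_counts.items():
            for ch in word:
                counts[topic][ch] += n
    return counts


def asymmetric_prior(vocab, word_topic_counts, base=0.01, strength=1.0):
    """Build a hypothetical asymmetric prior {topic: {word: weight}}.

    Assumption: a word's weight in a topic grows with the share of that
    topic's character mass carried by the word's own characters. Words with
    no known characters fall back to the symmetric value `base`.
    """
    ctc = char_topic_counts(word_topic_counts)
    prior = {}
    for topic in word_topic_counts:
        total = sum(ctc[topic].values()) or 1.0
        prior[topic] = {}
        for word in vocab:
            mass = sum(ctc[topic].get(ch, 0.0) for ch in word)
            share = mass / (max(len(word), 1) * total)
            prior[topic][word] = base + strength * share
    return prior


# Toy per-topic word counts (made up for this example).
training_counts = {
    "sports": {"足球": 3, "篮球": 2},
    "finance": {"银行": 4, "股票": 1},
}
# "排球" does not occur in the training counts but shares "球" with
# the sports words.
vocab = ["足球", "篮球", "银行", "股票", "排球"]
prior = asymmetric_prior(vocab, training_counts)
```

In this toy setting, the unseen word "排球" receives a larger prior weight
under the sports topic than under the finance topic, where it keeps only
the `base` value. A symmetric prior would treat it identically in both.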


********************
Biography:

Dr. Zengchang Qin is an associate professor at Beihang University,
Beijing, China. He obtained his MSc and PhD from the University of
Bristol, UK, and did his postdoctoral research with Lotfi Zadeh at UC
Berkeley, US. He previously worked (or interned) at HP, BT, and Optimor
Labs, and was a visiting scholar at the University of Oxford and Carnegie
Mellon University. His research interests are agent-based modeling,
machine learning, computational intelligence and multimedia retrieval.