A Survey on Probabilistic Topic Modeling

PhD Qualifying Examination


Title: "A Survey on Probabilistic Topic Modeling"

by

Miss Peixian CHEN


Abstract:

The rapidly growing volume of digitized document collections makes it
increasingly difficult to find desired information within a reasonable time.
Many computational and analytical tools have been developed to meet this
challenge. Among those that offer an underlying semantic interpretation, topic
modeling is currently the most widely used. It aims to discover patterns in the
use of words that can be utilized to organize and summarize the documents in a
corpus. In this paper, we give a survey of probabilistic topic modeling.

In probabilistic topic modeling, topics are modeled as probability
distributions over a vocabulary. The documents are assumed to have been
produced from a set of unobserved topics through a probabilistic generative
process. Statistical inference is performed to invert the generative process
and identify the topics. In this survey, we first discuss the basic
probabilistic topic models as well as the associated inference algorithms. Then
we concentrate on extensions to the basic models that consider the modeling of
topic correlations, the automatic determination of the number of topics, and
the evolution of topics over time. Evaluation methods, along with a comparison
of the above probabilistic topic models, will also be presented.
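
As a rough illustration of the generative view described above, the sketch
below draws one document from two hand-written topics. It assumes a
Dirichlet prior over per-document topic proportions, in the style of latent
Dirichlet allocation; the abstract does not commit to a particular model, and
the vocabulary, topic probabilities, and function names are all hypothetical.

```python
import random


def sample_categorical(rng, probs):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_dirichlet(rng, alpha):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topics, vocabulary, alpha, length):
    """Generate one document: topic proportions first, then a topic and a word per position."""
    theta = sample_dirichlet(rng, alpha)        # per-document topic proportions
    words = []
    for _ in range(length):
        z = sample_categorical(rng, theta)      # unobserved topic assignment
        w = sample_categorical(rng, topics[z])  # word drawn from that topic
        words.append(vocabulary[w])
    return theta, words


# Hypothetical toy data: each topic is a probability distribution over the vocabulary.
vocabulary = ["gene", "dna", "cell", "ball", "team", "score"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology"-like topic
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # a "sports"-like topic
]

rng = random.Random(0)
theta, words = generate_document(rng, topics, vocabulary, alpha=[0.5, 0.5], length=8)
print(theta, words)
```

Inference runs in the opposite direction: given only the words of many
documents, it estimates the topics and the per-document proportions that this
sketch takes as known.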


Date:                   Wednesday, 23 April 2014

Time:                   10:00am - 12:00noon

Venue:                  Room 5563
                         Lifts 27/28

Committee Members:      Prof. Nevin Zhang (Supervisor)
                         Prof. Dit-Yan Yeung (Chairperson)
                         Prof. Dik-Lun Lee
                         Dr. Raymond Wong


**** ALL are Welcome ****