Artificial Neural Network in Topic modeling and Language Modeling
PhD Qualifying Examination
Title: "Artificial Neural Network in Topic modeling and Language Modeling"
by
Mr. Wei LI
Abstract:
In recent years, the artificial neural network model family has seen
increasing popularity in natural language processing. In particular, with
the development of deep neural networks, many new models have been proposed
for topic modeling and language modeling. The advantage of neural network
models is that they can learn abstract representations of the input
features in their intermediate layers and can represent the relationships
among various inputs in a continuous space. These abstract patterns in the
hidden layers not only contain higher-level linguistic information, but
also reduce the dimensionality of the input features. The relationship
information in the representation can also help reduce data sparsity. Thus
multiple studies in natural language processing find that the neural
network model family achieves better performance than traditional models
with one-hot binary input.
Topic modeling and language modeling are closely related both in theory
and in practice. Generative topic models such as latent Dirichlet
allocation (LDA) describe the document-topic-word arrangement using latent
variables. They can be seen as bag-of-words language models that neglect
word order. However, it is very likely that word order information is
useful for topic modeling or text categorization tasks. It may capture
patterns of phrases or certain combinations of words that appear regularly
in particular topic settings. On the other hand, topic information can
also contribute to language modeling. The simplest case is to add a topic
vector as an extra feature to a neural network language model, so as to
adapt the language model towards specific topics.
This article provides a survey of important methods for topic modeling
and language modeling, with emphasis on models related to neural
networks. It also includes a section on the distributed representation of
words, a research area closely related to language modeling and topic
modeling.
Date: Thursday, 18 February 2016
Time: 2:30pm - 4:30pm
Venue: Room 4504 (Lifts 25/26)
Committee Members: Dr. Brian Mak (Supervisor)
Prof. Nevin Zhang (Chairperson)
Dr. Raymond Wong
Prof. Dit-Yan Yeung
**** ALL are Welcome ****