Artificial Neural Network in Topic modeling and Language Modeling
PhD Qualifying Examination

Title: "Artificial Neural Network in Topic modeling and Language Modeling"

by

Mr. Wei LI

Abstract:

In recent years, the artificial neural network model family has seen increasing popularity in natural language processing. In particular, with the development of deep neural networks, many new models have been invented for topic modeling and language modeling. An advantage of neural network models is that they can learn abstract representations of the input features in their intermediate layers and can represent the relationships among inputs in a continuous space. These abstract patterns in the hidden layers not only contain higher-level linguistic information but also reduce the dimensionality of the input features. The relationship information in the representation can also help reduce data sparsity. Thus, multiple studies in natural language processing find that the neural network model family achieves better performance than traditional models that use one-hot binary inputs.

Topic modeling and language modeling are closely related, both in theory and in practice. Generative topic models such as latent Dirichlet allocation (LDA) describe the document-topic-word arrangement using latent variables. They can be seen as bag-of-words language models that neglect word order. However, word order information is very likely to be useful for topic modeling and text categorization tasks: it may capture phrase patterns or particular word combinations that appear regularly in particular topic settings. Conversely, topic information can also contribute to language modeling. The simplest case is to add a topic vector as an extra feature to a neural network language model, so as to adapt the language model towards specific topics.

This article provides a survey of important methods in topic modeling and language modeling, with emphasis on models related to neural networks. It also includes a section on the distributed representation of words, a research area closely related to both language modeling and topic modeling.

Date: Thursday, 18 February 2016
Time: 2:30pm - 4:30pm
Venue: Room 4504, Lifts 25/26

Committee Members:
  Dr. Brian Mak (Supervisor)
  Prof. Nevin Zhang (Chairperson)
  Dr. Raymond Wong
  Prof. Dit-Yan Yeung

**** ALL are Welcome ****