Hierarchical Topic Detection in Big Text Data and the Yelp Dataset Challenge
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
Final Year Thesis Oral Presentation
Title: "Hierarchical Topic Detection in Big Text Data and the Yelp Dataset
Challenge"
by
Mr. Leung Chun Fai
Abstract:
Hierarchical topic models have been shown to be useful for topic detection
on academic datasets such as NIPS. In this study, we applied Stochastic
Progressive EM - Hierarchical Latent Tree Analysis (SPEM-HLTA) to an
online customer review dataset, the Yelp Dataset, using a selection of 1.15
million reviews and 4,000 word attributes. Based on the SPEM-HLTA topic
detection model, we computed the normalized mutual information (NMI)
between topics and business locations, and we show the probability
distribution of topic involvement across business locations. The results
suggest that combining the SPEM-HLTA topic detection model with NMI is a
possible approach to finding cultural topics and cultural differences in
the Yelp Dataset.
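
As a rough illustration of the NMI computation mentioned in the abstract,
the sketch below scores how strongly a topic is associated with business
location. It is not the thesis implementation. It assumes that each review
carries a binary topic indicator (1 if the review involves the topic) and a
location label, and that mutual information is normalized by the geometric
mean of the two entropies; the abstract does not state which normalization
was used. The function names and the city labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written in terms of counts.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def nmi(xs, ys):
    """NMI = I(X;Y) / sqrt(H(X) H(Y)); this normalization is an assumption."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant variable carries no information
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


# Hypothetical toy data: one topic indicator and one city label per review.
topic = [1, 1, 0, 0, 1, 0]
city = ["Las Vegas", "Las Vegas", "Phoenix", "Phoenix", "Las Vegas", "Phoenix"]
score = nmi(topic, city)
```

Under this normalization, a score near 1 means the topic is almost entirely
determined by location, and a score near 0 means the topic appears
independently of location.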
Date : 30 April 2016 (Saturday)
Time : 12:20pm to 1:05pm
Venue : Room 5510 (lift 25/26)
Advisor : Prof. Nevin ZHANG
2nd Reader : Prof. D.Y. YEUNG