Hierarchical Topic Detection in Big Text Data and the Yelp Dataset Challenge
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Presentation

Title: "Hierarchical Topic Detection in Big Text Data and the Yelp Dataset Challenge"

by

Mr. Leung Chun Fai

Abstract:

Hierarchical topic models have been shown to be useful for topic detection on academic datasets such as NIPS. In this study, we applied Stochastic Progressive EM - Hierarchical Latent Tree Analysis (SPEM-HLTA) to an online customer review dataset, the Yelp Dataset, using a selection of 1.15 million reviews and 4000 word attributes. Based on the SPEM-HLTA topic detection model, we computed the normalized mutual information (NMI) between topics and business locations, and show the topic involvement probability distribution for each business location. The results show that combining the SPEM-HLTA topic detection model with NMI could be a possible approach to finding cultural topics and cultural differences in the Yelp Dataset.

Date : 30 April 2016 (Saturday)
Time : 12:20pm to 1:05pm
Venue : Room 5510 (lift 25/26)

Advisor : Prof. Nevin ZHANG
2nd Reader : Prof. D.Y. YEUNG
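As a rough illustration of the NMI between a topic and business locations mentioned in the abstract, the sketch below computes it for a toy example. Everything in it is an assumption made for illustration: the function names, the binary "review involves the topic" flags, the city labels, and the choice to normalize mutual information by the geometric mean of the two entropies (other normalizations exist, and the abstract does not say which one the thesis uses). It is not the SPEM-HLTA implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def nmi(xs, ys):
    """NMI, assuming normalization by sqrt(H(X) * H(Y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0  # a constant variable carries no information
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


def topic_rate_by_location(topic_flags, locations):
    """Fraction of reviews at each location that involve the topic."""
    totals = Counter(locations)
    hits = Counter(loc for flag, loc in zip(topic_flags, locations) if flag)
    return {loc: hits[loc] / totals[loc] for loc in totals}


# Hypothetical toy data: one flag per review (1 = review involves the topic)
# and the city of the reviewed business.
topic_flags = [1, 1, 1, 0, 0, 0, 1, 0]
cities = ["Las Vegas", "Las Vegas", "Las Vegas", "Phoenix",
          "Phoenix", "Phoenix", "Phoenix", "Las Vegas"]

score = nmi(topic_flags, cities)
rates = topic_rate_by_location(topic_flags, cities)
```

Under this reading, a topic whose NMI with location is high is one whose involvement rate differs strongly between cities, which is the kind of signal the abstract associates with cultural topics.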