Hierarchical Topic Detection in Big Text Data and the Yelp Dataset Challenge

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Presentation

Title: "Hierarchical Topic Detection in Big Text Data and the Yelp Dataset
        Challenge"

by

Mr. Leung Chun Fai


Abstract: 

Hierarchical topic models have been shown to be useful for topic detection
on academic datasets such as NIPS. In this study, we applied Stochastic
Progressive EM - Hierarchical Latent Tree Analysis (SPEM-HLTA) to an
online customer review dataset, the Yelp Dataset, using a selection of 1.15
million reviews and 4000 word attributes. Based on the SPEM-HLTA topic
detection model, we computed the normalized mutual information (NMI)
between topics and business locations, and show the probability
distribution of topic involvement across business locations. The results
suggest that combining the SPEM-HLTA topic detection model with NMI is a
possible approach to finding cultural topics and cultural differences in
the Yelp Dataset.
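
As a rough illustration of the NMI step described in the abstract, the sketch
below computes NMI between a binary topic indicator and a business location
label, together with the per-location rate of topic involvement. It is not the
thesis implementation. The function names, the toy data, the binary on/off
encoding of a topic per review, and the normalization by the geometric mean of
the two entropies are all assumptions; the abstract does not state which
normalization was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def nmi(xs, ys):
    """NMI with geometric-mean normalization (an assumed choice)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant variable carries no information
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


def topic_rate_by_location(topic_flags, locations):
    """Fraction of reviews at each location that involve the topic."""
    totals = Counter(locations)
    hits = Counter(loc for flag, loc in zip(topic_flags, locations) if flag)
    return {loc: hits[loc] / totals[loc] for loc in totals}


# Hypothetical toy data: one entry per review.
topic_on = [1, 1, 0, 0]            # whether the review involves the topic
city = ["A", "A", "B", "B"]        # location of the reviewed business

score = nmi(topic_on, city)
rates = topic_rate_by_location(topic_on, city)
```

In this toy case the topic is fully determined by the location, so the score is
at its maximum. A topic spread evenly over all locations would score near zero,
which is why a high NMI can flag a topic as location-specific.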


Date                 : 30 April 2016 (Saturday)

Time                 : 12:20pm to 1:05pm

Venue                : Room 5510 (lift 25/26)

Advisor              : Prof. Nevin ZHANG

2nd Reader           : Prof. D.Y. YEUNG