Effective Topic Detection over Social Media
PhD Thesis Proposal Defence

Title: "Effective Topic Detection over Social Media"

by

Mr. Konstantinos GIANNAKOPOULOS

Abstract:

Nowadays, Social Networks (SNs) like Facebook and Twitter are very popular. Thousands of users post tweets every day. In this proposal, we deal with two common issues in processing tweets. Firstly, we identify the most significant messages in a corpus of tweets, so that we can remove noise from our dataset and extract information only from important messages. Secondly, we propose a topic detection model that incorporates time and location.

Concerning the filtering of tweets, we propose a method for classifying tweet messages into two classes: informative and non-informative. We consider informative messages to be those that contain information of interest to the public, such as trends, events and news. Non-informative tweets are personal messages that do not interest the public, like conversations between friends, feelings and descriptions of mood. The motivation of our work is to keep informative tweets that contain essential information and to filter out useless tweets. Real applications that can benefit from our work are trend/topic detection applications, recommendation systems and applications that make predictions based on user messages on social media. The challenges of processing tweet messages are that they are short and unstructured, and that their topic is often unclear.

We propose a weighted variation of the binary multinomial naive Bayes model to identify informative messages. We train our classifier and evaluate the results using 5-fold and 10-fold cross-validation. We compare the results with those of the original binary multinomial naive Bayes model. We use two independent datasets of tweet messages crawled from the web. We evaluate and present our results using the following metrics: accuracy, recall, specificity, and the F-measure with its variations (F2 score and F0.5 score).
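As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed from confusion-matrix counts. It is not taken from the thesis: the function name `evaluation_metrics`, the choice of "informative" as the positive class, and the example counts are all assumptions made for illustration. Precision is computed only because the F-measure is defined in terms of it.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    Assumption (not from the source): "informative" is the positive class, so
    tp = informative tweets labelled informative, fn = informative tweets missed,
    tn = non-informative tweets labelled non-informative, fp = false alarms.
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    recall = safe_div(tp, tp + fn)        # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)

    def f_beta(beta):
        # F-beta score: beta > 1 weights recall more, beta < 1 weights precision more.
        b2 = beta * beta
        return safe_div((1 + b2) * precision * recall, b2 * precision + recall)

    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "f0.5": f_beta(0.5),
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = evaluation_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F2 score favours recall (few informative tweets missed), while the F0.5 score favours precision (few non-informative tweets let through), which is why reporting both alongside F1 gives a fuller picture of a filtering classifier.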
Concerning topic detection, the existing solutions overlook time and location factors, which are quite important and useful. Moreover, social media are frequently updated. Thus, the proposed detection model should handle the dynamic updates. We introduce a topic model for topic detection that combines time and location. Our model is equipped with incremental estimation of the parameters of the topic model and adaptive window length according to the correlation of consecutive windows and their density. We have conducted extensive experiments to verify the effectiveness and efficiency of our proposed Incremental Adaptive Time Location (IncrAdapTL) model.

Date: Thursday, 4 October 2018

Time: 9:00am - 11:00am

Venue: Room 5562 (lifts 27/28)

Committee Members:
Prof. Lei Chen (Supervisor)
Dr. Xiaojuan Ma (Chairperson)
Dr. Qiong Luo
Dr. Wei Wang

**** ALL are Welcome ****