Effective Topic Detection over Social Media
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Effective Topic Detection over Social Media"

By

Mr. Konstantinos GIANNAKOPOULOS

Abstract

Nowadays, social networks (SNs) such as Facebook and Twitter are very popular, and thousands of users post tweets every day. In this dissertation, we address three common issues in processing tweets. First, we identify the most significant messages in a corpus of tweets, so that we can remove noise from our dataset and extract information only from important messages. Second, we propose a topic detection model that incorporates time and location. Third, we propose a novel tweet recommendation framework that is simple and stable.

Concerning the filtering of tweets, we propose a method for classifying tweet messages into two classes: informative and non-informative. We consider informative those messages that contain information of public interest, such as trends, events and news. Non-informative tweets are personal messages that do not interest the public, such as conversations between friends, feelings and descriptions of mood. The motivation of our work is to keep informative tweets that contain essential information and to filter out useless tweets. Real applications that can benefit from our work include trend/topic detection applications, recommendation systems and applications that make predictions based on user messages on social media. The challenge in processing tweets is that they are short, unstructured messages with unclear topics. We propose a weighted variation of the binary multinomial naive Bayes model to identify informative messages. We train our classifier and evaluate the results using 5-fold and 10-fold cross-validation. We compare the results with the original binary multinomial naive Bayes model. We use two independent datasets of tweet messages crawled from the web.
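For readers unfamiliar with the baseline named above, the following is a minimal sketch of an unweighted binary multinomial naive Bayes classifier, in which each term is counted at most once per tweet. It is illustrative only: the function names, the whitespace tokenizer, the Laplace smoothing constant and the toy tweets are all assumptions, and the weighted variation proposed in the thesis is not described in this abstract and is not shown.

```python
import math
from collections import Counter, defaultdict


def train_binary_mnb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on binarized (presence/absence) term counts."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # binarize: a term counts once per tweet
        term_counts[label].update(terms)
        vocab.update(terms)

    log_prior, log_like = {}, {}
    for c, n_docs in class_docs.items():
        # Laplace-smoothed term likelihoods for class c.
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_prior[c] = math.log(n_docs / len(labels))
        log_like[c] = {t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab}
    return {"vocab": vocab, "log_prior": log_prior, "log_like": log_like}


def predict(model, text):
    """Return the class with the highest log-posterior for one tweet."""
    terms = set(text.lower().split()) & model["vocab"]  # skip unseen terms
    scores = {
        c: model["log_prior"][c] + sum(model["log_like"][c][t] for t in terms)
        for c in model["log_prior"]
    }
    return max(scores, key=scores.get)


# Toy, hand-written examples (hypothetical; not data from the thesis).
docs = [
    "earthquake hits city center",
    "breaking news flood warning issued",
    "feeling sleepy today lol",
    "good morning my friends",
]
labels = ["informative", "informative", "non-informative", "non-informative"]
model = train_binary_mnb(docs, labels)
print(predict(model, "flood warning for the city"))
```

Binarizing the counts suits short texts such as tweets, where whether a term occurs usually matters more than how often it occurs.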
We evaluate and present our results using the following metrics: accuracy, recall, specificity and the F-measure with its variations (F2 score and F0.5 score).

Concerning topic detection, existing solutions overlook the time and location factors, which are important and useful. Moreover, social media are updated frequently, so a detection model should handle dynamic updates. We introduce a topic model for topic detection that combines time and location. Our model is equipped with incremental estimation of the parameters of the topic model and with an adaptive window length that depends on the correlation of consecutive windows and their density. We have conducted extensive experiments to verify the effectiveness and efficiency of our proposed Incremental Adaptive Time Location (IncrAdapTL) model.

Concerning tweet recommendation, Twitter users post messages according to their interests and read the tweets of their friends. However, reading tweets on relevant topics from more users may help them broaden their perspective on their interests. Topics combined with time and location are more useful. For instance, consider someone who works downtown at a finance corporation during the day and lives with family in another district at night. During working hours, this user is interested in reading tweets relevant to finance or related to downtown, but not tweets related to entertainment. After work, this user is interested in tweets related to family or entertainment, and perhaps not in tweets relevant to nightlife. Our proposed tweet recommendation model consists of three parts. First, we model users’ preferences using their previously posted tweets, location and time. Second, we model tweet documents by proposing topic-enhanced document vectors. Third, we train our model and suggest tweets to users. Our approach offers time-efficient update handling without re-training the model, and tackles the sparsity problem of (user, tweet) pairs.
We evaluate our model on approximately 1 million real tweets from Hong Kong, and we show that its performance is stable.

Date: Thursday, 13 December 2018

Time: 4:00pm - 6:00pm

Venue: Room 2131C
       Lift 19

Chairman: Prof. Chi-Ying Tsui (ECE)

Committee Members: Prof. Lei Chen (Supervisor)
                   Prof. Bo Li
                   Prof. Ke Yi
                   Prof. Ping Gao (CBE)
                   Prof. Yunjun Gao (Zhejiang University)

**** ALL are Welcome ****