Effective Topic Detection over Social Media
PhD Thesis Proposal Defence
Title: "Effective Topic Detection over Social Media"
by
Mr. Konstantinos GIANNAKOPOULOS
Abstract:
Social Networks (SNs) such as Facebook and Twitter are very popular today.
Thousands of users post tweets every day. In this proposal, we address two
common issues in processing tweets. First, we filter a corpus of tweets to
keep its most significant messages, so that we remove noise from the dataset
and extract information only from important messages. Second, we propose a
topic detection model that incorporates time and location.
Concerning the filtering of tweets, we propose a method for classifying tweet
messages into two classes: informative and non-informative. We consider
informative those messages that contain information of interest to the public,
such as trends, events and news. Non-informative tweets are personal messages
that do not interest the public, such as conversations between friends,
expressions of feelings and descriptions of mood. The motivation of our work is
to keep informative tweets that contain essential information and to filter out
useless tweets. Real applications that can benefit from our work include
trend/topic detection applications, recommendation systems and applications
that make predictions based on user messages on social media.
The challenge in processing tweets is that they are short and unstructured,
and their topic is often unclear. We propose a weighted variation of the
binary multinomial naive Bayes model to identify informative messages. We
train our classifier and evaluate the results using 5-fold and 10-fold
cross-validation. We compare the results with those of the original binary
multinomial naive Bayes model. We use two independent datasets of tweet
messages crawled from the web. We evaluate and present our results using the
following metrics: accuracy, recall, specificity, and the F-measure with its
variations (F2 score and F0.5 score).
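As an illustration of the metrics listed above, the following is a minimal sketch that computes them from the confusion-matrix counts of a binary classifier, with "informative" taken as the positive class. The function names and the example counts are assumptions made for illustration only; they are not the evaluation code or the results of the proposal.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from confusion counts; 'informative' is the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": recall,
        "specificity": specificity,
        "f1": f_beta(precision, recall, 1.0),
        "f2": f_beta(precision, recall, 2.0),
        "f0.5": f_beta(precision, recall, 0.5),
    }


# Hypothetical counts, chosen only to exercise the functions.
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The F2 score weights recall more heavily than precision, which suits a setting where missing an informative tweet is the costlier error, while the F0.5 score weights precision more heavily.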
Concerning topic detection, existing solutions overlook time and location,
which are important and useful factors. Moreover, social media are frequently
updated, so the proposed detection model should handle dynamic updates. We
introduce a topic model for topic detection that combines time and location.
Our model is equipped with incremental estimation of the topic model
parameters and with an adaptive window length that is set according to the
correlation of consecutive windows and their density. We have conducted
extensive experiments to verify the effectiveness and efficiency of our
proposed Incremental Adaptive Time Location (IncrAdapTL) model.
Date: Thursday, 4 October 2018
Time: 9:00am - 11:00am
Venue: Room 5562
(lifts 27/28)
Committee Members: Prof. Lei Chen (Supervisor)
Dr. Xiaojuan Ma (Chairperson)
Dr. Qiong Luo
Dr. Wei Wang
**** ALL are Welcome ****