Improving Information Retrieval in Social Streams
PhD Thesis Proposal Defence
Title: "Improving Information Retrieval in Social Streams"
by
Mr. Jan VOSECKY
Abstract:
In recent years, microblogging services, such as Twitter, have emerged as a
popular platform for real-time information exchange among millions of
users. However, the vast amount of content results in an information
overload for users when searching in microblogs. Given the user's search
query, delivering relevant content is a challenging problem. In this
proposal, we therefore present three complementary approaches to tackle
the challenges of information retrieval in microblogs.
First, we propose a method to determine the quality of microblog documents
(called "tweets"). To model the quality of tweets, we devise a new set of
link-based features, in addition to content-based features. We examine the
implicit links between tweets, URLs, hashtags and users, and then propose
novel metrics to reflect the quality-based reputation of websites, hashtags
and users. Our evaluation shows that the proposed features outperform a
bag-of-words representation, while requiring less computational time and
storage.
Second, we present two frameworks to model topics in microblog streams.
Topic modeling is an important facility to manage the topical diversity in
microblogs and benefits many applications, such as clustering and ranking.
In our Multi-faceted Topic Modeling framework, we tackle both the short
length of tweets and the rich semantics discussed by microblog users. We
first perform two semantic enrichment steps to inject additional semantics
into the short tweets. The Multi-faceted Topic Model is then proposed to
model latent topics from the social terms in Twitter, auxiliary terms from
external URLs and named entities. In our Geographic Twitter Topic Modeling
framework, we focus on spatial aspects of microblog topics. While early
work has mainly utilized geo-tagged tweets, we propose a content-based
method for extracting location references. The Geographic Twitter Topic
Model is then developed to discover latent topics with two geographic
dimensions, comprising locations at which a topic is discussed and
locations mentioned within the topic.
Third, we present a framework for Collaborative Personalized Twitter
Search. Traditional techniques for personalized Web search are insufficient
in the microblog domain because of the diversity of topics, the sparseness
of user data and the highly social nature of the medium. At the core of our framework, we
develop a collaborative user model, which exploits the user's social
connections in order to obtain a comprehensive account of her preferences.
We then propose a novel user model structure to manage the topical
diversity in Twitter and to enable semantic-aware query disambiguation. A
detailed evaluation demonstrates that our framework achieves superior
ranking performance compared with state-of-the-art baselines.
Date: Tuesday, 7 October 2014
Time: 10:30am - 12:30pm
Venue: Room 3501 (lifts 25/26)
Committee Members: Dr. Wilfred Ng (Supervisor)
Prof. Nevin Zhang (Chairperson)
Prof. Dik-Lun Lee
Dr. Raymond Wong
**** ALL are Welcome ****