Improving Information Retrieval in Social Streams

PhD Thesis Proposal Defence


Title: "Improving Information Retrieval in Social Streams"

by

Mr. Jan VOSECKY


Abstract:

In recent years, microblogging services such as Twitter have emerged as a 
popular platform for real-time information exchange among millions of 
users. However, the vast amount of content results in an information 
overload for users when searching in microblogs. Given the user's search 
query, delivering relevant content is a challenging problem. In this 
proposal, we therefore present three complementary approaches to tackle 
the challenges of information retrieval in microblogs.

First, we propose a method to determine the quality of microblog documents 
(called "tweets"). To model the quality of tweets, we devise a new set of 
link-based features, in addition to content-based features. We examine the 
implicit links between tweets, URLs, hashtags and users, and then propose 
novel metrics to reflect quality-based reputation of websites, hashtags 
and users. Our evaluation shows that the proposed features outperform a 
bag-of-words representation, while requiring less computational time and 
storage.

Second, we present two frameworks to model topics in microblog streams. 
Topic modeling is an important tool for managing the topical diversity in 
microblogs and benefits many applications, such as clustering and ranking. 
In our Multi-faceted Topic Modeling framework, we tackle both the short 
length of tweets and the rich semantics discussed by microblog users. We 
first perform two semantic enrichment steps to inject additional semantics 
into the short tweets. The Multi-faceted Topic Model is then proposed to 
model latent topics from the social terms in Twitter, auxiliary terms from 
external URLs and named entities. In our Geographic Twitter Topic Modeling 
framework, we focus on spatial aspects of microblog topics. While earlier 
work has mainly relied on geo-tagged tweets, we propose a content-based 
method for extracting location references. The Geographic Twitter Topic 
Model is then developed to discover latent topics with two geographic 
dimensions, comprising locations at which a topic is discussed and 
locations mentioned within the topic.

Third, we present a framework for Collaborative Personalized Twitter 
Search. Traditional techniques for personalized Web search are insufficient 
in the microblog domain because of the diversity of topics, the sparseness 
of user data and the highly social nature of the platform. At the core of 
our framework, we 
develop a collaborative user model, which exploits the user's social 
connections in order to obtain a comprehensive account of her preferences. 
We then propose a novel user model structure to manage the topical 
diversity in Twitter and to enable semantic-aware query disambiguation. A 
detailed evaluation demonstrates that our framework achieves superior 
ranking performance compared with state-of-the-art baselines.


Date:                   Tuesday, 7 October 2014

Time:                   10:30am - 12:30pm

Venue:                  Room 3501
                        lifts 25/26

Committee Members:      Dr. Wilfred Ng (Supervisor)
                        Prof. Nevin Zhang (Chairperson)
                        Prof. Dik-Lun Lee
                        Dr. Raymond Wong

**** ALL are Welcome ****