Improving Information Retrieval in Social Streams
PhD Thesis Proposal Defence

Title: "Improving Information Retrieval in Social Streams"

by

Mr. Jan VOSECKY

Abstract:

In recent years, microblogging services, such as Twitter, have emerged as a
popular platform for real-time information exchange among millions of
users. However, the vast amount of content results in an information
overload for users when searching in microblogs. Given the user's search
query, delivering relevant content is a challenging problem. In this
proposal, we therefore present three complementary approaches to tackle
the challenges of information retrieval in microblogs.

First, we propose a method to determine the quality of microblog documents
(called "tweets"). To model the quality of tweets, we devise a new set of
link-based features, in addition to content-based features. We examine the
implicit links between tweets, URLs, hashtags and users, and then propose
novel metrics to reflect quality-based reputation of websites, hashtags
and users. Our evaluation shows that the proposed features outperform a
bag-of-words representation, while requiring less computational time and
storage.

Second, we present two frameworks to model topics in microblog streams.
Topic modeling is an important facility to manage the topical diversity in
microblogs and benefits many applications, such as clustering and ranking.
In our Multi-faceted Topic Modeling framework, we tackle both the short
length of tweets and the rich semantics discussed by microblog users. We
first perform two semantic enrichment steps to inject additional semantics
into the short tweets. The Multi-faceted Topic Model is then proposed to
model latent topics from the social terms in Twitter, auxiliary terms from
external URLs and named entities. In our Geographic Twitter Topic Modeling
framework, we focus on spatial aspects of microblog topics. While early
work has mainly utilized geo-tagged tweets, we propose a content-based
method for extracting location references.
The Geographic Twitter Topic Model is then developed to discover latent
topics with two geographic dimensions, comprising locations at which a
topic is discussed and locations mentioned within the topic.

Third, we present a framework for Collaborative Personalized Twitter
Search. Traditional techniques for personalized Web search are
insufficient in the microblog domain, because of the diversity of topics,
the sparseness of user data and the highly social nature of microblogs. At
the core of our framework, we develop a collaborative user model, which
exploits the user's social connections in order to obtain a comprehensive
account of her preferences. We then propose a novel user model structure
to manage the topical diversity in Twitter and to enable semantic-aware
query disambiguation. A detailed evaluation has demonstrated superior
ranking performance of our framework compared with state-of-the-art
baselines.

Date:                   Tuesday, 7 October 2014

Time:                   10:30am - 12:30pm

Venue:                  Room 3501
                        lifts 25/26

Committee Members:      Dr. Wilfred Ng (Supervisor)
                        Prof. Nevin Zhang (Chairperson)
                        Prof. Dik-Lun Lee
                        Dr. Raymond Wong

**** ALL are Welcome ****