Clustering Effectiveness and Efficiency of Community-based Web Search

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Title: "Clustering Effectiveness and Efficiency of Community-based Web 
       Search"

by

Mr. XU Teng

Abstract

Many collaborative web search methods are based on user profiling and
the clustering of user communities. Among the numerous graph models, the
Community Clickthrough Model (CCM) captures users' conceptual preferences
and thus outperforms content-ignorant models. The corresponding
clustering algorithm on CCM, the Community-based Agglomerative Divisive
Clustering (CADC) algorithm, allows incremental clustering of the
clickthrough data. In this thesis, we identify limitations of CCM and
CADC and develop enhancements to alleviate them. By adding a page
miniature, the URL type, to form a quad-partite graph, we propose the
Enhanced Community Clickthrough Model (E-CCM), which captures users'
interests more precisely than CCM. We also develop a refinement of CADC,
called CADCInc, which supports incremental updates and hence maintains
high efficiency even when the volume of data is very large. Experiments
show that E-CCM achieves a significant gain in effectiveness over the
original model, and that CADCInc maintains a bounded processing time for
large amounts of clickthrough data with only a very small tradeoff in
clustering quality.

Date            :       13 May 2011 (Friday)

Time            :       1:30pm to 2:10pm

Venue           :       3402 (17-18 lift)

Advisor         :       Prof. Dik Lee

2nd reader      :       Dr. Wilfred Ng