Clustering Effectiveness and Efficiency of Community-based Web Search
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Title: "Clustering Effectiveness and Efficiency of Community-based Web Search"
by Mr. XU Teng

Abstract

Many collaborative web search methods are based on user profiling and on clustering user communities. Among the numerous graph models, the Community Clickthrough Model (CCM) captures users' conceptual preferences and therefore outperforms the other, content-ignorant models. The corresponding clustering algorithm on CCM, the Community-based Agglomerative Divisive Clustering (CADC) algorithm, allows the clickthrough data to be clustered incrementally. In this thesis, we identify the limitations of CCM and CADC and develop enhancements to alleviate them. By adding a page-level attribute, the URL type, to form a quad-partite graph, we propose the Enhanced Community Clickthrough Model (E-CCM), which captures users' interests more precisely than CCM. We also develop a refinement of CADC, called CADCInc, which supports incremental updates and hence maintains high efficiency even when the volume of data is very large. Experiments show that the proposed E-CCM model achieves a significant effectiveness gain over the original model, and that the CADCInc algorithm keeps processing time bounded for large amounts of clickthrough data with only a very small tradeoff in clustering quality.

Date: 13 May 2011 (Friday)
Time: 1:30pm to 2:10pm
Venue: Room 3402 (Lifts 17-18)
Advisor: Prof. Dik Lee
2nd reader: Dr. Wilfred Ng