Automatic Spreadsheet Cell Clustering and Smell Detection using Strong and Weak Features

MPhil Thesis Defence


Title: "Automatic Spreadsheet Cell Clustering and Smell Detection using 
Strong and Weak Features"

By

Miss Wanjun CHEN


Abstract

Spreadsheets are error-prone. Although various techniques have been 
proposed to detect errors in terms of smells, they suffer from two issues. 
First, they cannot uniformly characterize and detect smells: each 
technique targets specific smell types and fails to leverage information 
derived by earlier work to improve detection accuracy. Second, smells are 
often detected as violations of pre-defined rules, so these techniques 
fail to adapt to diverse user practices. In this thesis, we propose to 
derive cell clusters automatically using a two-stage technique based on 
strong and weak features that capture different user practices. Smells can 
then be detected as outliers of these clusters in feature space. We 
implemented our technique and applied it to 70 spreadsheet files randomly 
sampled from the EUSES corpus. Experimental results show that our 
technique is effective at clustering cells and capable of detecting 
multiple types of smells with a precision of 0.73, recall of 0.61 and 
F-measure of 0.67, compared with 0.59, 0.51 and 0.55 respectively for 
existing work.
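
The abstract does not define its F-measure. Assuming it is the usual 
balanced F-measure (the harmonic mean of precision and recall), the 
reported figures can be sanity-checked with the minimal sketch below. The 
function name is illustrative only and is not taken from the thesis. 
Computed from the rounded precision and recall, the values come out at 
about 0.66 and 0.55, which agrees with the reported 0.67 and 0.55 once 
rounding of the inputs is allowed for.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall.

    Assumption: the abstract uses this standard definition.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (already rounded).
proposed = f_measure(0.73, 0.61)
existing = f_measure(0.59, 0.51)

print(f"proposed technique: {proposed:.3f}")
print(f"existing work:      {existing:.3f}")
```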


Date:			Thursday, 23 July 2015

Time:			10:00am - 12:00noon

Venue:			Room 2132C
 			Lift 19

Committee Members:	Prof. Shing-Chi Cheung (Supervisor)
 			Dr. Sunghun Kim (Chairperson)
 			Dr. Qiong Luo


**** ALL are Welcome ****