SOFTWARE DEFECT PREDICTION ON UNLABELED DATASETS
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "SOFTWARE DEFECT PREDICTION ON UNLABELED DATASETS"

By

Mr. Jaechang NAM

Abstract

Defect prediction on new projects, or on projects lacking historical data, is one of the interesting problems in defect prediction studies. This is largely because it is difficult to collect the bug information needed to label a dataset for training a prediction model. We call this problem defect prediction on unlabeled datasets. Cross-project defect prediction (CPDP) has tried to solve this problem by reusing prediction models built from other projects that have enough historical data. However, CPDP may not always build a strong prediction model because datasets follow different distributions. Approaches that use only unlabeled datasets have also tried to address the problem by adopting unsupervised learning techniques, but they have one major limitation: the need for manual effort.

To address these limitations, we propose three techniques that can build prediction models on unlabeled datasets. First, we propose TCA+, which improves the prediction performance of CPDP by adopting a state-of-the-art transfer learning technique, transfer component analysis (TCA). TCA+ extends TCA by suggesting the most appropriate normalization technique to use before applying TCA for CPDP. Second, we propose heterogeneous defect prediction (HDP), which enables cross-project defect prediction on projects with heterogeneous metric sets. HDP produces a common metric set for the datasets used in CPDP by matching metrics that have similar distributions. Lastly, we propose CLAMI, which enables defect prediction by using only unlabeled datasets to build prediction models. The key idea of the CLAMI approach is to generate a training dataset by using the magnitude of metric values from an unlabeled dataset.
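To make the magnitude idea concrete, here is a minimal Python sketch. It is an illustration only, not the implementation from the thesis. It assumes that a higher metric value suggests higher defect-proneness and that a per-metric median is a reasonable cutoff; the announcement states neither. The function name `label_by_magnitude` and the sample values are hypothetical.

```python
from statistics import median


def label_by_magnitude(rows):
    """Label unlabeled instances using only the magnitude of their metric values.

    rows: list of equal-length lists of numeric metric values, one per instance.
    Returns a list of booleans; True means "treated as defect-prone".
    """
    if not rows:
        return []
    n_metrics = len(rows[0])
    # Assumption: the cutoff for each metric is its median over all instances.
    cutoffs = [median(row[j] for row in rows) for j in range(n_metrics)]
    # Score each instance by the number of metrics whose value exceeds the cutoff.
    scores = [
        sum(1 for j in range(n_metrics) if row[j] > cutoffs[j])
        for row in rows
    ]
    # Assumption: instances scoring above the median score get the defect-prone label.
    score_cutoff = median(scores)
    return [score > score_cutoff for score in scores]


# Hypothetical metric values: columns could be lines of code, complexity, and churn.
sample = [
    [10, 1, 2],
    [200, 15, 40],
    [50, 3, 5],
    [300, 20, 60],
]
labels = label_by_magnitude(sample)
```

Labels produced this way could then serve as a training set for an ordinary supervised classifier, which is the sense in which a training dataset is generated from unlabeled data.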
Our proposed techniques, TCA+, HDP, and CLAMI, address limitations of defect prediction on unlabeled datasets. However, the three techniques still have challenging issues to be resolved, which we discuss as future work.

Date: Thursday, 23 July 2015

Time: 1:30pm - 3:30pm

Venue: Room 4483, Lifts 25/26

Chairman: Prof. Hoi Sing Kwok (ECE)

Committee Members: Prof. Sunghun Kim (Supervisor)
Prof. Shing Chi Cheung
Prof. Raymond Wong
Prof. Jing Wang (ISOM)
Prof. Martin Pinzger (Univ. of Klagenfurt, Austria)

**** ALL are Welcome ****