Automatic Localization of Code Omission Faults: a Context Pattern based Approach

PhD Thesis Proposal Defence


Title: "Automatic Localization of Code Omission Faults: a Context Pattern
based Approach"

by

Mr. Xinming Wang


Abstract:

Debugging is a tedious and expensive activity in software maintenance. To ease 
debugging, researchers have proposed various dynamic analysis techniques that 
aid developers in locating program defects based on passed and failed test 
runs. In this proposal, we study a prevalent type of defect called code 
omission faults, which involve missing code and cannot be ascribed to any 
program entity that exists in the program. Code omission faults pose two major 
challenges to existing dynamic analysis techniques. First, missing code offers 
no execution behavior to observe directly. Second, even when the place of 
omission can be accurately pinpointed, it is still difficult for 
developers to conjecture the missing code.

To address these challenges of code omission, we conducted an empirical study 
to characterize real-world omission faults. Among 4966 non-trivial bug fixes 
extracted from GNU/GCC, a large open-source project, we found that 57% 
correspond to omission faults, which can be further categorized into 11 
sub-types. We discovered that some of these sub-types, such as 
missing-assignment or missing-return, often induce certain dynamic control-flow 
and data-flow patterns when they trigger program failures. We express 
these patterns in an event flow language and show that by matching them against 
program execution, it is possible to accurately locate these sub-types of 
omission faults and infer part of their missing code. The remaining sub-types 
of omission faults, such as missing-condition and missing-branch, are more 
complex and induce patterns that vary with the missing code, which is mostly 
unknown at the time of debugging. For these complex omission faults, we 
empirically observed that the missing code is likely to appear elsewhere in a 
similar or even the same form. Inspired by this observation, we propose an 
approach that systematically removes different parts of the program and infers 
control-flow and data-flow patterns that are associated with these pieces of 
artificially removed code. We show that these patterns are useful for 
localizing complex omission faults. We evaluate our approach using real-world 
omission faults and mutation analysis on five real-world programs: gcc, 
space, grep, and bc.


Date:  			Wednesday, 12 May 2010

Time:           	10:00am - 12:00 noon

Venue:          	Room 4480
 			lifts 25/26

Committee Members:      Dr. Shing-Chi Cheung (Supervisor)
 			Dr. Sunghun Kim (Chairperson)
 			Dr. Lin Gu
 			Dr. Charles Zhang


**** ALL are Welcome ****