More about HKUST
Automatic Localization of Code Omission Faults
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Automatic Localization of Code Omission Faults"
By
Mr. Xinming Wang
Abstract
Debugging is a tedious and expensive activity in software maintenance. To
ease debugging, researchers have proposed various dynamic analysis
techniques that aid developers in locating the program defects based on
passed and failed test runs. In this thesis, we study a prevalent type of
defects called code omission faults, which involve missing code and cannot
be ascribed to any program entity that exists in the program. Code
omission faults pose two major challenges to existing dynamic analysis
techniques. First, missing code offers no execution behavior to observe
directly. Second, even when the place of omission could have been
accurately pinpointed, it is still difficult for developers to conjecture
the missing code.
To address these challenges of code omission, we conducted an empirical
study to characterize real-world omission faults. Among non-trivial bug
fixes extracted from large open source projects GNU/GCC and MYSQL, we
found that over half of them correspond to omission faults, which can be
further categorized into 11 sub-types. Among them, we made a novel
discovery that some sub-types, such as missing-assignment or
missing-return, often induce certain dynamic control flows and data flows
patterns when they trigger program failures. We express these patterns in
an event flow language and show that by matching them against program
execution, it is possible to accurately locate these sub-types of omission
faults and infer part of the missing code.
The remaining sub-types of omission faults, such as missing-if-condition
and missing-branch, are more complex and induce patterns that vary with
the missing code, which is mostly unknown at the time of debugging. For
these complex omission faults, we empirically observed that the missing
code likely appear elsewhere in a similar or even the same form. Inspired
by this observation, we propose an approach called learning-to-debug. The
basic idea is to systematically remove different parts of the program and
learn the control-/data- flow patterns that are associated with these
pieces of artificially removed code. The inferred patterns are then
applied on the original program to locate the omission faults. We have
designed a novel pattern representation and a learning algorithm in order
to make learning-to-debug scalable.
To evaluate our approaches, we used both real world and seeded omission
faults in four benchmark programs: space, grep, bc, and gcc. Promising
results are reported in the experiments.
Date: Tuesday, 24 August 2010
Time: 9:00am – 11:00am
Venue: Room 3501
Lifts 25/26
Chairman: Prof. Kai Tang (MECH)
Committee Members: Prof. Shing-Chi Cheung (Supervisor)
Prof. Lin Gu
Prof. Charles Zhang
Prof. Ajay Joneja (IELM)
Prof. Michael Lyu (Comp. Sci. & Engg., CUHK)
**** ALL are Welcome ****