More about HKUST
Scalable Symbolic Analysis of Large Real World Programs Using API Abstraction and Execution Histories
PhD Thesis Proposal Defence
Title: "Scalable Symbolic Analysis of Large Real World Programs Using API
Abstraction and Execution Histories"
by
Mr. Yueqi LI
Abstract:
This thesis deals with two important problems of symbolic analysis. The
first one is path explosion problem. Path explosion is a major issue in
applying path-sensitive symbolic analysis to large programs. Since real
world programs are typically built on top of many library functions, it is
generally impractical to model all APIs and symbolic analysis often
approximates some APIs' behavior. We observe that many symbolic states
generated by such symbolic analysis of a procedure are indistinguishable
to its callers. It is, therefore, possible to keep only one state from
each set of equivalent symbolic states without affecting the analysis
result. Based on this observation, we propose an equivalence relation
called z-equivalence, which is weaker than logical equivalence to relate a
large number of z-equivalent states. We prove that z-equivalence is strong
enough to guarantee that paths to be traversed by the symbolic analysis of
two z-equivalent states are identical, giving the same solutions to
satisfiability and validity queries. We propose a sound linear algorithm
to detect z-equivalence. Our experiments show that the symbolic analysis
that leverages z-equivalence is able to achieve more than ten orders of
magnitude reduction in terms of search space. The reduction significantly
alleviates the path explosion problem, enabling us to apply symbolic
analysis in large programs such as Hadoop and Linux Kernel.
The second problem is precision issue. API approximation can induce many
unreachable symbolic states, which are expensive to validate manually. In
the second part of this thesis, we propose a static approach to
automatically validating the reported anomalous symbolic states. The
validation makes use of the available runtime data of the un-modeled APIs
collected from previous program executions. We show that the symbolic
state validation problem can be cast as a MAX-SAT problem and solved by
existing constraint solvers. Our technique found 80 unreported bugs when
it was applied to 10 popular programs with a total of 1.5 million lines of
code. All of them can be confirmed by test cases. Our technique presents a
promising way to apply the big data paradigm to software engineering. It
provides a mechanism to validate the symbolic states of a project by
leveraging the many concrete input-output values of APIs collected from
other projects.
Date: Wednesday, 29 October 2014
Time: 10:30am - 12:30pm
Venue: Room 4472
lifts 25/26
Committee Members: Prof Shing-Chi Cheung (Supervisor)
Dr. Charles Zhang (Chairperson)
Dr. Sunghun Kim
Prof. Fangzhen Lin
**** ALL are Welcome ****