Scalable Symbolic Analysis of Large Real World Programs Using API Abstraction and Execution Histories

PhD Thesis Proposal Defence


Title: "Scalable Symbolic Analysis of Large Real World Programs Using API 
Abstraction and Execution Histories"

by

Mr. Yueqi LI


Abstract:

This thesis deals with two important problems of symbolic analysis. The 
first one is path explosion problem. Path explosion is a major issue in 
applying path-sensitive symbolic analysis to large programs. Since real 
world programs are typically built on top of many library functions, it is 
generally impractical to model all APIs and symbolic analysis often 
approximates some APIs' behavior. We observe that many symbolic states 
generated by such symbolic analysis of a procedure are indistinguishable 
to its callers. It is, therefore, possible to keep only one state from 
each set of equivalent symbolic states without affecting the analysis 
result. Based on this observation, we propose an equivalence relation 
called z-equivalence, which is weaker than logical equivalence to relate a 
large number of z-equivalent states. We prove that z-equivalence is strong 
enough to guarantee that paths to be traversed by the symbolic analysis of 
two z-equivalent states are identical, giving the same solutions to 
satisfiability and validity queries. We propose a sound linear algorithm 
to detect z-equivalence. Our experiments show that the symbolic analysis 
that leverages z-equivalence is able to achieve more than ten orders of 
magnitude reduction in terms of search space. The reduction significantly 
alleviates the path explosion problem, enabling us to apply symbolic 
analysis in large programs such as Hadoop and Linux Kernel.

The second problem is precision issue. API approximation can induce many 
unreachable symbolic states, which are expensive to validate manually. In 
the second part of this thesis, we propose a static approach to 
automatically validating the reported anomalous symbolic states. The 
validation makes use of the available runtime data of the un-modeled APIs 
collected from previous program executions. We show that the symbolic 
state validation problem can be cast as a MAX-SAT problem and solved by 
existing constraint solvers. Our technique found 80 unreported bugs when 
it was applied to 10 popular programs with a total of 1.5 million lines of 
code. All of them can be confirmed by test cases. Our technique presents a 
promising way to apply the big data paradigm to software engineering. It 
provides a mechanism to validate the symbolic states of a project by 
leveraging the many concrete input-output values of APIs collected from 
other projects.


Date:			Wednesday, 29 October 2014

Time:                   10:30am - 12:30pm

Venue:                  Room 4472
                        lifts 25/26

Committee Members:	Prof Shing-Chi Cheung (Supervisor)
 			Dr. Charles Zhang (Chairperson)
 			Dr. Sunghun Kim
 			Prof. Fangzhen Lin

**** ALL are Welcome ****