Precise and Scalable Static Bug Finding for Industrial-Sized Code

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Precise and Scalable Static Bug Finding for Industrial-Sized Code"

By

Mr. Qingkai SHI


Abstract:

Software bugs cost developers and software companies a great deal of time 
and money. Although previous work has reported many success stories for 
using static bug-finding tools, it is still difficult to find bugs hidden 
behind sophisticated pointer operations, deep calling contexts, and 
complex path conditions with a low false-positive rate, while sieving 
through millions of lines of code in just a few hours. In this thesis, we 
present our novel designs of sparse value-flow analysis to tackle a wide 
range of software bugs caused by improper value flows. The proposed 
approach has been commercialized and deployed in many of the global 500 
companies. It also has reported hundreds of real bugs for open-source 
software systems.

The first problem addressed in the thesis is referred to as the pointer 
trap: a precise points-to analysis limits the scalability of sparse 
value-flow analysis and an imprecise one seriously undermines its 
precision. To solve the problem, we present Pinpoint, a holistic approach 
that decomposes the cost of high-precision points-to analysis by precisely 
discovering local data dependence and delaying the expensive 
inter-procedural analysis through memorization. Such memorization enables 
the on-demand slicing and, thus, improves the scalability with high 
precision. Experiments show that Pinpoint can check millions of lines of 
code in less than five hours with a false positive rate lower than 30%.

The second problem addressed in the thesis is known as the extensional 
scalability problem, which happens when we simultaneously check many 
value-flow properties with high precision. A major factor to this 
deficiency is that the core static analysis engine is oblivious of the 
mutual synergy among the properties being checked, thus inevitably losing 
many optimization opportunities. Our work is to leverage the 
inter-property awareness and to capture redundancies and inconsistencies 
when many properties are considered together. The evaluation results 
demonstrate that the approach, Catapult, is more than 8x faster than 
Pinpoint but consumes only 1/7 of the memory when checking twenty 
value-flow properties together.

The third problem addressed in the thesis is how to improve the 
parallelism of static bug finding over the conventional parallel designs. 
Conventionally, bottom-up program analysis has been easy to parallelize 
because functions without caller-callee relations can be analyzed 
independently. However, functions with caller-callee relations have to be 
analyzed sequentially because the analysis of a function depends on the 
analysis results of its callees. We present Cheetah, a framework of 
bottom-up analysis, in which the analysis task of a function is 
partitioned into multiple sub-tasks. These sub-tasks are pipelined and run 
in parallel, even though the caller-callee relations exist. The evaluation 
results of Cheetah demonstrate significant speedup over a conventional 
parallel design.


Date:			Thursday, 23 April 2020

Time:			10:00am - 12:00noon

Zoom Meeting:           https://hkust.zoom.us/j/894501379

Chairman:		Prof. Qinglu ZENG (OCES)

Committee Members:	Prof. Charles ZHANG (Supervisor)
  			Prof. Shing-Chi CHEUNG
  			Prof. Shuai WANG
  			Prof. Weichuan YU (ECE)
  			Prof. Xiangyu ZHANG (Purdue University)


**** ALL are Welcome ****