More about HKUST
Precise and Scalable Static Bug Finding for Industrial-Sized Code
PhD Thesis Proposal Defence Title: "Precise and Scalable Static Bug Finding for Industrial-Sized Code" by Mr. Qingkai SHI Abstract: Software bugs cost developers and software companies a great deal of time and money. For instance, the Heartbleed bug discovered in 2014 affected around 500,000 websites and hundreds of products from Cisco and Juniper. To tame such beasts hidden in software, developers from industry usually utilizes static bug-finding tools to check possible bugs before a product is released. Although previous work has reported many success stories for using static bug-finding tools, we still cannot have the cake and eat it. In practice, it is still difficult to find bugs hidden behind sophisticated pointer operations, deep calling contexts, and complex path conditions with low false positive rates, while sieving through millions of lines of code in just a few hours. In this thesis, we present our novel designs of sparse value-flow analysis to tackle a wide range of software bugs caused by improper value flows. The first problem we try to address is referred to as the "pointer trap": a precise points-to analysis limits the scalability of sparse value-flow analysis and an imprecise one seriously undermines its precision. To this end, we present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memorization. Such memorization enables the on-demand slicing of only the necessary inter-procedural data dependence and path feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3% - 23.6%). Pinpoint has discovered over a hundred real bugs in mature and extensively checked open source systems. The second problem addressed in the thesis is how to improve the parallelism of static bug finding over the conventional parallel designs. Conventionally, bottom-up program analysis has been easy to parallelize because functions without caller-callee relations can be analyzed independently. However, such function-level parallelism is significantly limited by the calling dependence -- functions with caller-callee relations have to be analyzed sequentially because the analysis of a function depends on the analysis results, a.k.a., function summaries, of its callees. We observe that the calling dependence can be relaxed in many cases and, as a result, the parallelism can be improved. We present Cheetah, a framework of bottom-up data flow analysis, in which the analysis task of each function is elaborately partitioned into multiple sub-tasks to generate pipelineable function summaries. These sub-tasks are pipelined and run in parallel without any additional synchronization, even though the calling dependence exists. We formalize our idea under the IFDS/IDE framework and discuss its application to sparse value-flow analysis. We evaluate Cheetah on a series of standard benchmark programs and open-source projects, which demonstrates significant speedup over a conventional parallel design. Date: Friday, 18 October 2019 Time: 2:00pm - 4:00pm Venue: Room 5501 lifts 25/26 Committee Members: Dr. Charles Zhang (Supervisor) Prof. Shing-Chi Cheung (Chairperson) Prof. Cunsheng Ding Dr. Shuai Wang **** ALL are Welcome ****