More about HKUST
Scalable Typestate Verification for Millions of Lines of Code
PhD Thesis Proposal Defence Title: "Scalable Typestate Verification for Millions of Lines of Code" by Mr. Gang FAN Abstract: Applying static code analysis in the industry involves both technical and management problems. In this thesis, we address new challenges based on our experience of applying static code analysis in several large software companies and propose solutions for some of them. When the code base grows large, both the precision and scalability of static analysis cannot be compromised. We first present SMOKE, which is designed for optimizing the performance for analyzing typestate problems (e.g. memory leak detection), that uses the staged design together with sparse analysis techniques. Instead of using a uniform precise analysis for all program paths, in the first stage, we use a scalable but imprecise analysis to compute a succinct set of candidate error paths. In the second stage, we leverage a more precise analysis to verify the feasibility of the candidates. Our first stage analysis is scalable, due to the design of a new sparse program representation, namely the use-flow graph (UFG), which enables us to model the problem as a polynomial-time state analysis. Our second stage analysis is precise and still efficient, due to the smaller number of candidates and the design of a dedicated constraint solver. Experimental results demonstrated that SMOKE can finish checking industrial-sized projects, up to 8MLoC, in forty minutes with an average false positive rate of 24.4Despite the technical problems, for continuous using and improving static analysis in practice, we also need to solve non-technical problems such as the dilemma between the need for code to improve static analysis tools and the confidentiality requirements of tool users. We abstract an optimization cycle from our experiences, which is more acceptable for both parties. This cycle addresses the trust problem by introducing an intermediate document containing only code segments that are sufficient for improving static analysis but cannot reveal sensitive information such as the whole program source code. Date: Friday, 3 May 2019 Time: 10:30am - 12:30pm Venue: Room 4475 lifts 25/26 Committee Members: Dr. Charles Zhang (Supervisor) Dr. Tao Wang (Chairperson) Prof. Shing-Chi Cheung Dr. Qiong Luo **** ALL are Welcome ****