More about HKUST
Scalable Typestate Verification for Millions of Lines of Code
PhD Thesis Proposal Defence
Title: "Scalable Typestate Verification for Millions of Lines of Code"
by
Mr. Gang FAN
Abstract:
Applying static code analysis in the industry involves both technical and
management problems. In this thesis, we address new challenges based on
our experience of applying static code analysis in several large software
companies and propose solutions for some of them. When the code base grows
large, both the precision and scalability of static analysis cannot be
compromised. We first present SMOKE, which is designed for optimizing the
performance for analyzing typestate problems (e.g. memory leak detection),
that uses the staged design together with sparse analysis techniques.
Instead of using a uniform precise analysis for all program paths, in the
first stage, we use a scalable but imprecise analysis to compute a
succinct set of candidate error paths. In the second stage, we leverage a
more precise analysis to verify the feasibility of the candidates. Our
first stage analysis is scalable, due to the design of a new sparse
program representation, namely the use-flow graph (UFG), which enables us
to model the problem as a polynomial-time state analysis. Our second stage
analysis is precise and still efficient, due to the smaller number of
candidates and the design of a dedicated constraint solver. Experimental
results demonstrated that SMOKE can finish checking industrial-sized
projects, up to 8MLoC, in forty minutes with an average false positive
rate of 24.4Despite the technical problems, for continuous using and
improving static analysis in practice, we also need to solve non-technical
problems such as the dilemma between the need for code to improve static
analysis tools and the confidentiality requirements of tool users. We
abstract an optimization cycle from our experiences, which is more
acceptable for both parties. This cycle addresses the trust problem by
introducing an intermediate document containing only code segments that
are sufficient for improving static analysis but cannot reveal sensitive
information such as the whole program source code.
Date: Friday, 3 May 2019
Time: 10:30am - 12:30pm
Venue: Room 4475
lifts 25/26
Committee Members: Dr. Charles Zhang (Supervisor)
Dr. Tao Wang (Chairperson)
Prof. Shing-Chi Cheung
Dr. Qiong Luo
**** ALL are Welcome ****