More about HKUST
Practical Symbolic Static Analysis For Effective Bug Detection
PhD Thesis Proposal Defence
Title: "Practical Symbolic Static Analysis For Effective Bug Detection"
by
Mr. Yiyuan GUO
Abstract:
Static analysis infers the behaviors of programs without executing them and
is widely used to detect bugs automatically. Many static analysis techniques
employ symbolic methods where symbolic variables are introduced for unknown
values and the program state is expressed in symbolic formulas. However, we
encounter great challenges in making the symbolic method effective for bug
detection in realistic, million-line codebases. First, when targeting
real-world software, the inherent limitation of static analysis often leads
to the introduction of many under-constrained symbolic variables, causing
significant imprecision. Second, the scalability of the symbolic method is
greatly affected by its memory model, especially when the checked property
involves complex interactions between memory and numeric values. This thesis
presents contributions to improve the precision and scalability of the
symbolic method for enabling practical static bug detection at an industrial
scale.
The first work focuses on addressing the imprecision of divide-by-zero
detection caused by under-constrained symbolic variables. Our key idea is to
gather extra evidence about the under-constrained variables by inferring the
programmers' beliefs, which helps the static analyzer to report bugs that are
more likely to be true. Our evidence-based inference approach greatly
outperformed existing solutions in precision: 72 divide-by-zero bugs were
uncovered in large codebases such as the Linux kernel with a low false
positive rate of 22%.
The second work addresses the scalability problem of the symbolic method for
handling intricate semantics of memory operations. Specifically, we study the
problem of static buffer overflow detection, where mutually dependent program
properties of heap and numeric value need to be precisely tracked at the same
time. We observe that the key is to handle symbolic heap locations, i.e.,
heap locations with possibly symbolic numerical offsets. By introducing a
disjointness assumption into the memory model, we enable strong updates
efficiently and support a summary-based modular analysis algorithm. While
existing works either cannot finish or produce an excessive number of false
positives, our approach can handle millions of lines of code in less than
four hours with a false positive rate of 37%.
Date: Monday, 25 November 2024
Time: 3:30pm - 5:30pm
Venue: Room 5501
Lifts 25/26
Committee Members: Prof. Charlesz Zhang (Supervisor)
Prof. Qiong Luo (Chairperson)
Dr. Dongdong She
Dr. Shuai Wang