More about HKUST
Practical Symbolic Static Analysis For Effective Bug Detection
PhD Thesis Proposal Defence Title: "Practical Symbolic Static Analysis For Effective Bug Detection" by Mr. Yiyuan GUO Abstract: Static analysis infers the behaviors of programs without executing them and is widely used to detect bugs automatically. Many static analysis techniques employ symbolic methods where symbolic variables are introduced for unknown values and the program state is expressed in symbolic formulas. However, we encounter great challenges in making the symbolic method effective for bug detection in realistic, million-line codebases. First, when targeting real-world software, the inherent limitation of static analysis often leads to the introduction of many under-constrained symbolic variables, causing significant imprecision. Second, the scalability of the symbolic method is greatly affected by its memory model, especially when the checked property involves complex interactions between memory and numeric values. This thesis presents contributions to improve the precision and scalability of the symbolic method for enabling practical static bug detection at an industrial scale. The first work focuses on addressing the imprecision of divide-by-zero detection caused by under-constrained symbolic variables. Our key idea is to gather extra evidence about the under-constrained variables by inferring the programmers' beliefs, which helps the static analyzer to report bugs that are more likely to be true. Our evidence-based inference approach greatly outperformed existing solutions in precision: 72 divide-by-zero bugs were uncovered in large codebases such as the Linux kernel with a low false positive rate of 22%. The second work addresses the scalability problem of the symbolic method for handling intricate semantics of memory operations. Specifically, we study the problem of static buffer overflow detection, where mutually dependent program properties of heap and numeric value need to be precisely tracked at the same time. We observe that the key is to handle symbolic heap locations, i.e., heap locations with possibly symbolic numerical offsets. By introducing a disjointness assumption into the memory model, we enable strong updates efficiently and support a summary-based modular analysis algorithm. While existing works either cannot finish or produce an excessive number of false positives, our approach can handle millions of lines of code in less than four hours with a false positive rate of 37%. Date: Monday, 25 November 2024 Time: 3:30pm - 5:30pm Venue: Room 5501 Lifts 25/26 Committee Members: Prof. Charlesz Zhang (Supervisor) Prof. Qiong Luo (Chairperson) Dr. Dongdong She Dr. Shuai Wang