Fault Characterization and Testing for Domain-Specific Compilers
PhD Thesis Proposal Defence
Title: "Fault Characterization and Testing for Domain-Specific Compilers"
by
Mr. Haoyang MA
Abstract:
Compilers are critical software components that translate high-level
programming languages into executable machine code. Their correctness is
paramount, as errors in compilers can lead to compilation crashes and even
silent miscompilations, potentially introducing bugs in all software built
using the faulty compiler. Despite decades of development, modern compilers
still contain numerous bugs, with severe consequences ranging from program
crashes to security vulnerabilities.
The prevalence of compiler bugs poses a significant challenge to software
reliability and security. Even mature, widely used compilers like GCC and LLVM
have been found to contain thousands of bugs, many of which can result in the
generation of incorrect code. These bugs are particularly insidious as they
can silently propagate errors into deployed applications, making them difficult
to detect and diagnose. This thesis contributes to understanding and detecting
compiler bugs through the following three studies.
The first study in this thesis focuses on understanding bugs in the Solidity
compiler, a critical component for developing smart contract applications on
Ethereum. This study conducts a systematic analysis of 533 Solidity compiler
bugs, examining their characteristics, including symptoms, root causes,
distribution, and bug-related code patterns. The comprehensive analysis leads
to seven key takeaways for revealing bugs in the Solidity compiler. To
evaluate the practical implications of these findings, the study constructs a
benchmark and assesses three existing Solidity compiler fuzzers. The results
demonstrate that these fuzzers are currently inefficient in detecting Solidity
compiler bugs, primarily due to their failure to consider bug-inducing
features, relevant compilation flags, and appropriate test oracles. This study
contributes to the field by providing a deep understanding of Solidity compiler
bugs and offering insights for developing more effective testing tools.
The second study builds on the findings of the first study, addressing the
limitations of existing compiler testing tools, particularly for Solidity.
This study introduces a novel approach called bounded exhaustive random
program generation, designed to focus the search space of program generation
and more effectively identify bug-triggering programs in compilers and
analyzers. The approach consists of two key stages: first, generating random
program templates with bug-related placeholders, and second, conducting a
bounded exhaustive enumeration of valid values for these placeholders. To
maintain efficiency, the study employs a solvable constraint set during
template generation and systematically explores possible placeholder values
within these constraints during exhaustive enumeration. This methodology was
implemented in a tool named ERWIN, specifically for testing Solidity compilers
and analyzers. ERWIN successfully identified 23 previously unknown bugs across
two Solidity compilers (solc and solang) and one Solidity static analyzer
(slither). The evaluation results demonstrate ERWIN’s superior performance
compared to state-of-the-art Solidity fuzzers in bug detection. Additionally,
ERWIN complements developer-written test suites by covering 4,582 edges and
14,737 lines of the solc compiler that were not covered by solc’s unit tests,
highlighting its effectiveness in improving compiler testing coverage.
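The two-stage idea described above (generate a template with placeholders,
then exhaustively enumerate valid placeholder values within a bound) can be
illustrated with a minimal sketch. This is not ERWIN's implementation: the
template, the placeholder names, the value domains, and the validity rule
below are all hypothetical and heavily simplified for illustration.

```python
from itertools import product

# Hypothetical program template with three placeholders (illustrative only).
TEMPLATE = "function f({type} {location} a) {visibility} {{}}"

# Bounded candidate values for each placeholder (assumed domains).
DOMAINS = {
    "type": ["uint256", "bytes", "string"],
    "location": ["", "memory", "calldata"],
    "visibility": ["internal", "external"],
}


def satisfies_constraints(assignment):
    """Simplified validity rule, assumed for this sketch only."""
    is_reference = assignment["type"] in ("bytes", "string")
    has_location = assignment["location"] != ""
    # Reference types need a data location; value types must not have one.
    return is_reference == has_location


def enumerate_programs(template, domains):
    """Yield every program whose placeholder values satisfy the constraints."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if satisfies_constraints(assignment):
            # Collapse the double space left by an empty placeholder.
            yield " ".join(template.format(**assignment).split())


programs = list(enumerate_programs(TEMPLATE, DOMAINS))
for program in programs:
    print(program)
```

Each generated program would then be passed to the compiler or analyzer under
test. The constraint check keeps the enumeration restricted to programs that
are expected to be valid, so the bounded search is not spent on inputs that
are rejected early.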
The third study explores the application of a generation-based testing
approach to deep learning (DL) compilers. This study focuses on addressing the
challenges in testing the optimization of high-level intermediate
representations (IRs), which has been identified as the most error-prone
compilation stage in DL compilers. The study introduces HIRGEN, an automated
testing technique designed to effectively expose coding mistakes in the
optimization of high-level IRs. HIRGEN incorporates three key components: 1)
coverage criteria for generating diverse and valid computational graphs, 2)
utilization of high-level IR language features to produce diverse IRs, and 3)
three test oracles inspired by metamorphic testing and differential testing.
HIRGEN’s effectiveness is demonstrated by its successful detection of 21 bugs
in TVM, a popular DL compiler, with 17 bugs confirmed and 12 fixed. The study
also compares HIRGEN’s performance against four baselines constructed using
state-of-the-art DL compiler fuzzers. The results show that HIRGEN can detect
10 crashes and inconsistencies that the baselines failed to identify within a
48-hour testing period. Additionally, the study evaluates the usefulness of
the proposed coverage criteria and test oracles, providing insights into their
effectiveness in improving DL compiler testing.
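To make the notion of differential and metamorphic test oracles concrete, the
following minimal sketch checks a toy computation in both ways. It does not
reproduce HIRGEN's oracles or use TVM: the functions run_unoptimized and
run_optimized are hypothetical stand-ins for executing the same computational
graph with and without optimization, and the scaling relation is an assumed
example of a metamorphic relation.

```python
import math


def run_unoptimized(inputs):
    # Hypothetical reference execution: computes (x * 2) + (x * 2) directly.
    return [(x * 2) + (x * 2) for x in inputs]


def run_optimized(inputs):
    # Hypothetical optimized execution: the same graph simplified to x * 4.
    return [x * 4 for x in inputs]


def outputs_close(a, b, tol=1e-6):
    """Compare two output vectors element-wise within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=tol, abs_tol=tol) for p, q in zip(a, b)
    )


def differential_oracle(inputs):
    # Differential testing: two executions of the same model must agree.
    return outputs_close(run_unoptimized(inputs), run_optimized(inputs))


def metamorphic_oracle(inputs, k=3.0):
    # Assumed metamorphic relation: the computation is linear, so scaling
    # the input by k must scale the output by k.
    scaled_output = run_optimized([k * x for x in inputs])
    expected = [k * y for y in run_optimized(inputs)]
    return outputs_close(scaled_output, expected)


sample = [0.5, -1.25, 3.0]
print(differential_oracle(sample), metamorphic_oracle(sample))
```

A failure of either check would flag a potential inconsistency without
requiring a known-correct expected output, which is the main appeal of such
oracles when testing compilers.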
Date: Wednesday, 18 June 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)
Committee Members: Prof. Shing-Chi Cheung (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Shuai Wang