Fault Characterization and Testing for Domain-Specific Compilers
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Fault Characterization and Testing for Domain-Specific Compilers"
By
Mr. Haoyang MA
Abstract:
Compilers are critical software components that translate high-level
programming languages into executable machine code. Their correctness is
paramount, as errors in compilers can lead to compilation crashes and even
silent miscompilations, potentially introducing bugs in all software built
using the faulty compiler. Despite decades of development, modern compilers
still contain numerous bugs, with severe consequences ranging from program
crashes to security vulnerabilities.
The prevalence of compiler bugs poses a significant challenge to software
reliability and security. Even mature, widely used compilers such as GCC and
LLVM have been found to contain thousands of bugs, many of which can result
in the generation of incorrect code. These bugs are particularly insidious as
they can silently propagate errors into deployed applications, making them
difficult to detect and diagnose. This thesis addresses the understanding and
detection of compiler bugs through the following three studies.
The first study in this thesis focuses on understanding bugs in the Solidity
compiler, a critical component for developing smart contract applications on
Ethereum. This study conducts a systematic analysis of 533 Solidity compiler
bugs, examining their characteristics, including symptoms, root causes,
distribution, and bug-related code patterns. The comprehensive analysis leads
to seven key takeaways for revealing bugs in the Solidity compiler. To
evaluate the practical implications of these findings, the study constructs a
benchmark and assesses three existing Solidity compiler fuzzers. The results
demonstrate that these fuzzers are currently inefficient at detecting
Solidity compiler bugs, primarily due to their failure to consider
bug-inducing features, relevant compilation flags, and appropriate test
oracles. This study contributes to the field by providing a deep
understanding of Solidity compiler bugs and offering insights for developing
more effective testing tools.
The second study builds on the findings of the first, addressing the
limitations of existing compiler testing tools, particularly for Solidity.
It introduces a novel approach called bounded exhaustive random program
generation, designed to focus the search space of program generation and
more effectively identify bug-triggering programs in compilers
and analyzers. The approach consists of two key stages: first, generating
random program templates with bug-related placeholders, and second,
conducting a bounded exhaustive enumeration of valid values for these
placeholders. To maintain efficiency, the study employs a solvable constraint
set during template generation and systematically explores possible
placeholder values within these constraints during exhaustive enumeration.
This methodology was implemented in a tool named ERWIN, specifically for
testing Solidity compilers and analyzers. ERWIN successfully identified 23
previously unknown bugs across two Solidity compilers (solc and solang) and
one Solidity static analyzer (slither). The evaluation results show that
ERWIN outperforms state-of-the-art Solidity fuzzers in bug detection.
Additionally, ERWIN complements developer-written test suites
by covering 4,582 edges and 14,737 lines of the solc compiler that were not
covered by solc's unit tests, highlighting its effectiveness in improving
compiler testing coverage.
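The two-stage idea can be illustrated with a toy example. This sketch is not ERWIN's implementation. Several pieces are hypothetical:

- the Solidity function template;
- the placeholder names `ty` and `loc` and their value domains;
- the deliberately simplified `valid` rule.

A brute-force filter over the Cartesian product of the domains stands in for the constraint solving described above. Every placeholder domain is finite, so the enumeration is bounded while still covering every valid combination.

```python
import itertools
from typing import Callable, Dict, Iterator, List

# Hypothetical template: {ty} and {loc} are placeholders standing in for
# bug-related choices (a parameter type and its data location).
TEMPLATE = "function f({ty}{loc} x) public {{ }}"

# Finite domain of candidate values for each placeholder; the finiteness
# of these domains is what bounds the exhaustive enumeration.
DOMAINS: Dict[str, List[str]] = {
    "ty": ["uint256", "bytes", "string"],
    "loc": ["", " memory", " calldata"],
}


def valid(assign: Dict[str, str]) -> bool:
    """Simplified validity rule (assumed for illustration): value types
    take no data location, while reference types (bytes, string) need one."""
    is_value_type = assign["ty"] == "uint256"
    has_location = assign["loc"] != ""
    return has_location != is_value_type


def enumerate_programs(
    template: str,
    domains: Dict[str, List[str]],
    constraint: Callable[[Dict[str, str]], bool],
) -> Iterator[str]:
    """Yield every instantiation of the template whose placeholder
    assignment satisfies the constraint (brute-force filtering)."""
    names = sorted(domains)
    for values in itertools.product(*(domains[n] for n in names)):
        assign = dict(zip(names, values))
        if constraint(assign):
            yield template.format(**assign)


for program in enumerate_programs(TEMPLATE, DOMAINS, valid):
    print(program)
```

Run as a script, it prints the five assignments that pass the simplified rule, for example `function f(bytes calldata x) public { }`. Four combinations are filtered out:

- `uint256` paired with `memory`;
- `uint256` paired with `calldata`;
- `bytes` with no data location;
- `string` with no data location.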
The third study explores the application of a generation-based testing
approach to deep learning (DL) compilers. This study focuses on addressing
the challenges in testing the optimization of high-level intermediate
representations (IRs), which has been identified as the most error-prone
compilation stage in DL compilers. The study introduces HIRGEN, an automated
testing technique designed to effectively expose coding mistakes in the
optimization of high-level IRs. HIRGEN incorporates three key components: 1)
coverage criteria for generating diverse and valid computational graphs, 2)
utilization of high-level IR language features to produce diverse IRs, and 3)
three test oracles inspired by metamorphic testing and differential testing.
HIRGEN detected 21 bugs in TVM, a popular DL compiler, of which 17 have
been confirmed and 12 fixed. The study
also compares HIRGEN's performance against four baselines constructed using
state-of-the-art DL compiler fuzzers. The results show that HIRGEN can detect
10 crashes and inconsistencies that the baselines failed to identify within a
48-hour testing period. Additionally, the study evaluates the usefulness of
the proposed coverage criteria and test oracles, providing insights into
their effectiveness in improving DL compiler testing.
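To make the two oracle styles concrete, here is a toy sketch in plain Python rather than TVM. The functions `reference`, `fused`, `buggy_fused` and `with_identity` are hypothetical stand-ins for compiled versions of the tiny graph y = relu(2x + 1). The helper `outputs_agree` is also an assumption and is not part of HIRGEN or TVM. The two oracles differ in what they compare:

- A differential oracle compares two executions of the same program, here unoptimized versus optimized.
- A metamorphic oracle compares a program with a semantically equivalent variant of itself.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]


def outputs_agree(
    run_a: Callable[[Vec], Vec],
    run_b: Callable[[Vec], Vec],
    inputs: Sequence[Vec],
    tol: float = 1e-6,
) -> bool:
    """Oracle: both executions must produce matching outputs on every input."""
    for x in inputs:
        ya, yb = run_a(x), run_b(x)
        if len(ya) != len(yb):
            return False
        if not all(math.isclose(a, b, rel_tol=tol, abs_tol=tol) for a, b in zip(ya, yb)):
            return False
    return True


# Toy graph y = relu(2 * x + 1), evaluated without optimization.
def reference(x: Vec) -> Vec:
    return [max(0.0, 2.0 * v + 1.0) for v in x]


# Hypothetical stand-in for the same graph after a correct fusion pass.
def fused(x: Vec) -> Vec:
    return [max(0.0, 2.0 * v + 1.0) for v in x]


# Hypothetical faulty optimization: the bias add is moved past the relu.
def buggy_fused(x: Vec) -> Vec:
    return [max(0.0, 2.0 * v) + 1.0 for v in x]


# Metamorphic variant: the same graph with a no-op multiply-by-one inserted.
def with_identity(x: Vec) -> Vec:
    return [max(0.0, (2.0 * v) * 1.0 + 1.0) for v in x]


inputs = [[-1.0, 0.0, 2.5], [3.0, -0.25]]
print(outputs_agree(reference, fused, inputs))          # differential: True
print(outputs_agree(reference, buggy_fused, inputs))    # differential: False
print(outputs_agree(reference, with_identity, inputs))  # metamorphic: True
```

The simulated bug surfaces because moving the bias past the relu changes the result for negative pre-activations. For example, x = -1.0 yields 0.0 in `reference` but 1.0 in `buggy_fused`.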
Date: Tuesday, 2 December 2025
Time: 10:00am - 12:00noon
Venue: Room 4472 (Lifts 25/26)
Chairman: Prof. Min YAN (MATH)
Committee Members: Prof. Shing-Chi CHEUNG (Supervisor)
Dr. May FUNG
Dr. Lionel PARREAUX
Dr. Chao TANG (ACCT)
Prof. Jun SUN (SMU)