Fault Characterization and Testing for Domain-Specific Compilers

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Fault Characterization and Testing for Domain-Specific Compilers"

By

Mr. Haoyang MA


Abstract:

Compilers are critical software components that translate high-level 
programming languages into executable machine code. Their correctness is 
paramount, as errors in compilers can lead to compilation crashes and even 
silent miscompilations, potentially introducing bugs in all software built 
using the faulty compiler. Despite decades of development, modern compilers 
still contain numerous bugs, with severe consequences ranging from program 
crashes to security vulnerabilities.

The prevalence of compiler bugs poses a significant challenge to software 
reliability and security. Even mature, widely used compilers like GCC and 
LLVM have been found to contain thousands of bugs, many of which can result 
in the generation of incorrect code. These bugs are particularly insidious as 
they can silently propagate errors into deployed applications, making them 
difficult to detect and diagnose. This thesis addresses the understanding and 
detection of compiler bugs through the following three studies.

The first study in this thesis focuses on understanding bugs in the Solidity 
compiler, a critical component for developing smart contract applications on 
Ethereum. This study conducts a systematic analysis of 533 Solidity compiler 
bugs, examining their characteristics, including symptoms, root causes, 
distribution, and bug-related code patterns. The comprehensive analysis leads 
to seven key takeaways for revealing bugs in the Solidity compiler. To 
evaluate the practical implications of these findings, the study constructs a 
benchmark and assesses three existing Solidity compiler fuzzers. The results 
demonstrate that these fuzzers are currently inefficient in detecting 
Solidity compiler bugs, primarily due to their failure to consider 
bug-inducing features, relevant compilation flags, and appropriate test 
oracles. This study contributes to the field by providing a deep 
understanding of Solidity compiler bugs and offering insights for developing 
more effective testing tools.

The second study follows the findings from the first study, addressing the 
limitations of existing compiler testing tools, particularly for Solidity. 
This study introduces a novel approach called bounded exhaustive random 
program generation, designed to focus the search space of program generation 
and to identify bug-triggering programs in compilers and analyzers more 
effectively. The approach consists of two key stages: first, generating 
random program templates with bug-related placeholders, and second, 
conducting a bounded exhaustive enumeration of valid values for these 
placeholders. To maintain efficiency, the study employs a solvable constraint 
set during template generation and systematically explores possible 
placeholder values within these constraints during exhaustive enumeration. 
This methodology was implemented in a tool named ERWIN, specifically for 
testing Solidity compilers and analyzers. ERWIN successfully identified 23 
previously unknown bugs across two Solidity compilers (solc and solang) and 
one Solidity static analyzer (slither). The evaluation results demonstrate 
ERWIN's superior performance compared to state-of-the-art Solidity fuzzers in 
bug detection. Additionally, ERWIN complements developer-written test suites 
by covering 4,582 edges and 14,737 lines of the solc compiler that were not 
covered by solc's unit tests, highlighting its effectiveness in improving 
compiler testing coverage.
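
The two stages described above can be illustrated with a minimal sketch. It is
not ERWIN's implementation: the template, the placeholder names (loc, vis,
mut), their value domains, and the single constraint are all hypothetical and
chosen only to show the idea of enumerating every placeholder assignment that
satisfies a constraint set, optionally up to a bound.

```python
from itertools import product
from string import Template

# Hypothetical program template with three placeholders (assumed for
# illustration; not taken from ERWIN).
TEMPLATE = Template("function f(uint[] $loc x) $vis $mut {}")

# Hypothetical finite value domain for each placeholder.
DOMAINS = {
    "loc": ["memory", "calldata", "storage"],
    "vis": ["public", "external", "internal", "private"],
    "mut": ["pure", "view"],
}


def satisfies(assign):
    """Illustrative constraint only: a 'storage' parameter is allowed
    solely on internal or private functions."""
    if assign["loc"] == "storage":
        return assign["vis"] in ("internal", "private")
    return True


def enumerate_programs(template, domains, constraint, bound=None):
    """Yield every program whose placeholder assignment satisfies
    `constraint`, stopping after `bound` programs if a bound is given."""
    names = sorted(domains)
    produced = 0
    for values in product(*(domains[name] for name in names)):
        assign = dict(zip(names, values))
        if not constraint(assign):
            continue  # skip assignments outside the constraint set
        yield template.substitute(assign)
        produced += 1
        if bound is not None and produced >= bound:
            return


if __name__ == "__main__":
    for program in enumerate_programs(TEMPLATE, DOMAINS, satisfies, bound=5):
        print(program)
```

With the domains above, 20 of the 24 possible assignments pass the constraint.
Each resulting program would then be handed to the compiler or analyzer under
test.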

The third study explores the application of a generation-based testing 
approach to deep learning (DL) compilers. This study focuses on addressing 
the challenges in testing the optimization of high-level intermediate 
representations (IRs), which has been identified as the most error-prone 
compilation stage in DL compilers. The study introduces HIRGEN, an automated 
testing technique designed to effectively expose coding mistakes in the 
optimization of high-level IRs. HIRGEN incorporates three key components: 1) 
coverage criteria for generating diverse and valid computational graphs, 2) 
utilization of high-level IR language features to produce diverse IRs, and 3) 
three test oracles inspired by metamorphic testing and differential testing. 
HIRGEN's effectiveness is demonstrated by its successful detection of 21 bugs 
in TVM, a popular DL compiler, with 17 bugs confirmed and 12 fixed. The study 
also compares HIRGEN's performance against four baselines constructed using 
state-of-the-art DL compiler fuzzers. The results show that HIRGEN can detect 
10 crashes and inconsistencies that the baselines failed to identify within a 
48-hour testing period. Additionally, the study evaluates the usefulness of 
the proposed coverage criteria and test oracles, providing insights into 
their effectiveness in improving DL compiler testing.
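
As a rough illustration of what a differential-testing oracle checks, the
sketch below compares the outputs of an unoptimized computation with those of
an "optimized" version and flags any disagreement beyond a tolerance. It is not
HIRGEN's oracle and does not use TVM: the functions reference, good_opt, and
buggy_opt are hypothetical stand-ins for compiling and running the same graph
with and without an optimization pass.

```python
import math
from typing import Callable, List

Graph = Callable[[List[float]], List[float]]


def reference(xs: List[float]) -> List[float]:
    # Hypothetical unoptimized computation: y = 2x + 2x.
    return [x * 2.0 + x * 2.0 for x in xs]


def good_opt(xs: List[float]) -> List[float]:
    # Hypothetical correct rewrite of the same computation: y = 4x.
    return [x * 4.0 for x in xs]


def buggy_opt(xs: List[float]) -> List[float]:
    # Deliberately wrong rewrite, standing in for an optimization bug.
    return [x * 2.0 for x in xs]


def outputs_agree(a: Graph, b: Graph, inputs: List[float],
                  rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Differential check: both versions must produce outputs of the same
    length that match element-wise within the given tolerances."""
    out_a, out_b = a(inputs), b(inputs)
    if len(out_a) != len(out_b):
        return False
    return all(math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
               for p, q in zip(out_a, out_b))


if __name__ == "__main__":
    inputs = [0.5, 1.0, -3.0]
    print(outputs_agree(reference, good_opt, inputs))
    print(outputs_agree(reference, buggy_opt, inputs))
```

The choice of inputs matters in such a check: an all-zero input would make the
buggy rewrite above agree with the reference, so the inconsistency would go
unnoticed.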


Date:                   Tuesday, 2 December 2025

Time:                   10:00 am - 12:00 noon

Venue:                  Room 4472
                        Lifts 25/26

Chairman:               Prof. Min YAN (MATH)

Committee Members:      Prof. Shing-Chi CHEUNG (Supervisor)
                        Dr. May FUNG
                        Dr. Lionel PARREAUX
                        Dr. Chao TANG (ACCT)
                        Prof. Jun SUN (SMU)