Fault Characterization and Testing for Domain-Specific Compilers

PhD Thesis Proposal Defence


Title: "Fault Characterization and Testing for Domain-Specific Compilers"

by

Mr. Haoyang MA


Abstract:

Compilers are critical software components that translate high-level 
programming languages into executable machine code. Their correctness is 
paramount, as errors in compilers can lead to compilation crashes and even 
silent miscompilations, potentially introducing bugs in all software built 
using the faulty compiler. Despite decades of development, modern compilers 
still contain numerous bugs, with severe consequences ranging from program 
crashes to security vulnerabilities.

The prevalence of compiler bugs poses a significant challenge to software 
reliability and security. Even mature, widely used compilers like GCC and LLVM 
have been found to contain thousands of bugs, many of which can result in the 
generation of incorrect code. These bugs are particularly insidious as they 
can silently propagate errors into deployed applications, making them difficult 
to detect and diagnose. This thesis advances the understanding and detection 
of compiler bugs through the following three studies.

The first study in this thesis focuses on understanding bugs in the Solidity 
compiler, a critical component for developing smart contract applications on 
Ethereum. This study conducts a systematic analysis of 533 Solidity compiler 
bugs, examining their characteristics, including symptoms, root causes, 
distribution, and bug-related code patterns. The comprehensive analysis leads 
to seven key takeaways for revealing bugs in the Solidity compiler. To 
evaluate the practical implications of these findings, the study constructs a 
benchmark and assesses three existing Solidity compiler fuzzers. The results 
demonstrate that these fuzzers are currently inefficient in detecting Solidity 
compiler bugs, primarily due to their failure to consider bug-inducing 
features, relevant compilation flags, and appropriate test oracles. This study 
contributes to the field by providing a deep understanding of Solidity compiler 
bugs and offering insights for developing more effective testing tools.

The second study builds on the findings of the first study, addressing the 
limitations of existing compiler testing tools, particularly for Solidity. 
This study introduces a novel approach called bounded exhaustive random 
program generation, designed to focus the search space of program generation 
and more effectively identify bug-triggering programs in compilers and 
analyzers. The approach consists of two key stages: first, generating random 
program templates with bug-related placeholders, and second, conducting a 
bounded exhaustive enumeration of valid values for these placeholders. To 
maintain efficiency, the study employs a solvable constraint set during 
template generation and systematically explores possible placeholder values 
within these constraints during exhaustive enumeration. This methodology was 
implemented in a tool named ERWIN, specifically for testing Solidity compilers 
and analyzers. ERWIN successfully identified 23 previously unknown bugs across 
two Solidity compilers (solc and solang) and one Solidity static analyzer 
(slither). The evaluation results demonstrate ERWIN’s superior performance 
compared to state-of-the-art Solidity fuzzers in bug detection. Additionally, 
ERWIN complements developer-written test suites by covering 4,582 edges and 
14,737 lines of the solc compiler that were not covered by solc’s unit tests, 
highlighting its effectiveness in improving compiler testing coverage.
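
The two stages described above can be illustrated with a minimal Python 
sketch. It is not ERWIN's implementation: the skeletons, placeholder names, 
value domains, and the single validity rule are all hypothetical, and the 
sketch filters invalid assignments after enumerating them rather than solving 
a constraint set during template generation as the study does.

```python
import itertools
import random
import re

# Hypothetical placeholder domains (illustrative only, not ERWIN's).
DOMAINS = {
    "TYPE": ["uint8", "uint256"],
    "LOC": ["memory", "storage", "calldata"],
    "VIS": ["public", "internal"],
}

# Hypothetical program skeletons; <NAME> marks a bug-related placeholder.
SKELETONS = [
    "function f(<TYPE>[] <LOC> a) <VIS> {}",
    "function g(<TYPE> x) <VIS> returns (<TYPE>) { return x; }",
]

PLACEHOLDER = re.compile(r"<([A-Z]+)>")


def generate_template(rng):
    """Stage 1: randomly pick a program template containing placeholders."""
    return rng.choice(SKELETONS)


def is_valid(assignment):
    """Simplified, assumed validity rule: no storage parameter in a public function."""
    return not (
        assignment.get("LOC") == "storage" and assignment.get("VIS") == "public"
    )


def enumerate_programs(template):
    """Stage 2: exhaustively enumerate every valid placeholder assignment."""
    names = sorted(set(PLACEHOLDER.findall(template)))
    for values in itertools.product(*(DOMAINS[name] for name in names)):
        assignment = dict(zip(names, values))
        if is_valid(assignment):
            yield PLACEHOLDER.sub(lambda m: assignment[m.group(1)], template)


if __name__ == "__main__":
    template = generate_template(random.Random(0))
    for program in enumerate_programs(template):
        print(program)
```

Because each placeholder domain is small and finite, the enumeration is 
bounded: every program yielded for a template could then be passed to the 
compiler under test.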

The third study explores the application of a generation-based testing 
approach to deep learning (DL) compilers. This study focuses on addressing the 
challenges in testing the optimization of high-level intermediate 
representations (IRs), which has been identified as the most error-prone 
compilation stage in DL compilers. The study introduces HIRGEN, an automated 
testing technique designed to effectively expose coding mistakes in the 
optimization of high-level IRs. HIRGEN incorporates three key components: 1) 
coverage criteria for generating diverse and valid computational graphs, 2) 
utilization of high-level IR language features to produce diverse IRs, and 3) 
three test oracles inspired by metamorphic testing and differential testing. 
HIRGEN’s effectiveness is demonstrated by its successful detection of 21 bugs 
in TVM, a popular DL compiler, with 17 bugs confirmed and 12 fixed. The study 
also compares HIRGEN’s performance against four baselines constructed using 
state-of-the-art DL compiler fuzzers. The results show that HIRGEN can detect 
10 crashes and inconsistencies that the baselines failed to identify within a 
48-hour testing period. Additionally, the study evaluates the usefulness of 
the proposed coverage criteria and test oracles, providing insights into their 
effectiveness in improving DL compiler testing.
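
As a rough illustration of the two oracle styles mentioned above, the 
following Python sketch checks a toy optimization pass over a tiny expression 
graph. It does not use TVM or reproduce HIRGEN's oracles: the graph encoding, 
the constant-folding pass, and both oracle functions are assumptions made for 
the example.

```python
import math


def evaluate(node, env):
    """Interpret a toy graph built from ("const", v), ("var", name),
    ("add", a, b) and ("mul", a, b) tuples."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if kind == "add":
        return left + right
    if kind == "mul":
        return left * right
    raise ValueError(f"unknown node kind: {kind}")


def optimize(node):
    """Toy optimization pass: fold operations whose operands are both constants."""
    if node[0] in ("const", "var"):
        return node
    left, right = optimize(node[1]), optimize(node[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", evaluate((node[0], left, right), {}))
    return (node[0], left, right)


def differential_oracle(graph, env):
    """Differential check: unoptimized and optimized graphs must agree."""
    return math.isclose(evaluate(graph, env), evaluate(optimize(graph), env))


def metamorphic_oracle(graph, env):
    """Metamorphic check: adding zero must not change the optimized result."""
    mutated = ("add", graph, ("const", 0))
    return math.isclose(
        evaluate(optimize(graph), env), evaluate(optimize(mutated), env)
    )


if __name__ == "__main__":
    graph = ("add", ("mul", ("const", 2), ("const", 3)), ("var", "x"))
    env = {"x": 1.5}
    print(differential_oracle(graph, env), metamorphic_oracle(graph, env))
```

A failing oracle would indicate that the optimization changed program 
behaviour, which is the kind of inconsistency such oracles are meant to flag.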


Date:                   Wednesday, 18 June 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Prof. Shing-Chi Cheung (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Dr. Shuai Wang