Fault Characterization and Testing for Domain-Specific Compilers
PhD Thesis Proposal Defence

Title: "Fault Characterization and Testing for Domain-Specific Compilers"

by

Mr. Haoyang MA

Abstract:

Compilers are critical software components that translate high-level programming languages into executable machine code. Their correctness is paramount, as errors in compilers can lead to compilation crashes and even silent miscompilations, potentially introducing bugs in all software built using the faulty compiler. Despite decades of development, modern compilers still contain numerous bugs, with severe consequences ranging from program crashes to security vulnerabilities.

The prevalence of compiler bugs poses a significant challenge to software reliability and security. Even mature, widely-used compilers like GCC and LLVM have been found to contain thousands of bugs, many of which can result in the generation of incorrect code. These bugs are particularly insidious as they can silently propagate errors into deployed applications, making them difficult to detect and diagnose. This thesis helps understand and detect compiler bugs in the following three aspects.

The first study in this thesis focuses on understanding bugs in the Solidity compiler, a critical component for developing smart contract applications on Ethereum. This study conducts a systematic analysis of 533 Solidity compiler bugs, examining their characteristics, including symptoms, root causes, distribution, and bug-related code patterns. The comprehensive analysis leads to seven key takeaways for revealing bugs in the Solidity compiler. To evaluate the practical implications of these findings, the study constructs a benchmark and assesses three existing Solidity compiler fuzzers. The results demonstrate that these fuzzers are currently inefficient in detecting Solidity compiler bugs, primarily due to their failure to consider bug-inducing features, relevant compilation flags, and appropriate test oracles. This study contributes to the field by providing a deep understanding of Solidity compiler bugs and offering insights for developing more effective testing tools.

The second study follows the findings from the first study, addressing the limitations of existing compiler testing tools, particularly for Solidity. This study introduces a novel approach called bounded exhaustive random program generation, designed to focus the search space of program generation and more effectively identify bug-triggering programs in compilers and analyzers. The approach consists of two key stages: first, generating random program templates with bug-related placeholders, and second, conducting a bounded exhaustive enumeration of valid values for these placeholders. To maintain efficiency, the study employs a solvable constraint set during template generation and systematically explores possible placeholder values within these constraints during exhaustive enumeration. This methodology was implemented in a tool named ERWIN, specifically for testing Solidity compilers and analyzers. ERWIN successfully identified 23 previously unknown bugs across two Solidity compilers (solc and solang) and one Solidity static analyzer (slither). The evaluation results demonstrate ERWIN’s superior performance compared to state-of-the-art Solidity fuzzers in bug detection. Additionally, ERWIN complements developer-written test suites by covering 4,582 edges and 14,737 lines of the solc compiler that were not covered by solc’s unit tests, highlighting its effectiveness in improving compiler testing coverage.

The third study explores the application of a generation-based testing approach to deep learning (DL) compilers. This study focuses on addressing the challenges in testing the optimization of high-level intermediate representations (IRs), which has been identified as the most error-prone compilation stage in DL compilers. The study introduces HIRGEN, an automated testing technique designed to effectively expose coding mistakes in the optimization of high-level IRs. HIRGEN incorporates three key components: 1) coverage criteria for generating diverse and valid computational graphs, 2) utilization of high-level IR language features to produce diverse IRs, and 3) three test oracles inspired by metamorphic testing and differential testing. HIRGEN’s effectiveness is demonstrated by its successful detection of 21 bugs in TVM, a popular DL compiler, with 17 bugs confirmed and 12 fixed. The study also compares HIRGEN’s performance against four baselines constructed using state-of-the-art DL compiler fuzzers. The results show that HIRGEN can detect 10 crashes and inconsistencies that the baselines failed to identify within a 48-hour testing period. Additionally, the study evaluates the usefulness of the proposed coverage criteria and test oracles, providing insights into their effectiveness in improving DL compiler testing.

Date: Wednesday, 18 June 2025

Time: 2:00pm - 4:00pm

Venue: Room 3494
Lifts 25/26

Committee Members:
Prof. Shing-Chi Cheung (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Shuai Wang
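
Illustrative sketch (not taken from the thesis or from ERWIN): the two-stage idea described in the second study, random template generation followed by bounded exhaustive enumeration of placeholder values, can be illustrated with a toy example. Everything below is an assumption made for illustration: the type domain TYPES, the Template class, and the functions generate_template, satisfies, render and enumerate_programs are hypothetical names, and the emitted programs are only Solidity-like fragments. The placeholders are the variable types, and each assignment "v_i = v_j;" adds the constraint that both variables share a type, so the constraint set is always solvable.

```python
import itertools
import random
from dataclasses import dataclass

# Toy placeholder domain (an assumption; a real tool would use a richer one).
TYPES = ("uint8", "uint256", "bool")


@dataclass(frozen=True)
class Template:
    """A program skeleton whose variable types are left as placeholders."""
    num_vars: int
    assignments: tuple  # pairs (i, j), each meaning the statement "v_i = v_j;"


def generate_template(rng, num_vars=3, num_assignments=2):
    """Stage 1: randomly pick which variables are assigned to each other."""
    pairs = []
    for _ in range(num_assignments):
        i, j = rng.sample(range(num_vars), 2)  # two distinct variables
        pairs.append((i, j))
    return Template(num_vars, tuple(pairs))


def satisfies(template, types):
    """Constraint check: both sides of every assignment have the same type."""
    return all(types[i] == types[j] for i, j in template.assignments)


def render(template, types):
    """Fill the placeholders and emit a Solidity-like code fragment."""
    lines = [f"{t} v{k};" for k, t in enumerate(types)]
    lines += [f"v{i} = v{j};" for i, j in template.assignments]
    return "\n".join(lines)


def enumerate_programs(template, bound=None):
    """Stage 2: enumerate every valid type assignment, up to `bound` programs."""
    candidates = itertools.product(TYPES, repeat=template.num_vars)
    valid = (render(template, ts) for ts in candidates if satisfies(template, ts))
    return list(itertools.islice(valid, bound))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    template = generate_template(rng)
    for program in enumerate_programs(template, bound=3):
        print(program, end="\n\n")
```

In this sketch the random stage fixes only the program's shape, while the enumeration stage visits every consistent way to fill the placeholders within the bound. Each generated program would then be fed to the compiler or analyzer under test; that step is omitted here.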