Effective Testing of Functional Faults for Deep Learning Libraries

PhD Thesis Proposal Defence


Title: "Effective Testing of Functional Faults for Deep Learning Libraries"

by

Mr. Meiziniu LI


Abstract:

Deep learning (DL) is increasingly being adopted for mission-critical
applications such as authentication, medical treatment, and autonomous
driving. Modern DL applications mostly rely on popular DL libraries like
PyTorch and TensorFlow. However, the presence of functional incorrectness in
these libraries, commonly referred to as functional bugs or faults, poses a
significant threat to the reliability of DL applications. For instance, over
30% of high-priority issues reported to PyTorch developers are categorized as
functional faults (e.g., incorrect computation results or execution states).
Therefore, detecting functional faults is critical for ensuring the quality
of DL libraries.

Nevertheless, automated detection of functional faults in DL libraries faces
two major challenges: (1) the complex internal logic of DL libraries makes it
nontrivial to generate diverse test inputs that cover distinct program paths
and trigger functional faults; (2) test oracles are often ineffective or
entirely absent, preventing triggered faults from being detected. This thesis
addresses these two challenges through three studies, aiming for more
effective functional fault detection in DL libraries.

The first study proposes COMET, a model-level testing framework. COMET
generates diverse DL models guided by a novel coverage criterion and
mitigates the oracle challenge by cross-library differential testing of model
inference results. While effective, model-level testing is inherently
constrained by the limited set of operators exercised in generated models,
leaving many individual DL operators inadequately tested. To overcome this
limitation, our
second technique, DLLens, focuses on operator-level testing. DLLens employs
LLM-aided static analysis to extract path constraints within DL operators for
generating diverse test inputs. It then addresses the test oracle problem by
enhancing differential testing with LLMs to identify functionally equivalent
counterparts for the operator under test. Finally, to systematically
understand the characteristics of functional bugs in DL libraries, we
conducted an empirical study characterizing the fault-triggering conditions,
buggy stages, and root causes of silent errors in torch.compile—a widely used
DL infrastructure in modern large language model (LLM) applications. The
findings provide insights for generating effective test inputs to trigger
functional faults in torch.compile.
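To illustrate the differential-testing oracle underlying COMET and DLLens, the
sketch below compares two implementations of the same mathematical function
(a naive and a numerically stabilized softmax) and flags any disagreement
beyond a tolerance as a potential functional fault. This is a minimal,
hypothetical example for intuition only; the names and the toy function are
assumptions and do not reflect the actual COMET or DLLens implementations,
which compare real DL libraries such as PyTorch and TensorFlow.

```python
import math

def softmax_naive(xs):
    # Reference implementation: direct exponentiation.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    # Functionally equivalent counterpart: subtract the max before exp
    # for numerical stability. Mathematically identical to the naive form.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def differential_check(xs, tol=1e-9):
    """Run both implementations on the same input and compare outputs.
    A discrepancy beyond `tol` signals a functional fault in one of them;
    this comparison serves as the test oracle."""
    a, b = softmax_naive(xs), softmax_stable(xs)
    return all(abs(x - y) <= tol for x, y in zip(a, b))

print(differential_check([0.5, 1.0, -2.0]))  # True: the two variants agree
```

In cross-library differential testing, the two "implementations" are the same
operator (or model) executed by different DL libraries, so no manually written
expected output is needed.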

Collectively, these contributions significantly advance the state-of-the-art
in functional fault detection for DL libraries by addressing the core
challenges of test input generation and test oracle design.


Date:                   Monday, 27 October 2025

Time:                   4:00pm - 6:00pm

Venue:                  Room 2408
                        Lifts 17/18

Committee Members:      Prof. Shing-Chi Cheung (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Dr. Shuai Wang