Effective Testing of Functional Faults for Deep Learning Libraries
PhD Thesis Proposal Defence

Title: "Effective Testing of Functional Faults for Deep Learning Libraries"

by

Mr. Meiziniu LI

Abstract:

Deep learning (DL) is increasingly being adopted for mission-critical applications such as authentication, medical treatment, and autonomous driving. Modern DL applications mostly rely on popular DL libraries like PyTorch and TensorFlow. However, the presence of functional incorrectness in these libraries, commonly referred to as functional bugs or faults, poses a significant threat to the reliability of DL applications. For instance, over 30% of high-priority issues reported to PyTorch developers are categorized as functional faults (e.g., incorrect computation results or execution states). Therefore, detecting functional faults is critical for ensuring the quality of DL libraries. Nevertheless, automated detection of functional faults in DL libraries faces two major challenges: (1) the complex internal logic of DL libraries makes it nontrivial to generate diverse test inputs that effectively cover diverse program paths and trigger functional faults; (2) effective test oracles are often lacking, or entirely absent, which prevents triggered faults from being detected.

This thesis addresses these two challenges through three studies, aiming for more effective functional fault detection in DL libraries. The first study proposes COMET, a model-level testing framework. COMET generates diverse DL models guided by a novel coverage criterion and mitigates the oracle challenge by cross-library differential testing of model inference results. While effective, model-level testing is inherently constrained by the limited number of operators tested, leaving many individual DL operators inadequately tested. To overcome this limitation, our second technique, DLLens, focuses on operator-level testing. DLLens employs LLM-aided static analysis to extract path constraints within DL operators for generating diverse test inputs. It then addresses the test oracle problem by enhancing differential testing with LLMs to identify functionally equivalent counterparts for the operator under test. Finally, to systematically understand the characteristics of functional bugs in DL libraries, we conducted an empirical study characterizing the fault-triggering conditions, buggy stages, and root causes of silent errors in torch.compile, a widely used DL infrastructure in modern large language model (LLM) applications. The findings provide insights for generating effective test inputs to trigger functional faults in torch.compile.

Collectively, these contributions significantly advance the state of the art in functional fault detection for DL libraries by addressing the core challenges of test input generation and test oracle design.

Date: Monday, 27 October 2025
Time: 4:00pm - 6:00pm
Venue: Room 2408 (Lifts 17/18)

Committee Members:
Prof. Shing-Chi Cheung (Supervisor)
Prof. Raymond Wong (Chairperson)
Dr. Shuai Wang
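For readers unfamiliar with the differential testing idea that the abstract mentions, a minimal, purely illustrative sketch follows. It is not COMET's or DLLens's actual code: the two softmax implementations stand in for "the same operator in two libraries", and `differential_test` is a hypothetical harness that flags inputs on which the implementations disagree (or on which only one crashes), mimicking how cross-library comparison serves as a test oracle.

```python
import math

def softmax_ref(xs):
    # "Library A": numerically stable softmax (shifts by the max first).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_naive(xs):
    # "Library B": naive softmax, which overflows for large inputs.
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def differential_test(impl_a, impl_b, inputs, tol=1e-6):
    """Run both implementations on each input; report discrepancies."""
    mismatches = []
    for xs in inputs:
        try:
            ya = impl_a(xs)
            yb = impl_b(xs)
            if any(abs(a - b) > tol for a, b in zip(ya, yb)):
                mismatches.append(xs)
        except OverflowError:
            # A crash in only one implementation is also a finding:
            # here math.exp overflows in the naive version.
            mismatches.append(xs)
    return mismatches

inputs = [[0.0, 1.0, 2.0], [1000.0, 1000.1, 999.9]]
# The small input agrees; the large one exposes the naive implementation.
print(differential_test(softmax_ref, softmax_naive, inputs))
# → [[1000.0, 1000.1, 999.9]]
```

The key design point mirrored here is that neither implementation needs to be known correct in advance: any disagreement beyond tolerance is a signal for human inspection, which is what makes differential testing a practical substitute for a missing oracle.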