The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Assessing the Reliability of Deep Learning Applications"
By
Mr. Yongqiang TIAN
Abstract:
Deep Learning (DL) applications are widely deployed in diverse areas, such
as image classification, natural language processing, and autonomous
driving systems. Although these applications achieve outstanding
performance on metrics such as accuracy, developers have raised strong
concerns about their reliability, since the logic of DL applications is a
black box to humans. Specifically, DL applications learn their logic
during stochastic training and encode it in the high-dimensional weights
of DL models. Unlike the source code of conventional software, such
weights are infeasible for humans to directly interpret, examine, and
validate. As a result, reliability issues in DL applications are difficult
to detect and may cause catastrophic accidents in safety-critical
missions. Therefore, it is critical to adequately assess the reliability
of DL applications.
This thesis aims to help software developers assess the reliability of DL
applications from the following three perspectives.
The first study proposes object-relevancy, a property that reliable
DL-based image classifiers should comply with, i.e., the classification
results should be based on features relevant to the target object in a
given image, rather than on irrelevant features such as the background.
This study further proposes an automatic approach based on two metamorphic
relations to assess whether this property is violated in image
classification results. The evaluation shows that the proposed approach
can effectively detect unreliable inferences that violate the
object-relevancy property, with an average precision of 64.1% and 96.4%
for the two relations, respectively. A subsequent empirical study reveals
that such unreliable inferences are prevalent in the real world and that
existing training strategies cannot effectively tackle this issue.
The second study concentrates on the reliability issues induced by DL
model compression. Model compression can significantly reduce the sizes of
Deep Neural Network (DNN) models and thus facilitates the deployment of
sophisticated, sizable DNN models. However, the prediction results of a
compressed model may deviate from those of its original model, making the
deployed DL application unreliable. To help developers thoroughly assess
the impact of model compression, it is essential to test compressed models
and find such deviated behaviors before dissemination. This study proposes
DFLARE, a novel, search-based, black-box testing technique. The evaluation
shows that DFLARE consistently outperforms the baseline in both efficacy
and efficiency. More importantly, the triggering inputs found by DFLARE
can be used to repair up to 48.48% of the deviated behaviors.
The third study reveals the unreliable assessment of DL-based Program
Generators (DLGs) in compiler testing. To test compilers effectively, DLGs
have been proposed to automatically generate massive numbers of test
programs. However, after a thorough analysis of the characteristics of
DLGs, this study finds that the assessment of these DLGs is unfair and
unreliable, since the chosen baselines, i.e., Language-Specific Program
Generators (LSGs), differ from DLGs in many aspects. This study further
proposes Kitten, a simple, fair, non-DL-based baseline for DLGs. The
experiments show that DLGs cannot even compete with such a simple baseline
and that the claimed advantages of DLGs are likely due to the biased
selection of baselines. Specifically, in 72 hours of testing on GCC,
Kitten triggers 1,750 hang bugs and 34 distinct crashes, while the
state-of-the-art DLG triggers only 3 hang bugs and 1 distinct crash.
Moreover, the code coverage achieved by Kitten is at least twice that
achieved by the state-of-the-art DLG.
Date: Friday, 14 July 2023
Time: 9:00am - 11:00am
Venue: Room 6538
Lifts 27/28
Chairman: Prof. Hui SU (CIVL)
Committee Members: Prof. Shing Chi CHEUNG (Supervisor)
Prof. Chengnian SUN (Supervisor, U of Waterloo)
Prof. Raymond WONG
Prof. Ross MURCH (ECE)
Prof. Meng XU (U of Waterloo)
Prof. Zhi JIN (Peking University)
**** ALL are Welcome ****