Towards Automatic Testing and Fault Localization in Natural Language Processing Systems
PhD Thesis Proposal Defence

Title: "Towards Automatic Testing and Fault Localization in Natural Language Processing Systems"

by

Mr. Jialun CAO

Abstract:

Natural Language Processing (NLP) systems such as machine translation and chatbots have become pervasive in daily life. Despite their widespread application, how to test and debug such systems remains an open question that attracts researchers from both academia and industry. Unlike conventional software programs, NLP systems, whose core algorithm is a deep neural network (DNN) model, do not make decisions by explicitly following the program's control flow. Instead, decisions are made by propagating inputs through the trained DNN model against its tuned parameters. This indirect correlation between the program source code and the resulting DNN impedes the adoption of conventional software testing and debugging approaches. In addition, the shortage of labeled datasets poses further challenges to automatic testing. While it is possible to feed arbitrary sentences as test inputs to an NLP system, a test oracle that captures their expected outputs is hard to define automatically. This thesis aims to propose automatic approaches that generate test cases together with test oracles to detect defects in NLP systems, and further proposes a fault localization framework to aid debugging. It consists of the following three studies.

The first study proposes a semantic-based testing approach for machine translation systems. Existing methodologies mostly rely on metamorphic relations designed at the textual level (e.g., Levenshtein distance) or the syntactic level (e.g., the distance between grammar structures) to determine the correctness of translation results. However, these metamorphic relations do not consider whether the original and the translated sentences have the same meaning (i.e., semantic similarity). To this end, this study proposes an automatic testing approach for machine translation systems based on semantic similarity checking. The insight of this study is that the semantics concerning logical relations and quantifiers in sentences can be captured by regular expressions (or deterministic finite automata), to which efficient semantic equivalence/similarity checking algorithms can be applied. By doing so, the accuracy and F-score of mistranslation detection could be improved.
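To make the regular-expression/DFA idea concrete, here is a minimal, hypothetical sketch (not the thesis implementation): it abstracts a sentence and its round-trip translation into their ordered quantifier/negation cues, builds a small DFA per abstraction, and decides language equivalence by breadth-first search over the product automaton. The cue table, the abstraction function, and the example sentences are all illustrative assumptions.

```python
from collections import deque

# Illustrative cue table (an assumption, not the thesis's actual alphabet):
# keep only quantifier/negation markers, mapped to abstract symbols.
CUES = {
    "every": "ALL", "all": "ALL", "each": "ALL",
    "some": "SOME", "a": "SOME", "an": "SOME",
    "no": "NO", "not": "NOT", "never": "NOT",
}

def abstract(sentence):
    """Project a sentence onto its ordered quantifier/negation cues."""
    return tuple(CUES[w] for w in sentence.lower().split() if w in CUES)

class DFA:
    """A DFA accepting exactly one cue sequence; the equivalence check
    below works for any pair of DFAs over the same alphabet."""
    def __init__(self, seq):
        self.seq = seq
        self.sink = len(seq) + 1       # dead state for any mismatch
        self.accept = {len(seq)}       # accept after consuming the sequence

    def step(self, state, sym):
        if state < len(self.seq) and self.seq[state] == sym:
            return state + 1
        return self.sink

def equivalent(a, b, alphabet):
    """BFS over the product automaton: the two DFAs accept the same
    language iff no reachable state pair disagrees on acceptance."""
    seen = {(0, 0)}
    queue = deque(seen)
    while queue:
        s, t = queue.popleft()
        if (s in a.accept) != (t in b.accept):
            return False
        for sym in alphabet:
            nxt = (a.step(s, sym), b.step(t, sym))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Usage: flag a round-trip translation that flips a quantifier.
src = "every student passed the exam"
bad = "some student passed the exam"   # hypothetical mistranslation
a, b = DFA(abstract(src)), DFA(abstract(bad))
print(equivalent(a, b, set(CUES.values())))   # False -> suspicious
```

The design choice this illustrates is that DFA equivalence is decidable in time polynomial in the product of the automata sizes, which is what makes the semantic check efficient compared with reasoning over free-form text.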
The second study proposes a metamorphic testing approach for coreference resolution (CR) systems. CR is the task of resolving different expressions (e.g., named entities, pronouns) that refer to the same real-world entity or event. It is a core NLP component that underlies and empowers major downstream NLP applications such as machine translation, chatbots, and question answering. Despite its broad impact, the problem of testing CR systems has rarely been studied. A major difficulty is the shortage of labeled datasets for testing. While it is possible to feed arbitrary sentences as test inputs to a CR system, a test oracle that captures their expected outputs (coreference relations) is hard to define automatically. To address this problem, this study proposes a methodology to generate CR-preserving test cases. It is the first study that focuses on effectively generating test inputs that respect coreference, complementing existing work.

The third study designs a learning-based fault diagnosis and localization framework for NLP programs. Although various studies have been conducted to understand and detect program faults, they identify and repair suspicious neurons in the trained DNN, which, unfortunately, might be a detour. Instead, locating the faulty statements in the programs can provide developers with more useful information for debugging. To achieve this goal, this study proposes a learning-based fault diagnosis and localization framework that maps the fault localization task to a learning problem. In particular, it infers suspicious fault types by monitoring runtime features extracted during DNN model training and then locates the diagnosed faults in NLP programs. The evaluation exhibits the potential of learning-based fault diagnosis for NLP programs.

Date: Thursday, 31 August 2023
Time: 4:00pm - 6:00pm
Venue: Room 3494 (lifts 25/26)

Committee Members:
Prof. Shing-Chi Cheung (Supervisor)
Dr. Minhao Cheng (Chairperson)
Dr. Shuai Wang
Dr. Dan Xu

**** ALL are Welcome ****