Explaining NLI with Feature Interaction Attribution

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Explaining NLI with Feature Interaction Attribution"

By

CHOI Sehyun

Abstract:

Natural Language Inference (NLI) is an important task that gauges an AI
model's ability to understand logical relationships between sentences.
The recent development of large-scale language models has brought a great
leap in performance on this task, but at the same time, the increased
complexity of model architectures has reduced interpretability, hurting
the trustworthiness of these models. While many explanatory methods have
been proposed for other text classification tasks, they are not well
suited to explaining a model's classification decision on the NLI task,
as they cannot highlight relationships between features.

This thesis proposes a novel explanatory framework for the NLI task based
on feature interaction attribution methods, which attribute importance to
interactions between features, not just to individual features. The
framework extends this idea to focus on word interactions across the
sentences in order to find the important sentence relationship cues used
by the model. We also propose a new metric that can effectively evaluate
an explanatory method's ability to capture cross-sentence feature
relationships. Evaluation results on the e-SNLI dataset show a
significant improvement in explanation quality with our framework.


Date            : 7 May 2022 (Saturday)

Time            : 10:00-10:40

Zoom Link       : https://hkust.zoom.us/j/6761083097

Meeting ID      : 676 108 3097

Advisor         : Prof. ZHANG Nevin Lianwen

2nd Reader      : Dr. SONG Yangqiu