Visual Explanation of Black Box Algorithms on Multi-dimensional Data

PhD Thesis Proposal Defence


Title: "Visual Explanation of Black Box Algorithms on Multi-dimensional Data"

by

Miss Xun ZHAO


Abstract:

With the explosive growth of available multi-dimensional data, many machine
learning and data mining algorithms have been developed to analyze and utilize
these data. However, most of these algorithms are black boxes, which hinders
users from understanding and trusting their decisions. By taking advantage of
humans' strong visual perception, visualization techniques can facilitate the
interpretation of these algorithms and their decisions. In this proposal, we
present several visualization techniques for explaining various black box
algorithms.

In the first work, we focus on explaining skyline queries, which are widely
applied to facilitate multi-criteria decision making. By automatically removing
dominated candidates, skyline queries allow users to focus on a subset of
superior data items (i.e., the skyline, the set of items not dominated by any
other item). However, users still have to interpret and compare these superior
items manually before making a choice. We therefore propose SkyLens, a visual
analytics system that reveals the superiority of skyline points from different
perspectives and at different scales to aid users in their decision making. Two
usage scenarios and a user study demonstrate the effectiveness of the system.

The second work studies the explanation of random forests. As an ensemble model
consisting of many independent decision trees, a random forest generates a
prediction by feeding the input to its internal trees and aggregating their
outputs (e.g., by majority voting for classification). However, random forests
suffer from poor interpretability, which significantly hinders their use in
fields that require transparent and explainable predictions, such as medical
diagnosis and financial fraud detection. To address this issue, we propose an
interactive visualization system for interpreting random forest models and
their predictions. We evaluate the usefulness of the proposed technique through
two usage scenarios and a user study.

The third work investigates the interpretation of outliers, i.e., data instances
that do not conform to the normal patterns in a dataset. As different domains
usually have different views of what constitutes an outlier, understanding the
defining characteristics of outliers is essential for users to select and
filter appropriate outliers based on their domain requirements. However, most
existing work focuses on the efficiency and accuracy of outlier detection while
neglecting outlier interpretation. Hence, we propose a visual analytics system
that helps users understand, interpret, and select the outliers detected by
various algorithms.


Date:              Thursday, 11 October 2018

Time:              2:00pm - 4:00pm

Venue:             Room 2408
                   (lifts 17/18)

Committee Members: Prof. Dik-Lun Lee (Supervisor)
                   Prof. Huamin Qu (Supervisor)
                   Dr. Ke Yi (Chairperson)
                   Dr. Raymond Wong


**** ALL are Welcome ****