PhD Thesis Proposal Defence
Title: "Visual Explanation of Black Box Algorithms on Multi-dimensional Data"
by
Miss Xun ZHAO
Abstract:
With the explosive growth of available multi-dimensional data, many machine
learning and data mining algorithms have been developed to analyze and utilize
these data. However, most of these algorithms are black boxes, which hinders
users from understanding and trusting the decisions they make.
By taking advantage of humans' strong visual perception capabilities,
visualization techniques can facilitate the interpretation of
these algorithms and their decisions. In this thesis proposal, we present
several visualization techniques for explaining various black box algorithms.
In the first work, we focus on explaining skyline queries, which are widely
applied to facilitate multi-criteria decision making. By automatically
removing dominated (i.e., clearly inferior) candidates, skyline queries allow
users to focus on a subset of superior data items (i.e., the skyline).
However, users are still required to interpret and compare these superior
items manually before making a final choice. We therefore propose SkyLens,
a visual analytic system that reveals the superiority of skyline points from
different perspectives and at different scales to aid users in their
decision making. Two usage scenarios and one user study are conducted to
demonstrate the effectiveness of our system.
The second work studies the explanation of random forest algorithms. As an
ensemble model that consists of many independent decision trees, a random
forest generates predictions by feeding the input to its internal trees and
aggregating their outputs. However, random forests suffer from poor model
interpretability, which significantly hinders their use in fields that
require transparent and explainable predictions, such as medical diagnosis
and financial fraud detection. To address this issue, we propose an
interactive visualization system that interprets random forest models and
their predictions. We conducted two usage scenarios and one user study to
evaluate the usefulness of the proposed system.
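To make "feeding the input to internal trees and aggregating their outputs" concrete, here is a minimal sketch in Python. It assumes a classification forest that aggregates by majority vote, as in Breiman's original formulation (scikit-learn's RandomForestClassifier instead averages the trees' class probabilities). The helpers `stump`, `vote_breakdown` and `forest_predict`, the features `amount` and `hour`, and the labels are all hypothetical illustrations; the hand-written stumps stand in for learned trees, and none of this is the system described in the proposal.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical model: a "tree" is any function mapping a feature dict to a class label.
Tree = Callable[[Dict[str, float]], str]


def stump(feature: str, threshold: float, low: str, high: str) -> Tree:
    """Build a one-split decision tree (a decision stump) as a stand-in for a learned tree."""
    def predict(x: Dict[str, float]) -> str:
        return low if x[feature] <= threshold else high
    return predict


def vote_breakdown(trees: List[Tree], x: Dict[str, float]) -> Dict[str, int]:
    """Feed x to every tree and count how many trees output each label."""
    return dict(Counter(tree(x) for tree in trees))


def forest_predict(trees: List[Tree], x: Dict[str, float]) -> str:
    """Aggregate the per-tree outputs by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    # Ties go to the label that appears first in tree order.
    label, _ = votes.most_common(1)[0]
    return label


# Three hypothetical stumps voting on whether a transaction is fraudulent.
trees = [
    stump("amount", 1000.0, "normal", "fraud"),
    stump("hour", 5.0, "fraud", "normal"),
    stump("amount", 5000.0, "normal", "fraud"),
]
transaction = {"amount": 2000.0, "hour": 14.0}

print(vote_breakdown(trees, transaction))  # {'fraud': 1, 'normal': 2}
print(forest_predict(trees, transaction))  # normal
```

Only the final label reaches a user of a plain random forest; the per-tree vote counts returned by `vote_breakdown` hint at the kind of internal information an interpretation tool might surface, although a real forest has hundreds of much deeper trees.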
The third work investigates the interpretation of outliers, the data instances
that do not conform to the normal patterns in a dataset. Because different
domains often apply different criteria to outliers, understanding the
defining characteristics of outliers is essential for users to select and
filter appropriate outliers based on their domain requirements. However, most
existing work focuses on the efficiency and accuracy of outlier detection,
while neglecting the importance of outlier interpretation. Hence, we propose a
visual analytic system that helps users understand, interpret, and select the
outliers detected by various algorithms.
Date: Thursday, 11 October 2018
Time: 2:00pm - 4:00pm
Venue: Room 2408 (lifts 17/18)
Committee Members: Prof. Dik-Lun Lee (Supervisor)
Prof. Huamin Qu (Supervisor)
Dr. Ke Yi (Chairperson)
Dr. Raymond Wong
**** ALL are Welcome ****