Visual Explanation of Black Box Algorithms on Multi-dimensional Data
PhD Thesis Proposal Defence

Title: "Visual Explanation of Black Box Algorithms on Multi-dimensional Data"

by

Miss Xun ZHAO

Abstract:

With the explosive growth of available multi-dimensional data, many machine learning and data mining algorithms have been developed to analyze and utilize these data. However, most of these algorithms are black boxes, which hinders users from understanding and trusting the decisions they make. By taking advantage of humans' strong visual perception, visualization techniques can facilitate the interpretation of these algorithms and their decisions. In this proposal, we present several visualization techniques that address different black box algorithms.

In the first work, we focus on explaining skyline queries, which are widely applied to facilitate multi-criteria decision making. By automatically removing uncompetitive candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline). However, users still have to interpret and compare these superior items manually before they can make a well-informed choice. We therefore propose SkyLens, a visual analytic system that reveals the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. We present two usage scenarios and one user study to demonstrate the effectiveness of our system.

The second work studies the explanation of random forest algorithms. As an ensemble model that consists of many independent decision trees, a random forest generates predictions by feeding the input to its internal trees and summarizing their outputs. However, random forests suffer from poor interpretability, which significantly hinders their use in fields that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection.
To address this issue, we propose an interactive visualization system for interpreting random forest models and their predictions. We carried out two usage scenarios and one user study to evaluate the usefulness of the proposed technique.

The third work investigates the interpretation of outliers, the data instances that do not conform to the normal patterns in a dataset. Because different domains usually have different considerations about outliers, understanding the defining characteristics of outliers is essential for users to select and filter appropriate outliers based on their domain requirements. However, most existing work focuses on the efficiency and accuracy of outlier detection and neglects outlier interpretation. Hence, we propose a visual analytic system that helps users understand, interpret, and select the outliers detected by various algorithms.

Date: Thursday, 11 October 2018

Time: 2:00pm - 4:00pm

Venue: Room 2408 (lifts 17/18)

Committee Members:
Prof. Dik-Lun Lee (Supervisor)
Prof. Huamin Qu (Supervisor)
Dr. Ke Yi (Chairperson)
Dr. Raymond Wong

**** ALL are Welcome ****