The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence



Mr. Yuanzhe CHEN


Temporal event sequences are becoming increasingly important in many 
application domains such as website click streams, user interaction logs, 
electronic health records and car service records. However, a real-world 
dataset with a large number of event sequences of varying lengths is 
complex and difficult to analyze. Visual analytics has proven to be an 
effective approach to understanding such large amounts of data. For 
example, by visually highlighting the common behaviors of website click 
streams, usability issues and user behavior patterns can be identified to 
inform better designs of the interface. In this thesis, we build on 
research in event sequence visualization and report three works on 
developing visual analytics techniques for temporal event data from 
various application domains.

In the first work, we propose a novel visualization technique based on the 
minimum description length (MDL) principle to construct a coarse-level 
overview of event sequence data while balancing conciseness against the 
resulting information loss. The method addresses a fundamental trade-off 
in visualization design: 
reducing visual clutter vs. increasing the information content in a 
visualization. The method enables simultaneous sequence clustering and 
pattern extraction and is highly tolerant of noise such as missing or 
additional events in the data. Based on this approach, we propose a visual 
analytics framework with multiple levels-of-detail to facilitate 
interactive data exploration. We demonstrate the usability and 
effectiveness of our approach through case studies with two real-world 
datasets. One dataset showcases a new application domain for event 
sequence visualization, i.e., fault development path analysis in vehicles 
for predictive maintenance. We also discuss the strengths and limitations 
of the proposed method based on user feedback.

The second work focuses on the stage, that is, a frequently occurring 
subsequence in the dataset. We introduce a novel visualization technique 
to summarize event sequence data into a set of stage progression patterns. 
The resulting overview is more concise compared with event-level 
summarization and supports level-of-detail exploration. We further present 
a visual analytics system with four linked views (stage view, overview, 
tree view and sequences view) to help users explore the data. We 
also present quantitative experimental results as well as case studies 
where the system is used in two different domains, and discuss the 
advantages and limitations of applying StageMap to various application 
scenarios.

In the third work, we study temporal event data from a specific 
application domain, i.e., web click streams in Massive Open Online 
Courses (MOOCs). Specifically, we aim to understand learner dropout 
behavior in such data. To tackle this problem, we introduce a 
comprehensive visual analytics system which not only helps instructors and 
education experts understand the reasons for dropout, but also allows 
researchers to identify crucial features which can further improve the 
performance of dropout prediction models. Both the heterogeneous data 
extracted from 
three different kinds of learner activity logs (i.e., clickstream, forum 
posts and homework records) and the predicted results are visualized in 
the proposed system.

Date:			Friday, 5 October 2018

Time:			2:30pm - 4:30pm

Venue:			Room 2408
 			Lifts 17/18

Chairman:		Prof. Kevin Chen (ECE)

Committee Members:	Prof. Huamin Qu (Supervisor)
 			Prof. Long Quan
 			Prof. Ke Yi
 			Prof. Weichuan Yu (ECE)
 			Prof. Jinwook Seo (Seoul National U)

**** ALL are Welcome ****