Interactive Visual Analytics for Understanding and Facilitating Human Communication

PhD Thesis Proposal Defence


Title: "Interactive Visual Analytics for Understanding and Facilitating 
Human Communication"

by

Mr. Xingbo WANG


Abstract:

People communicate with each other through verbal and non-verbal behavior, 
including voice, words, facial expressions, and body language. Interpreting 
human communication behavior has great value for many applications, such 
as business, healthcare, and education. For example, if students show 
signs of boredom or confusion during a course, teachers can adjust their 
teaching methods to improve student engagement. With the rapid 
development of digital and sensing technology, human communication data is 
collected in various formats (e.g., video recordings, speech, and text 
corpora). To facilitate the analysis of human communication, researchers 
adopt computational approaches to quantify human behavior with multimodal 
features. However, it is still demanding and inefficient to manually 
extract insights (e.g., the social meanings of the features) from the large 
and complex feature space. Furthermore, it remains challenging to utilize 
the knowledge distilled from the computational features to support effective 
human communication. Meanwhile, interactive visual analytics combines 
computational algorithms with interactive visualization to effectively 
support information representation, pattern discovery, and decision 
making. It therefore demonstrates great potential to address the challenges 
above.

In this thesis, we design and build novel interactive visual analytics 
systems to 1) help domain experts discover valuable behavioral patterns in 
complex human communication data and 2) further provide end-users with 
visual guidance to improve their communication skills. In the first work, 
we present DeHumor, a visual analytics system that visually decomposes 
humorous speeches into quantifiable multimodal features and enables domain 
experts to systematically explore humorous verbal content and vocal 
delivery. In the second work, we further characterize and investigate the 
intra- and inter-modal interactions among the visual, acoustic, and language 
modalities, including dominance, complement, and conflict. We then 
develop M2Lens, a visual analytics system that helps model developers and 
users conduct multi-level and multi-faceted exploration of the influences 
of individual modalities and their interplay on model predictions for 
multimodal sentiment analysis. Beyond understanding human communication 
behavior, in the third work, we present VoiceCoach, a visual analytics 
system that evaluates speakers’ voice modulation skills in terms of 
volume, pitch, speed, and pause, and recommends good learning examples of 
voice modulation from TED Talks for speakers to follow. Moreover, during 
practice, the system provides immediate visual feedback to speakers for 
self-awareness and performance improvement. Finally, we introduce 
ongoing work on interactive story-based vocabulary learning 
powered by language models. We aim to build an interactive visual 
analytics system that integrates three story-based learning strategies for 
students to learn user-specified English words: reading a 
machine-generated story, completing a story cloze test, and taking turns 
with the machine to co-write a story using all the target words.


Date:			Monday, 30 May 2022

Time:                  	2:00pm - 4:00pm

Zoom Meeting: 		https://hkust.zoom.us/j/4210096111

Committee Members:	Prof. Huamin Qu (Supervisor)
  			Prof. Nevin Zhang (Chairperson)
 			Prof. Qiong Luo
 			Dr. Xiaojuan Ma


**** ALL are Welcome ****