Interactive Visual Analytics for Understanding and Facilitating Human Communication
PhD Thesis Proposal Defence

Title: "Interactive Visual Analytics for Understanding and Facilitating Human Communication"

by

Mr. Xingbo WANG

Abstract:

People communicate with each other through verbal and non-verbal behavior, including voice, words, facial expressions, and body language. Interpreting human communication behavior has great value for many applications, such as business, healthcare, and education. For example, if students show signs of boredom or confusion during a course, teachers can adjust their teaching methods to improve students' engagement. With the rapid development of digital and sensing technology, human communication data is collected in various formats (e.g., video recordings, speech, and language corpora). To facilitate the analysis of human communication, researchers adopt computational approaches to quantify human behavior with multimodal features. However, it remains demanding and inefficient to manually extract insights (e.g., the social meanings of the features) from such a large and complex feature space. Furthermore, it remains challenging to utilize the knowledge distilled from the computational features to support effective human communication. Meanwhile, interactive visual analytics combines computational algorithms with interactive visualization to effectively support information representation, pattern discovery, and decision making, and it demonstrates great potential to address the challenges above.

In this thesis, we design and build novel interactive visual analytics systems to 1) help domain experts discover valuable behavioral patterns in complex human communication data and 2) provide end users with visual guidance to improve their communication skills.

In the first work, we present DeHumor, a visual analytics system that visually decomposes humorous speeches into quantifiable multimodal features and enables domain experts to systematically explore humorous verbal content and vocal delivery.

In the second work, we further characterize and investigate the intra- and inter-modal interactions between the visual, acoustic, and language modalities, including dominance, complement, and conflict. We then develop M2Lens, a visual analytics system that helps model developers and users conduct multi-level, multi-faceted exploration of how individual modalities and their interplay influence model predictions in multimodal sentiment analysis.

Beyond understanding human communication behavior, in the third work we present VoiceCoach, a visual analytics system that evaluates speakers' voice modulation skills regarding volume, pitch, speed, and pause, and recommends good examples of voice modulation from TED Talks to follow. Moreover, during practice, the system provides immediate visual feedback to speakers for self-awareness and performance improvement.

Finally, we introduce ongoing work on interactive story-based vocabulary learning powered by language models. We aim to build an interactive visual analytics system that integrates three story-based learning strategies for students to learn user-specified English words: reading a machine-generated story, completing a story cloze test, and taking turns with the machine to co-write a story using all the target words.

Date: Monday, 30 May 2022
Time: 2:00pm - 4:00pm
Zoom Meeting: https://hkust.zoom.us/j/4210096111

Committee Members:
Prof. Huamin Qu (Supervisor)
Prof. Nevin Zhang (Chairperson)
Prof. Qiong Luo
Dr. Xiaojuan Ma

**** ALL are Welcome ****