Visual Analysis of Human Behaviors in Classroom and Public Speech Videos

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Visual Analysis of Human Behaviors in Classroom and Public Speech 
Videos"

By

Mr. Haipeng ZENG


Abstract

Analyzing human behaviors in videos has great value for various 
applications, such as education, communication, sports, and surveillance. 
For example, analyzing students' engagement in classroom videos can help 
teachers improve their teaching, and analyzing speakers' presentation 
skills in public speech videos can facilitate presentation skills 
training. However, manually digesting and analyzing human behaviors in 
videos is very time-consuming, especially when users need to conduct 
detailed analysis, such as dynamic behavior comparison and behavior 
evolution exploration. Therefore, recent research has proposed automated 
video analysis techniques to facilitate this process, such as face 
detection, emotion recognition, pose estimation, and action recognition. 
Although these techniques have demonstrated promising performance in 
extracting human behaviors, they remain insufficient to support the 
detailed, real-world analysis required by various analytical tasks. To 
this end, visual analytics has been applied to effectively analyze huge 
information spaces, support data exploration, and facilitate 
decision-making, which sheds light on helping users interactively explore 
and analyze video data.

In this thesis, we propose three novel interactive visual analytics 
systems that combine automated video analysis techniques with 
human-centered visualizations to help users explore and analyze video 
data. In our first work, we propose EmotionCues, a visual analytics system 
that integrates emotion recognition algorithms with visualizations to 
easily analyze classroom videos from the perspectives of emotion summary 
and detailed analysis. In particular, the system supports the visual 
analysis of classroom videos on two different levels of granularity, 
namely, the overall emotion evolution patterns of all the people involved, 
and the detailed visualization of an individual's emotions. In the second 
work, considering the multi-modality of video data, we propose EmoCo, an 
interactive visual analytics system to facilitate the fine-grained 
analysis of emotion coherence across face, text, and audio modalities in 
presentation videos. By developing suitable interactive visualizations 
enhanced with new features, the system allows users to conduct the 
in-depth exploration of emotions on three levels of detail (i.e., video, 
sentence, and word level). In the third work, we focus on visualizing hand 
movement in videos and propose GestureLens, a visual analytics system to 
help users explore and analyze gesture usage in presentation videos. It 
enables users to gain a quick spatial and temporal overview of gestures, 
as well as to conduct both content-based and gesture-based explorations. 
Both real-world case studies and feedback from collaborating domain 
experts verify the effectiveness and usefulness of all the proposed 
systems.


Date:			Tuesday, 4 August 2020

Time:			10:00am - 12:00noon

Zoom Meeting:		https://hkust.zoom.us/j/97648667651

Chairman:		Prof. Chi Ying TSUI (ISD)

Committee Members:	Prof. Ting Chuen PONG (Supervisor)
 			Prof. Huamin QU (Supervisor)
 			Prof. Xiaojuan MA
 			Prof. Pedro SANDER
 			Prof. Richard SO (IEDA)
 			Prof. Shengdong ZHAO (National Univ of Singapore)


**** ALL are Welcome ****