ANALYSIS AND AUGMENTATION OF HUMAN ACTIONS IN VIDEOS

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "ANALYSIS AND AUGMENTATION OF HUMAN ACTIONS IN VIDEOS"

By

Miss Jingyuan LIU


Abstract

Analyzing human actions in videos and augmenting human action videos with 
visual effects are common tasks in video understanding and editing. 
However, they are challenging in three aspects. First, automatically 
analyzing human actions in videos and augmenting videos with visual 
effects require programming or professional tools, and are thus often 
tedious and unfriendly to novice users. Second, the intrinsic perspective 
foreshortening in videos makes both the observation and the computation of 
human action attributes dependent on the camera viewpoint. Third, the 
action attributes of interest are often application-specific, and are thus 
either pre-defined or require programming to generalize to new instances, 
limiting support for customized analysis, especially for novices. This 
thesis aims to address these limitations in both the analysis and the 
augmentation of human action videos.

We first present PoseTween, a tool that allows users to easily augment 
human action videos with visual effects (animated virtual objects). We 
model the visual effects as tween animations of virtual objects driven by 
the subject's movements in the video. By exploiting the subject's 
movements, PoseTween achieves natural interactions between the augmented 
virtual objects and the subject, while greatly simplifying the editing 
process.
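As a rough illustration of the idea (a minimal sketch, not PoseTween's 
actual implementation), a virtual object's position can be keyframed 
relative to a tracked joint and interpolated between user-set keyframes; 
the keypoint track, keyframe format, and choice of the wrist joint below 
are assumptions made for illustration:

import numpy as np

def lerp(v0, v1, t):
    # Linear interpolation between two keyframed property values.
    return (1.0 - t) * v0 + t * v1

def object_position(wrist_track, key0, key1, frame):
    # wrist_track: (T, 2) array of per-frame wrist keypoints from a pose
    # estimator. key0, key1: (frame_index, offset) user keyframes, with
    # offsets expressed relative to the wrist so the virtual object
    # follows the subject's movement between keyframes.
    f0, off0 = key0
    f1, off1 = key1
    t = np.clip((frame - f0) / float(f1 - f0), 0.0, 1.0)
    offset = lerp(np.asarray(off0, float), np.asarray(off1, float), t)
    return wrist_track[frame] + offset

Because the tween is expressed in the joint's local frame, the object 
moves with the subject even between keyframes, which is what removes most 
of the frame-by-frame editing effort.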
We then study the problem of finding temporal alignments between human 
action videos, which is useful for automatically transferring visual 
effects from a template video to a target video based on action proximity, 
thereby reducing user intervention. To address the perspective 
foreshortening problem, we propose a deep learning-based method that 
normalizes the human poses in videos and extracts features from the 
normalized poses for matching. Temporally aligning two human action videos 
by matching the normalized pose features is thus invariant to variations 
across videos, such as camera viewpoint and subject anthropometry.
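To make the pipeline concrete, here is a minimal sketch of aligning two 
pose sequences. The thesis learns the normalization with a deep model; the 
simple geometric normalization and classic dynamic time warping (DTW) 
below are stand-ins, with joint indices assumed for illustration:

import numpy as np

def normalize_pose(pose, hip=0, neck=1):
    # pose: (J, 2) joint coordinates. Translate so the hip is the origin
    # and scale by the hip-neck (torso) distance, removing image position
    # and subject size. Joint indices depend on the pose estimator.
    centered = pose - pose[hip]
    scale = np.linalg.norm(centered[neck]) + 1e-8
    return centered / scale

def dtw_align(a, b):
    # a: (Ta, D) and b: (Tb, D) per-frame pose feature vectors.
    # Returns the accumulated-cost matrix; backtracking through it
    # yields the warping path, i.e. the frame correspondences.
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

The frame correspondences recovered from the warping path are what would 
let effects authored on a template video be transferred to the matching 
moments of a target video.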
In the third part of the thesis, we study the analysis and visualization 
of differences in local human poses. We design PoseCoach, a tool for 
video-based running coaching that compares the running poses of an amateur 
runner and a professional runner. The tool supports interactive annotation 
of biomechanical pose attributes, so that novice users (e.g., amateur 
runners) can perform customized analysis of human action videos without 
programming. Existing visualization methods that show differences in local 
human poses with side-by-side or overlaid placements are sensitive to 
viewpoint variation and rely on the user's perception to interpret the 
differences. We therefore also propose a visualization method that 
intuitively conveys pose differences through 3D animations of a body 
model.
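As an example of the kind of biomechanical attribute such annotations can 
define (a hypothetical sketch, not PoseCoach's interface; the joint 
indices and skeleton layout are assumptions), a knee flexion angle can be 
compared between two pose-aligned frames:

import numpy as np

def joint_angle(a, b, c):
    # Angle at joint b formed by segments b->a and b->c, in degrees.
    u = a - b
    v = c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def knee_angle_difference(amateur, pro, hip=8, knee=9, ankle=10):
    # amateur, pro: (J, 2) poses at corresponding (temporally aligned)
    # frames. Joint indices are illustrative and depend on the skeleton.
    ang_a = joint_angle(amateur[hip], amateur[knee], amateur[ankle])
    ang_p = joint_angle(pro[hip], pro[knee], pro[ankle])
    return ang_a - ang_p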

We conduct extensive quantitative evaluations and user studies to evaluate 
the effectiveness of our proposed methods. The results show that our tools 
are friendly to novice users for both analyzing actions in videos and 
augmenting human action videos with animated virtual objects. The 
normalized pose features achieve promising accuracy on various tasks that 
require measuring pose similarity, such as temporal video alignment and 
action recognition.


Date:			Friday, 22 July 2022

Time:			10:00am - 12:00noon

Zoom Meeting: 		https://hkust.zoom.us/j/9759430635

Chairperson:		Prof. Zhigang LI (MAE)

Committee Members:	Prof. Chiew Lan TAI (Supervisor)
 			Prof. Xiaojuan MA
 			Prof. Pedro SANDER
 			Prof. Ajay JONEJA (ISD)
 			Prof. Pheng Ann HENG (CUHK)


**** ALL are Welcome ****