ANALYSIS AND AUGMENTATION OF HUMAN ACTIONS IN VIDEOS
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "ANALYSIS AND AUGMENTATION OF HUMAN ACTIONS IN VIDEOS"

By

Miss Jingyuan LIU

Abstract

Analyzing human actions in videos and augmenting human action videos with visual effects are common tasks in video understanding and editing. However, they are challenging in three aspects. First, automatically analyzing human actions in videos and augmenting videos with visual effects require programming or professional tools, and are thus often tedious and unfriendly to novice users. Second, the inherent perspective foreshortening in videos makes both the observation and the computation of human action attributes dependent on the camera viewpoint. Third, the human action attributes of interest are often application-specific, and are thus either pre-defined or require programming to generalize to new instances, limiting the support for customized analysis, especially for novices.

This thesis aims to address the above limitations in both the analysis and the augmentation of human action videos. We first present PoseTween, a tool that allows users to easily augment human action videos with visual effects (animated virtual objects). We model the visual effects as tween animations of virtual objects driven by the subject's movements in the video. By exploiting the subject's movements, PoseTween achieves natural interactions between the augmented virtual objects and the subject, while greatly simplifying the editing process.

We then study the temporal alignment of human action videos, which enables the automatic transfer of visual effects from a template video to a target video based on action proximity, reducing user intervention. To address the foreshortening problem, we propose a deep learning-based method that normalizes the human poses in videos and extracts features from the normalized poses for matching. The temporal alignment obtained by matching two human action videos with the normalized pose features is thus invariant to variations such as camera viewpoint and subject anthropometry.

In the third part of the thesis, we study the analysis and visualization of differences in local human poses. We design PoseCoach, a tool for video-based running coaching that compares the running poses of an amateur runner and a professional runner. The tool supports interactive annotation of biomechanical pose attributes, so that novice users (e.g., amateur runners) can perform customized analysis of human action videos without explicit programming. Existing visualization methods that show differences in local poses with side-by-side or overlaid placements are sensitive to viewpoint variation and rely on the user's perception to interpret the differences. We therefore also propose a visualization method that intuitively shows pose differences through 3D animations of a body model.

We conduct extensive quantitative evaluations and user studies to evaluate the effectiveness of our proposed methods. The results show that our tools are friendly to novice users in both the analysis of actions in videos and the augmentation of human action videos with animated virtual objects. The normalized pose features achieve promising accuracy in tasks that require measuring pose similarity, such as video temporal alignment and action recognition.
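To make the tween-animation idea concrete, the Python sketch below shows how a virtual object's properties could be interpolated between user-set keyframes while following a tracked joint. The keyframe format, the lerp helper, and the joint-track input are illustrative assumptions, not the actual PoseTween implementation.

```python
# Minimal sketch of pose-driven tween animation (assumed data layout,
# not the PoseTween codebase): per-frame 2D keypoints from an
# off-the-shelf pose estimator drive a virtual object's transform.
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between two scalars or arrays."""
    return (1.0 - t) * a + t * b

def tween_object(keyframes, joint_track):
    """
    keyframes:   list of (frame_idx, offset, scale, angle) set by the user,
                 where 'offset' is relative to the anchor joint position.
    joint_track: (T, 2) array, the anchor joint's 2D position per frame.
    Returns per-frame object states (position, scale, angle).
    """
    states = []
    for f in range(len(joint_track)):
        # Find the keyframes surrounding frame f (clamp at the ends).
        prev = max((k for k in keyframes if k[0] <= f),
                   key=lambda k: k[0], default=keyframes[0])
        nxt = min((k for k in keyframes if k[0] >= f),
                  key=lambda k: k[0], default=keyframes[-1])
        t = 0.0 if nxt[0] == prev[0] else (f - prev[0]) / (nxt[0] - prev[0])
        offset = lerp(np.asarray(prev[1]), np.asarray(nxt[1]), t)
        scale = lerp(prev[2], nxt[2], t)
        angle = lerp(prev[3], nxt[3], t)
        # The object follows the moving joint, so the interaction with
        # the subject looks natural without per-frame manual editing.
        states.append((joint_track[f] + offset, scale, angle))
    return states
```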
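The normalization-then-matching pipeline for temporal alignment can likewise be sketched. The snippet below uses a crude geometric normalization (hip-centering and torso-length scaling) as a stand-in for the learned normalization described in the thesis, and classic dynamic time warping (DTW) for the alignment step; the joint indices HIP and NECK are hypothetical placeholders for whichever skeleton layout is used.

```python
# Simplified geometric stand-in for the thesis's learned pose
# normalization, followed by DTW over per-frame feature distances.
import numpy as np

HIP, NECK = 8, 1  # hypothetical indices in a COCO/OpenPose-style skeleton

def normalize_pose(pose):
    """Center a (J, 2) pose at the hip and scale by torso length,
    reducing (but not removing) viewpoint and body-size variation."""
    centered = pose - pose[HIP]
    torso = np.linalg.norm(pose[NECK] - pose[HIP]) + 1e-8
    return (centered / torso).ravel()

def dtw_align(seq_a, seq_b):
    """Classic O(Ta * Tb) DTW on normalized pose features.
    Returns the accumulated-cost matrix; the frame-to-frame warping
    path is recovered by backtracking from cost[-1, -1]."""
    a = np.stack([normalize_pose(p) for p in seq_a])
    b = np.stack([normalize_pose(p) for p in seq_b])
    Ta, Tb = len(a), len(b)
    cost = np.full((Ta + 1, Tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[1:, 1:]
```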
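Finally, the kind of biomechanical attribute PoseCoach lets users annotate can be illustrated with a joint-angle computation, such as knee flexion, compared per frame between two time-aligned pose sequences. The function names and the assumption of aligned 3D poses are ours, for illustration only.

```python
# Illustrative joint-angle attribute and per-frame comparison between
# two aligned pose sequences (amateur vs. professional runner).
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) between segments b->a and b->c,
    e.g., knee flexion from hip, knee, and ankle positions."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def attribute_difference(poses_a, poses_b, triple):
    """Per-frame difference of one annotated angle attribute between
    two aligned pose sequences; 'triple' holds three joint indices."""
    i, j, k = triple
    ang_a = np.array([joint_angle(p[i], p[j], p[k]) for p in poses_a])
    ang_b = np.array([joint_angle(p[i], p[j], p[k]) for p in poses_b])
    return ang_a - ang_b
```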
Date: Friday, 22 July 2022
Time: 10:00am - 12:00noon
Zoom Meeting: https://hkust.zoom.us/j/9759430635

Chairperson: Prof. Zhigang LI (MAE)

Committee Members: Prof. Chiew Lan TAI (Supervisor)
                   Prof. Xiaojuan MA
                   Prof. Pedro SANDER
                   Prof. Ajay JONEJA (ISD)
                   Prof. Pheng Ann HENG (CUHK)

**** ALL are Welcome ****