PhD Thesis Proposal Defence


Title: "Learning to synthesize realistic videos"

by

Miss Yue WU


Abstract:

Video synthesis is a fundamental task in computer vision: the ability to 
capture, understand, and reproduce the dynamics of our visual world. Beyond 
its research interest, video synthesis has a wide range of applications in 
computer vision, robotics, and computer graphics. Video synthesis has two 
main research directions: conditional video synthesis and unconditional 
video synthesis. In conditional video synthesis, videos are synthesized 
conditioned on inputs such as previous frames, semantic segmentation, edges, 
or text. Among these conditions, we focus on conditioning on previous 
frames. Video synthesis conditioned on previous frames requires a 
comprehensive understanding of the content of the video as well as the 
motion of and interactions between objects. Previous methods fail to 
produce videos with accurate motion and struggle to synthesize long videos.

Our first work addresses the motion understanding problem in video 
synthesis. We propose FVS, a framework that, instead of treating the whole 
scene as one component, decomposes a scene into a static background and 
moving objects. The appearance change of the static background is caused by 
camera movement, while that of moving objects results from both their 
ego-motion and camera movement. Decomposing the understanding of the scene 
content in this way improves the accuracy of video synthesis.

While our first work explores how to improve video synthesis quality based 
on past frames, decomposing the scene into different components requires 
semantic segmentation, which is a strong assumption and restricts the 
applicability of the algorithm. We therefore introduce an optimization 
framework that utilizes a pre-trained video frame interpolation model. This 
framework requires no additional assumptions about the scene and no 
external training, which makes it applicable to any scene while producing 
high-quality results.

Besides conditional video synthesis, we further explore unconditional video 
synthesis, which refers to generating novel, non-existing videos without 
any specific input. Concretely, we study 3D-aware generative models for 
video avatar generation and animation. We propose AniFaceGAN, an animatable 
3D-aware face generation method that synthesizes highly realistic face 
images with controllable pose and expression sequences. We introduce a 3D 
parametric face model as a prior and use a 3D deformation field to produce 
the desired expression changes.
We also introduce a set of 3D losses that enforce the deformation field to 
imitate the parametric face model under expression variations. Our method 
generates realistic face videos with high visual quality.


Date:			Monday, 27 March 2023

Time:                  	4:00pm - 6:00pm

Venue:			Room 4475
  			Lifts 25/26

Committee Members:	Dr. Qifeng Chen (Supervisor)
 			Prof. Dit-Yan Yeung (Chairperson)
 			Prof. Pedro Sander
 			Dr. Dan Xu


**** ALL are Welcome ****