LEARNING TO SYNTHESIZE REALISTIC VIDEOS

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "LEARNING TO SYNTHESIZE REALISTIC VIDEOS"

By

Miss Yue WU


Abstract:

Video generation is a fundamental problem in computer vision. The ability to
capture, understand, and reproduce the dynamics of our visual world is of great
importance. Beyond its research interest, video synthesis has a wide range of
applications in computer vision, computer graphics, and robotics. Video
synthesis has two main research directions: conditional video synthesis and
unconditional video synthesis.

For conditional video synthesis, videos are synthesized conditioned on previous
frames, semantic segmentation maps, edges, or text. Among these conditions, we
focus on conditioning on previous frames. Video synthesis conditioned on
previous frames requires a comprehensive understanding of the scene as well as
the motion of and interactions between objects. Previous methods fail to
produce videos with accurate motion or to generate long videos.

Our first work addresses the motion understanding problem in video synthesis.
We propose FVS. Instead of treating the whole scene as a single component, our
framework decomposes a scene into a static background, consisting of regions
without self-motion, and moving objects. The appearance change of the static
background is caused by camera movement, while that of moving objects is a
combination of object ego-motion and camera movement. Decomposing the scene in
this way increases the accuracy of video synthesis.
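As a rough illustration of this decomposition (not the actual FVS
implementation), the next frame can be composited from a background warped by
the estimated camera motion and from moving objects warped by camera motion
plus their own motion; the warp function and the flow and mask inputs below are
simplified placeholders:

    import numpy as np

    def warp(img, flow):
        """Nearest-neighbour backward warp of img (H x W x C or H x W) by flow (H x W x 2)."""
        h, w = img.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
        src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
        return img[src_y, src_x]

    def composite_next_frame(frame, fg_masks, cam_flow, obj_flows):
        """Background follows camera motion only; each object adds its own motion."""
        bg_mask = 1.0 - np.clip(sum(fg_masks), 0.0, 1.0)
        out = warp(frame, cam_flow) * bg_mask[..., None]
        for mask, obj_flow in zip(fg_masks, obj_flows):
            total = cam_flow + obj_flow  # object ego-motion plus camera movement
            out += warp(frame, total) * warp(mask, total)[..., None]
        return out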

Although the first work explores how to improve video synthesis quality based
on past frames, decomposing the scene into components requires semantic
segmentation, which is a strong assumption that restricts the applicability of
the algorithm. We therefore introduce an optimization framework that utilizes a
pre-trained video frame interpolation model. This work requires neither
additional assumptions about the scene nor external training, which makes the
framework applicable to arbitrary scenes while achieving high quality.
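A minimal sketch of this optimization idea, assuming a frozen pre-trained
interpolation network exposed as a callable interp_model (the name and
interface are placeholders, not the released code): a candidate future frame is
optimized so that interpolating between the earlier frame and the candidate
reproduces the observed current frame.

    import torch

    def predict_next_frame(prev_frame, cur_frame, interp_model, steps=300, lr=0.01):
        """Optimize a candidate future frame so that the frozen interpolation
        model, run between prev_frame and the candidate, reproduces cur_frame."""
        future = cur_frame.clone().requires_grad_(True)  # initialise from the last observed frame
        opt = torch.optim.Adam([future], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            midpoint = interp_model(prev_frame, future)  # frozen, pre-trained network
            loss = torch.nn.functional.l1_loss(midpoint, cur_frame)
            loss.backward()
            opt.step()
        return future.detach()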

Besides conditional video synthesis, we further explore unconditional video
synthesis, i.e., generating novel, non-existent videos without any specific
input. Concretely, we study 3D-aware generative models for video avatar
generation and animation. We propose AniFaceGAN, an animatable 3D-aware face
generation method that synthesizes highly realistic face images with
controllable pose and expression sequences. We introduce a 3D parametric face
model as a prior and use a 3D deformation field to achieve the desired
expression changes. We also introduce a set of 3D losses that enforce the
deformation field to imitate the parametric model's deformation under
expression variations. Our method generates realistic face videos with high
visual quality.
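A minimal sketch of the deformation-field idea, assuming a simple MLP
conditioned on an expression code and a 3D imitation loss against
corresponding parametric-model vertices (module names and dimensions are
illustrative, not the released AniFaceGAN code):

    import torch
    import torch.nn as nn

    class DeformationField(nn.Module):
        """Maps a 3D point and an expression code to a deformed point,
        moving observation space toward a canonical (neutral) space."""
        def __init__(self, exp_dim=64, hidden=128):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3 + exp_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),
            )

        def forward(self, points, exp_code):
            # points: (N, 3); exp_code: (exp_dim,) broadcast to every point
            cond = exp_code.expand(points.shape[0], -1)
            return points + self.mlp(torch.cat([points, cond], dim=-1))

    def imitation_loss(field, exp_code, mesh_exp, mesh_neutral):
        """3D loss: deforming the expressive mesh vertices should land them
        on the corresponding neutral (canonical) vertices."""
        return ((field(mesh_exp, exp_code) - mesh_neutral) ** 2).mean()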


Date: 			Monday, 12 June 2023

Time: 			10:00am - 12:00noon

Venue: 			Room 3494
			Lifts 25/26

Chairperson: 		Prof. Xiangtong QI (IEDA)

Committee Members: 	Prof. Qifeng CHEN (Supervisor)
			Prof. Pedro SANDER
			Prof. Dan XU
			Prof. Ling SHI (ECE)
			Prof. Ping LUO (HKU)


**** ALL are Welcome ****