PhD Thesis Proposal Defence

Title: "Learning to synthesize realistic videos"

by

Miss Yue WU

Abstract:

Video synthesis is a fundamental task in computer vision: the ability to capture, understand, and reproduce the dynamics of our visual world is essential. Beyond its research interest, video synthesis has a wide range of applications in computer vision, robotics, and computer graphics.

Video synthesis has two main research directions: conditional and unconditional video synthesis. In conditional video synthesis, videos are synthesized conditioned on inputs such as previous frames, semantic segmentation maps, edges, or text. Among these conditions, we focus on conditioning on previous frames. Synthesizing video conditioned on previous frames requires a comprehensive understanding of the video content as well as the motion of, and interactions between, objects. Previous methods fail to produce accurate motion and struggle to generate long-term videos.

Our first work addresses the motion understanding problem in video synthesis. We propose FVS: instead of treating the whole scene as a single component, our framework decomposes a scene into a static background and moving objects. The appearance change of the static background is caused by camera movement, while that of the moving objects results from both their own motion and the camera movement. Decomposing the scene in this way increases the accuracy of video synthesis.

Although the first work improves video synthesis quality from past frames, decomposing the scene into components requires semantic segmentation, which is a strong assumption that restricts the applicability of video synthesis algorithms. We therefore introduce an optimization framework built on a pre-trained video frame interpolation model. This approach requires no additional assumptions about the scene and no external training, making the framework applicable to arbitrary scenes while producing high-quality results.

Besides conditional video synthesis, we further explore unconditional video synthesis, i.e., generating novel, non-existing videos without any specific input. Concretely, we study 3D-aware generative models for video avatar generation and animation. We propose AniFaceGAN, an animatable 3D-aware face generation method that synthesizes highly realistic face images with controllable pose and expression sequences. We introduce a 3D parametric face model as a prior and use a 3D deformation field to produce the desired expression changes, together with a set of 3D losses that enforce the deformation field to imitate the parametric model under expression variations. Our method generates realistic face videos with high visual quality.

Date: Monday, 27 March 2023

Time: 4:00pm - 6:00pm

Venue: Room 4475 (Lifts 25/26)

Committee Members:
Dr. Qifeng Chen (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Prof. Pedro Sander
Dr. Dan Xu

**** ALL are Welcome ****