Talking Head Video Diffusion Generation
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Talking Head Video Diffusion Generation"

by

WANG Yucheng

Abstract:

Generating natural and expressive talking head videos has proven to be a challenging task that involves heterogeneous, multi-source conditions. While GAN-based methods have achieved notable progress, several difficulties remain unaddressed: precise alignment of lip movements with the audio signal, identity preservation of the original image, and temporal consistency of the generated video. We therefore present LFDHead, a method that uses two pipelines, one at the spatial level and one at the temporal level, to improve the consistency of the generated results. In the Multimodal-to-Latent Diffusion Pipeline, we incorporate the encoded original image as a condition at each diffusion step, which enhances spatial consistency and preserves identity. In the Latent-Fusion Rendering Pipeline, we use latent fusion techniques to swiftly obtain temporal latent information from neighbors in the temporal sequence and generate continuous frames. Extensive experiments demonstrate the generation consistency of our proposed method.

Date : 29 April 2024 (Monday)
Time : 15:00 - 15:40
Venue : Room 5501 (near lifts 25/26), HKUST
Advisor : Dr. XU Dan
2nd Reader : Dr. CHEN Long
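The two ideas in the abstract can be illustrated with a minimal toy sketch: (1) conditioning every denoising step on the encoded source image to preserve identity, and (2) blending each frame's latent with its temporal neighbors for smoother sequences. This is not the thesis's actual code; all function names (`encode_image`, `denoise_step`, `fuse_neighbors`) and the simplified "denoiser" are hypothetical stand-ins for the real learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(image):
    """Stand-in image encoder: flatten and L2-normalize (hypothetical)."""
    z = image.reshape(-1).astype(float)
    return z / (np.linalg.norm(z) + 1e-8)

def denoise_step(latent, identity_latent, t, num_steps):
    """Toy denoiser: nudge the latent toward the identity latent a little
    more at each step, mimicking identity-preserving conditioning."""
    alpha = (t + 1) / num_steps
    return (1.0 - 0.1 * alpha) * latent + 0.1 * alpha * identity_latent

def fuse_neighbors(latents, i, weight=0.25):
    """Blend frame i's latent with its temporal neighbors (edge frames
    reuse themselves) -- a simple stand-in for latent fusion."""
    prev_z = latents[max(i - 1, 0)]
    next_z = latents[min(i + 1, len(latents) - 1)]
    return (1 - 2 * weight) * latents[i] + weight * prev_z + weight * next_z

source_image = rng.standard_normal((4, 4))
identity = encode_image(source_image)

# Spatial pipeline: denoise one frame latent, conditioned on the identity
# latent at every step.
init_latent = rng.standard_normal(identity.shape)
latent = init_latent.copy()
num_steps = 50
for t in range(num_steps):
    latent = denoise_step(latent, identity, t, num_steps)

# Temporal pipeline: fuse a short sequence of perturbed frame latents.
frames = [latent + 0.05 * rng.standard_normal(latent.shape) for _ in range(5)]
smoothed = [fuse_neighbors(frames, i) for i in range(len(frames))]
```

After the loop, the frame latent is strictly closer to the identity latent than the initial noise was, which is the sense in which per-step conditioning preserves identity in this toy setting; the neighbor-fusion pass is the analogous toy for temporal consistency.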