PhD Qualifying Examination


Title: "Towards Controllable Video Generation with Diffusion Models"

by

Mr. Yihao MENG


Abstract:

Controllable video generation has gained broad importance in computer vision 
due to its potential in creative content production. While text-driven 
approaches offer a convenient interface, they often struggle to maintain 
identity consistency, exert fine-grained motion control, and specify nuanced 
camera viewpoints, limitations that restrict their applicability in many 
real-world scenarios. This challenge 
underscores the need for more robust methods capable of synthesizing 
realistic, temporally coherent videos with explicit control. Recently, 
diffusion-based generative modeling has emerged as a powerful framework, 
demonstrating considerable success in producing high-fidelity images and 
videos. In this survey, we investigate how these models can facilitate 
controllable video synthesis, highlighting the limitations of text-only 
guidance and emphasizing the role of additional control signals such as 
semantic masks, pose keypoints, and camera parameters. We propose a taxonomy 
that encompasses three core dimensions of controllability—appearance, 
motion, and camera—and review current methods through this lens. By 
synthesizing key findings and challenges, this work aims to guide future 
research toward developing flexible, robust, and high-quality 
diffusion-based solutions for controllable video generation.


Date:                   Monday, 26 May 2025

Time:                   10:00am - 12:00noon

Venue:                  Room 2128A
                        Lift 19

Committee Members:      Prof. Huamin Qu (Supervisor)
                        Dr. Qifeng Chen (Chairperson)
                        Prof. Pedro Sander
                        Dr. Anyi Rao (AMC)