PhD Qualifying Examination


Title: "Consistent and Controllable Visual Generation in Diffusion Era"

by

Mr. Qingyan BAI


Abstract:

The rapid ascendancy of diffusion models has redefined the landscape of
visual generation, achieving unprecedented fidelity in synthesizing images,
videos, and 3D content. However, the initial success of large-scale text-to-
image models has also exposed a fundamental limitation: control remains
coarse-grained, relying merely on ambiguous text prompts. Consequently, the
research frontier has rapidly shifted from high-fidelity generation to
high-precision control. This survey charts this paradigm shift, focusing on
two intertwined challenges: controllability (the ability to guide synthesis
with fine-grained inputs such as pose, edges, or layouts) and consistency
(the ability to maintain the fidelity of specific visual attributes, such as
identity, style, or temporal coherence, often guided by reference images).
We organize the review by modality, analyzing key works in image, video, and
3D generation. We find a recurring central theme: the inherent tension
between fidelity to the control signal (e.g., identity) and editability
under creative guidance. We analyze how prior works navigate this trade-off
through novel architectural designs, training strategies, and control-signal
decoupling. Finally, we discuss open challenges, including compositionality,
long-range temporal coherence, and the scaling-versus-architecture debate,
to provide a forward-looking perspective on this vibrant field.


Date:                   Monday, 1 December 2025

Time:                   10:00am - 12:00pm

Venue:                  Room 4472
                        Lift 25/26

Committee Members:      Dr. Qifeng Chen (Supervisor)
                        Prof. Pedro Sander (Chairperson)
                        Dr. Wenhan Luo (AMC)