Consistent and Controllable Visual Generation in Diffusion Era
PhD Qualifying Examination
Title: "Consistent and Controllable Visual Generation in Diffusion Era"
by
Mr. Qingyan BAI
Abstract:
The rapid ascendancy of diffusion models has redefined the landscape of
visual generation, achieving unprecedented fidelity in synthesizing images,
videos, and 3D content. However, the initial success of large-scale text-to-
image models has also exposed a fundamental limitation: their control is
coarse-grained, relying solely on ambiguous text prompts. Consequently,
the research frontier has aggressively shifted from high-fidelity generation
to high-precision control. This survey charts the evolution of this paradigm
shift, focusing on two intertwined challenges: controllability (the ability
to guide synthesis with fine-grained inputs like pose, edges, or layouts) and
consistency (the ability to maintain the fidelity of specific visual
attributes, such as identity, style, or temporal coherence, often guided by
reference images). We organize the review by modality, analyzing key works in
image, video, and 3D generation. We find a recurring central theme: the
inherent tension between fidelity to a control signal (such as identity) and
editability under creative guidance. We analyze how prior work navigates this
trade-off through novel architectural designs, training strategies, and
control-signal decoupling. Finally, we discuss open challenges, including
compositionality, long-range temporal coherence, and the scaling versus
architecture debate, to provide a forward-looking perspective on this vibrant
field.
Date: Monday, 1 December 2025
Time: 10:00am - 12:00pm
Venue: Room 4472 (Lifts 25/26)
Committee Members: Dr. Qifeng Chen (Supervisor)
Prof. Pedro Sander (Chairperson)
Dr. Wenhan Luo (AMC)