PhD Qualifying Examination
Title: "From Generation to Interaction: A Survey of Interactive Video World Models"
by
Mr. Bohai GU
Abstract:
The rapid progress of video foundation models and embodied AI has accelerated
a major transition in world modeling, from latent-state models designed for
internal planning toward pixel-space models that directly generate visible
interactive worlds. This survey analyzes this transition through a
problem-oriented framework rather than a flat taxonomy of architectures. We
organize the current field around three tightly coupled subproblems:
generation, which concerns the shift from offline clip synthesis to causal
rollout for interactive systems; action diversity, which examines how
camera-based, instruction-based, and other action signals can be
incorporated into video world models to induce meaningful world changes; and
context construction, which addresses how world state can be represented,
refreshed, and maintained across long-horizon interaction. Beyond these core
subproblems, we discuss physics-aware interaction, off-camera state
prediction, and self-evolution as promising future directions. Overall, this
survey aims to clarify how interactive video world models can evolve from
visually plausible generators into causal, actionable, and stateful world
simulators.
Date: Wednesday, 27 May 2026
Time: 10:00am - 12:00noon
Venue: Room 2128A (Lift 19)
Committee Members: Prof. Song Guo (Supervisor)
Dr. Dan Xu (Chairperson)
Dr. Long Chen