Modeling the Physical World: Generative World Models for Robot Learning
PhD Thesis Proposal Defence
Title: "Modeling the Physical World: Generative World Models for Robot
Learning"
by
Mr. Siyuan ZHOU
Abstract:
The evolution of robotics is currently transitioning from programmed
precision in structured environments to General-Purpose Robotic Intelligence
capable of interacting with the unpredictability of the real world. While the
advent of Vision-Language-Action (VLA) models has endowed robots with
unprecedented semantic common sense through internet-scale pre-training, a
significant "generalization gap" remains.
This thesis investigates the potential of generative video models to function
as implicit world models, serving as high-fidelity physics simulators that
learn the laws of the world directly from visual data. By treating video
generation as an engine for visual imagination, we provide robots with a
mental sandbox to predict the consequences of their actions. However,
deploying these models for robotic control requires overcoming critical
hurdles, including physical hallucinations, high inference latency, and the
maintenance of spatiotemporal consistency.
To address these challenges, this research first analyzes the computational
bottlenecks of diffusion-based trajectory optimization and proposes a novel
method to enhance inference efficiency, thereby facilitating real-time
planning in dynamic settings. Furthermore, a compositional framework is
introduced to mitigate the data scarcity problem and bridge the
generalization gap by moving beyond simple behavioral cloning. Finally, the
thesis presents an architecture that integrates a dedicated memory module
into the generative pipeline to ensure object permanence and state
consistency over long horizons. Collectively, these advancements provide a
robust pathway toward developing physically grounded, self-correcting world
models for autonomous agents in open-world environments.
Date: Thursday, 9 April 2026
Time: 2:00pm - 4:00pm
Venue: Room 2132C (Lift 22)
Committee Members: Prof. Dit-Yan Yeung (Supervisor)
Prof. Fangzhen Lin (Chairperson)
Prof. Nevin Zhang