Modeling the Physical World: Generative World Models for Robot Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Modeling the Physical World: Generative World Models for Robot
Learning"
By
Mr. Siyuan ZHOU
Abstract:
Robotics is currently transitioning from programmed
precision in structured environments to General-Purpose Robotic Intelligence
capable of interacting with the unpredictability of the real world. While the
advent of Vision-Language-Action (VLA) models has endowed robots with
unprecedented semantic common sense through internet-scale pre-training, a
significant "generalization gap" remains. This thesis investigates the
potential of generative video models to function as implicit world models,
serving as high-fidelity physics simulators that learn the laws of the world
directly from visual data. By treating video generation as an engine for
visual imagination, we provide robots with a mental sandbox to predict the
consequences of their actions. However, deploying these models for robotic
control requires overcoming critical hurdles, including physical
hallucinations, high inference latency, and the maintenance of spatiotemporal
consistency.
To address these challenges, this research first analyzes the computational
bottlenecks of diffusion-based trajectory optimization and proposes a novel
method to enhance inference efficiency, thereby facilitating real-time
planning in dynamic settings. Furthermore, a compositional framework is
introduced to mitigate the data scarcity problem and bridge the
generalization gap by moving beyond simple behavioral cloning. Finally, the
thesis presents an architecture that integrates a dedicated memory module
into the generative pipeline to ensure object permanence and state
consistency over long horizons. Collectively, these advancements provide a
robust pathway toward developing physically grounded, self-correcting world
models for autonomous agents in open-world environments.
Date: Monday, 8 June 2026
Time: 10:00am - 12:00noon
Venue: Room 3494
Lifts 25/26
Chairman: Prof. Andrew Glen COHEN (PHYS)
Committee Members: Prof. Dit-Yan YEUNG (Supervisor)
Dr. Qifeng CHEN
Prof. Pedro SANDER
Dr. Fangneng ZHAN (AMC)
Prof. Yizhou WANG (Peking University)