Modeling the Physical World: Generative World Models for Robot Learning

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Modeling the Physical World: Generative World Models for Robot 
Learning"

By

Mr. Siyuan ZHOU


Abstract:

Robotics is currently transitioning from programmed precision in structured 
environments to General-Purpose Robotic Intelligence capable of handling the 
unpredictability of the real world. While the 
advent of Vision-Language-Action (VLA) models has endowed robots with 
unprecedented semantic common sense through internet-scale pre-training, a 
significant "generalization gap" remains. This thesis investigates the 
potential of generative video models to function as implicit world models, 
serving as high-fidelity physics simulators that learn the laws of the world 
directly from visual data. By treating video generation as an engine for 
visual imagination, we provide robots with a mental sandbox to predict the 
consequences of their actions. However, deploying these models for robotic 
control requires overcoming critical hurdles, including physical 
hallucinations, high inference latency, and the maintenance of spatiotemporal 
consistency.

To address these challenges, this research first analyzes the computational 
bottlenecks of diffusion-based trajectory optimization and proposes a novel 
method to enhance inference efficiency, thereby facilitating real-time 
planning in dynamic settings. Furthermore, a compositional framework is 
introduced to mitigate the data scarcity problem and bridge the 
generalization gap by moving beyond simple behavioral cloning. Finally, the 
thesis presents an architecture that integrates a dedicated memory module 
into the generative pipeline to ensure object permanence and state 
consistency over long horizons. Collectively, these advancements provide a 
robust pathway toward developing physically grounded, self-correcting world 
models for autonomous agents in open-world environments.


Date:                   Monday, 8 June 2026

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Andrew Glen COHEN (PHYS)

Committee Members:      Prof. Dit-Yan YEUNG (Supervisor)
                        Dr. Qifeng CHEN
                        Prof. Pedro SANDER
                        Dr. Fangneng ZHAN (AMC)
                        Prof. Yizhou WANG (Peking University)