Modeling the Physical World: Generative World Models for Robot Learning

PhD Thesis Proposal Defence


Title: "Modeling the Physical World: Generative World Models for Robot Learning"

by

Mr. Siyuan ZHOU


Abstract:

Robotics is currently transitioning from programmed precision in structured 
environments to General-Purpose Robotic Intelligence capable of coping with 
the unpredictability of the real world. While the advent of 
Vision-Language-Action (VLA) models has endowed robots with unprecedented 
semantic common sense through internet-scale pre-training, a significant 
"generalization gap" remains.

This thesis investigates the potential of generative video models to function 
as implicit world models, serving as high-fidelity physics simulators that 
learn the laws of the physical world directly from visual data. By treating 
video generation as an engine for visual imagination, we provide robots with 
a mental sandbox in which to predict the consequences of their actions. 
However, deploying these models for robotic control requires overcoming 
critical hurdles, including physical hallucinations, high inference latency, 
and the difficulty of maintaining spatiotemporal consistency.

To address these challenges, this research first analyzes the computational 
bottlenecks of diffusion-based trajectory optimization and proposes a novel 
method to improve inference efficiency, thereby enabling real-time planning 
in dynamic settings. Second, it introduces a compositional framework that 
mitigates data scarcity and helps bridge the generalization gap by moving 
beyond simple behavioral cloning. Finally, the thesis presents an 
architecture that integrates a dedicated memory module into the generative 
pipeline to preserve object permanence and state consistency over long 
horizons. Collectively, these advances chart a pathway toward physically 
grounded, self-correcting world models for autonomous agents in open-world 
environments.


Date:                   Thursday, 9 April 2026

Time:                   2:00pm - 4:00pm

Venue:                  Room 2132C
                        Lift 22

Committee Members:      Prof. Dit-Yan Yeung (Supervisor)
                        Prof. Fangzhen Lin (Chairperson)
                        Prof. Nevin Zhang