Generative World Models for Robot Learning: A Survey

PhD Qualifying Examination


Title: "Generative World Models for Robot Learning: A Survey"

by

Mr. Fangqi ZHU


Abstract:

Robotics is moving from task-specific controllers toward general-purpose 
agents that can operate in open, dynamic environments. Vision-language-action 
models have improved semantic understanding and action generation, but many 
remain largely reactive: they select actions from observations and 
instructions without explicitly predicting how those actions will change 
objects, contacts, and future task states. This gap motivates generative 
world models, which learn predictive representations of physical dynamics and 
use them to imagine future observations, evaluate candidate behaviors, 
synthesize robot data, support planning, and improve policies before costly 
real-world execution. This survey reviews generative world models for robot 
learning through two complementary roles. As external simulators, they 
provide counterfactual rollouts for data generation, policy evaluation, model 
predictive control, and imagined reinforcement learning. As internal 
predictive modules, they embed foresight inside robot policies or 
vision-language-action models, supporting inverse dynamics, action 
generation, and embodied reasoning. Across latent world models, video 
foundation models, action-conditioned robot simulators, and 
world-model-based policy optimization, the survey focuses on representation, 
action grounding, decision coupling, and reliability, arguing that useful 
robot world models must move beyond visually plausible generation toward 
action-faithful, physically grounded, uncertainty-aware, and efficient 
prediction.


Date:                   Wednesday, 27 May 2026

Time:                   11:30am - 1:00pm

Venue:                  Room 2128B
                        Lift 19

Committee Members:      Prof. Song Guo (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Long Chen