From Virtual to Physical: Evolving Visual Generation Agents across Simulated, Real, and Physics-Aware Environments

PhD Qualifying Examination


Title: "From Virtual to Physical: Evolving Visual Generation Agents across 
Simulated, Real, and Physics-Aware Environments"

by

Mr. Jinxiang LAI


Abstract:

The field of Visual Generation Agents (VGAs) is undergoing a rapid evolution 
from single-shot generation toward multi-step interactive workflows. Despite 
this momentum, the area still lacks a systematic survey that clarifies its 
core challenges, training paradigms, and future directions. Through a 
problem-driven analysis, this survey first identifies three properties that 
fundamentally distinguish VGAs from conventional Large Language Model (LLM) 
agents: (1) tools are stochastic generation systems, which conflates training 
signals and obscures responsibility attribution; (2) rewards are subjective, 
multi-dimensional, and creative, which introduces evaluation uncertainty; and 
(3) outputs must obey physical laws, which imposes strong generative 
constraints. These visual-specific properties give rise to six core challenges 
that VGAs face during training and deployment, which we systematically review 
together with their representative solutions. Building on this analysis, we 
observe that VGA training paradigms naturally diverge into three complementary 
environments, namely Virtual, Real, and Physics-Aware, which respectively 
address training affordability, training correctness, and training fidelity. 
Accordingly, we propose a Virtual–Real–Physics-Aware co-training framework, 
in which three sequential stages jointly produce agents capable of 
orchestrating diverse tools to generate content that is both creative and 
physically faithful. Finally, we outline future directions on token-level 
reward optimization, personalized agents, and 3D-native physics-faithful 
generation. By identifying the core challenges and training paradigms of VGAs 
and outlining a co-training framework that traces the evolution from virtual 
simulation to real physical environments, this survey aims to provide a clear 
roadmap and theoretical foundation for future research.


Date:                   Wednesday, 10 June 2026

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lift 25/26

Committee Members:      Prof. Song Guo (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Dr. Zihan Zhang