Re-creating 3D World from 2D Captures with 3D Generative AI

Speaker: Dr. Yinghao XU
Stanford University

Title: "Re-creating 3D World from 2D Captures with 3D Generative AI"

Date: Wednesday, 8 January 2025

Time: 4:00pm - 5:00pm

Venue: Room 4502 (via lift 25/26), HKUST

Abstract:

Generative Artificial Intelligence (AI), with its capacity to synthesize, 
simulate, and predict data, has made significant strides in understanding 
and generating one-dimensional natural language and two-dimensional images 
and videos. However, human intelligence is fundamentally grounded in the 3D 
world. While we capture 2D visual information with our eyes, our brains
transform it into rich 3D mental representations, enabling us to perceive,
understand, and interact with our 3D environment. This makes the development 
of 3D generative AI a crucial step toward achieving more human-like 
artificial intelligence.

In this talk, I will present my work on developing 3D generative AI systems
that produce three-dimensional content from 2D observations. These systems
demonstrate robust capabilities in perceiving and generating rich 3D 
representations of environments and leveraging them to understand, interact 
with, and reason about the 3D world. I will begin with differentiable scene 
representation, discussing how we learn generic scene priors from multi-view 
imagery using the inverse graphics pipeline. These representations and scene 
priors have inspired me to explore novel 3D generative paradigms as 
alternatives to conventional graphics workflows, enabling us to build 3D 
generative models from 2D images without direct supervision from 3D data. 
Finally, I will discuss how to leverage domain-specific knowledge from 3D 
modeling to facilitate 3D control over generated videos, advancing into 4D 
content generation. Through this research, I aim to pave the way for more 
advanced spatial reasoning and decision-making in AI agents, bringing us 
closer to achieving human-like intelligence in 3D understanding and 
interaction.


***************
Biography:

Yinghao Xu is a postdoctoral researcher at the Stanford Computational 
Imaging Lab, Stanford University, advised by Prof. Gordon Wetzstein. 
Previously, he was a Ph.D. student at The Chinese University of Hong Kong. 
He has a deep interest in the intersection of Computer Graphics and Computer 
Vision. His current research focuses on generative models and neural 
rendering, particularly 3D generative models. One of his papers was a best
paper candidate at CVPR 2020. He also received the WAIC Rising Star award in
2024 and was nominated for the Snap Fellowship in 2022.