The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "High-Quality Visual Content Creation with Foundation Generative Models"
By
Mr. Tengfei WANG
Abstract:
The increasing demand for high-quality visual content, encompassing 2D images
and 3D models, is evident across various applications such as virtual reality
and video games. However, creating such content is laborious, as it requires
a combination of artistic expertise and proficiency in 2D painting or 3D
modeling pipelines. Recently, a plethora of deep generative models has
enabled the creation of visual content at an unprecedented scale and
remarkable speed. Nonetheless, to achieve generation under different control
conditions, e.g., attribute editing, sketches, and text prompts, acquiring
the corresponding large-scale training data poses a significant challenge due
to concerns about copyright, privacy, and collection costs. The limited
availability of data and computing resources can thus hamper the quality of
generated content.
Inspired by the tremendous success of model pretraining in visual
understanding and natural language processing, this thesis explores a new
generative paradigm that leverages well-trained foundation generative models
to boost visual content creation, covering both 2D image synthesis and 3D
model rendering. We begin with high-fidelity face image editing, where we
embed real images into the latent space of well-trained generative
adversarial networks (GANs). Our GAN inversion framework allows editing of
various attributes within a unified model while preserving image-specific
details such as background and illumination. Next, we move on to the
controllable generation of general images beyond faces. Rather than GANs,
which mainly work for specific domains (e.g., faces), we opt for diffusion
models, which have shown impressive expressivity in synthesizing complex and
general images. With pretraining, we propose a unified architecture that
boosts a variety of image-to-image translation tasks. Beyond 2D images, we
extend this pretraining philosophy to 3D content creation. We propose a 3D
generative model that uses diffusion models to automatically generate 3D
avatars represented as neural radiance fields. Building upon this foundation
generative model for avatars, we further demonstrate 3D avatar creation from
an image or a text prompt while allowing for text-based semantic editing.
Date: Wednesday, 19 July 2023
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)
Chairperson: Prof. Jiewen HONG (MARK)
Committee Members: Prof. Qifeng CHEN (Supervisor)
Prof. Long CHEN
Prof. Long QUAN
Prof. Ling SHI (ECE)
Prof. Ping LUO (HKU)