PhD Thesis Proposal Defence
Title: "Local Image Editing and Scalability Analysis with GANs"
by
Mr. Jiapeng ZHU
Abstract:
Generative Adversarial Networks (GANs), as one of the leading generative
models, have made remarkable progress in generating photorealistic images,
particularly for 2D and 3D synthesis on single-domain datasets.
Interpreting the internal structure of the latent space of pre-trained GANs
has garnered significant attention, as it deepens our understanding of how
GANs render an image, which in turn enables us to control the semantics of
the synthesized images.
However, most existing work focuses on controlling image semantics globally
through the latent space (i.e., manipulating an attribute affects the entire
image), thereby neglecting the significant need for local semantic editing.
Although several techniques for local editing exist, they primarily operate
at the instance level using segmentation masks (e.g., feature map
replacement, where synthesized results are segmented and intermediate
feature maps within regions of interest are swapped), often resulting in
suboptimal editing quality. This thesis addresses these limitations by
introducing several novel algorithms for local image editing in the latent
space of GANs, offering several key advantages. First, we perform local
editing at the semantic level, enabling high-fidelity local editing through
latent space manipulations. In addition, generic attribute vectors can be
obtained with a single image synthesis, facilitating the rapid discovery of
semantic vectors. Furthermore, our approach allows control over arbitrary
local regions at any level of granularity, even without precise spatial
object masks, greatly improving the practicality of local editing.
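
As a rough illustration of this style of latent-space editing (a minimal
sketch only; the generator interface, layer indices, and the `direction`
vector below are illustrative assumptions, not the algorithms proposed in
the thesis), manipulating a semantic attribute might look like:

```python
import torch

@torch.no_grad()
def edit_latent(G, w, direction, strength=2.0, layers=range(4, 8)):
    """Shift per-layer latent codes along a semantic direction.

    w:         per-layer latents, shape (num_layers, latent_dim)
    direction: unit vector in latent space tied to one attribute
    layers:    applying the shift to a subset of layers confines the
               edit to the semantics those layers control
    """
    w_edit = w.clone()
    for i in layers:
        w_edit[i] += strength * direction
    return G.synthesis(w_edit.unsqueeze(0))  # edited image
```

Restricting the shift to a subset of layers (or, more generally, to a
subset of latent dimensions) is one common way to confine an edit without
requiring a spatial mask.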
Despite these advancements, GANs are often considered less scalable than
diffusion or autoregressive models, particularly when trained on
large-scale, diverse datasets (e.g., LAION). To address this, the thesis
explores the scalability of GANs and proposes integrating a sparse Mixture
of Experts (MoE) into the GAN generator to enhance model capacity while
keeping computational costs manageable. Specifically, we introduce a
GAN-based text-to-image generator that employs a set of experts for feature
processing, coupled with a sparse router that dynamically selects the most
suitable expert for each feature point. Additionally, the method adopts a
two-stage pipeline, combining a base generator and an upsampler to produce
images at resolutions of 64x64 and 512x512, respectively.
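
To give a flavor of sparse expert routing (a generic top-1 MoE layer, not
the generator proposed in the thesis; all class names, shapes, and expert
counts below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Generic top-1 sparse Mixture-of-Experts layer (illustrative only).

    Each feature point (one vector along the channel dimension) is routed
    to the single expert with the highest router score, so compute scales
    with the number of feature points rather than the number of experts.
    """

    def __init__(self, dim, num_experts=8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # one score per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_points, dim)
        probs = F.softmax(self.router(x), dim=-1)   # (num_points, num_experts)
        top_p, top_idx = probs.max(dim=-1)          # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                     # points assigned to expert i
            if mask.any():
                # Scale by the router probability so routing stays differentiable.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out
```

In a generator, such a layer would stand in for a dense feed-forward block,
with spatial feature maps flattened into feature points before routing.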
Date: Monday, 9 June 2025
Time: 10:00am - 12:00noon
Venue: Room 3494
Lifts 25/26
Committee Members: Dr. Qifeng Chen (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Prof. Pedro Sander