Local Image Editing and Scalability Analysis with GANs

PhD Thesis Proposal Defence


Title: "Local Image Editing and Scalability Analysis with GANs"

by

Mr. Jiapeng ZHU


Abstract:

Generative Adversarial Networks (GANs), as one of the leading families of 
generative models, have made remarkable progress in generating 
photorealistic images, particularly for 2D and 3D synthesis on 
single-domain datasets. Interpreting the internal structure of the latent 
space of pre-trained GANs has garnered significant attention, as it deepens 
our understanding of how GANs render an image and, in turn, enables control 
over the semantics of the synthesized images.

However, most existing work controls image semantics globally through the 
latent space (i.e., manipulating an attribute affects the entire image), 
neglecting the significant need for local semantic editing. Although 
several techniques for local editing exist, they primarily operate at the 
instance level using segmentation masks (e.g., feature map replacement, 
where synthesized results are segmented and the intermediate feature maps 
within regions of interest are swapped), often yielding suboptimal editing 
quality. This thesis addresses these limitations by introducing several 
novel algorithms for local image editing in the latent space of GANs, which 
offer key advantages. First, we perform local editing at the semantic 
level, enabling high-fidelity edits through latent space manipulation. 
Second, generic attribute vectors can be obtained from a single image 
synthesis, enabling the rapid discovery of semantic directions. Finally, 
our approach controls arbitrary local regions at any level of granularity, 
even without precise spatial object masks, greatly improving the 
practicality of local editing.
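The core idea of masked latent-space editing can be illustrated with a 
small sketch (not the thesis's actual algorithm; the function name, shapes, 
and toy values below are illustrative assumptions): an intermediate 
generator feature map is shifted along a semantic attribute direction, but 
only inside a spatial region of interest, so the rest of the image is left 
untouched.

```python
import numpy as np

def local_edit(feature_map, direction, region_mask, strength=1.0):
    """Shift a generator feature map along a semantic direction,
    restricted to a (possibly soft) spatial region.

    feature_map : (C, H, W) intermediate GAN feature map
    direction   : (C,) semantic attribute vector in feature space
    region_mask : (H, W) values in [0, 1]; 1 = edit, 0 = keep unchanged
    """
    # Broadcast the channel-wise direction over space, gated by the mask.
    shift = strength * direction[:, None, None] * region_mask[None, :, :]
    return feature_map + shift

# Toy usage: edit only the left half of an 8x8 feature map.
feat = np.zeros((4, 8, 8))
direc = np.ones(4)
mask = np.zeros((8, 8))
mask[:, :4] = 1.0
edited = local_edit(feat, direc, mask, strength=2.0)
```

Because the mask can be soft and of arbitrary shape, the same mechanism 
covers coarse regions and fine-grained areas without a precise object mask.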

Despite these advancements, GANs are often considered less scalable than 
diffusion or autoregressive models, particularly when trained on 
large-scale, diverse datasets (e.g., LAION). To address this, the thesis 
explores the scalability of GANs and proposes integrating a sparse Mixture 
of Experts (MoE) into the GAN generator to enhance scalability while 
keeping computational costs manageable. Specifically, we introduce a 
GAN-based text-to-image generator that employs a set of experts for feature 
processing, coupled with a sparse router that dynamically selects the most 
suitable expert for each feature point. The method further adopts a 
two-stage pipeline, combining a base generator and an upsampler to produce 
images at resolutions of 64x64 and 512x512, respectively.
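The sparse-routing idea can be sketched in a few lines (a minimal 
illustration, not the thesis's generator; the top-1 routing rule, linear 
experts, and all shapes below are assumptions for the sketch): a router 
scores every feature point against each expert, and only the single 
highest-scoring expert processes that point, so compute per point stays 
constant as experts are added.

```python
import numpy as np

def sparse_moe(features, expert_weights, router_weights):
    """Route each feature point to its single best expert (top-1).

    features       : (N, D) one row per feature point
    expert_weights : (E, D, D) one linear expert per slice
    router_weights : (D, E) router logits = features @ router_weights
    """
    logits = features @ router_weights      # (N, E) routing scores
    choice = logits.argmax(axis=1)          # top-1 expert index per point
    out = np.empty_like(features)
    for e in range(expert_weights.shape[0]):
        idx = choice == e                   # points assigned to expert e
        if idx.any():
            out[idx] = features[idx] @ expert_weights[e]
    return out, choice

# Toy usage: 16 feature points of dimension 8, routed among 4 experts.
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8))
experts = rng.standard_normal((4, 8, 8))
router = rng.standard_normal((8, 4))
out, choice = sparse_moe(feats, experts, router)
```

Total parameters grow with the number of experts, but each feature point 
still passes through only one expert, which is how capacity scales while 
per-sample cost stays roughly flat.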


Date:                   Monday, 9 June 2025

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Dr. Qifeng Chen (Supervisor)
                        Prof. Dit-Yan Yeung (Chairperson)
                        Prof. Pedro Sander