Local Image Editing and Scalability Analysis with GANs
PhD Thesis Proposal Defence

Title: "Local Image Editing and Scalability Analysis with GANs"

by

Mr. Jiapeng ZHU

Abstract:

Generative Adversarial Networks (GANs), as one of the leading generative models, have made remarkable progress in generating photorealistic images, particularly in 2D and 3D synthesis on single-domain datasets. Interpreting the internal structure of the latent space of pre-trained GANs has garnered significant attention, as it deepens our understanding of how GANs render an image and, in turn, enables control over the semantics of synthesized images. However, most existing work controls image semantics globally through the latent space (i.e., manipulating an attribute affects the entire image), neglecting the significant need for local semantic editing. Although several techniques for local editing exist, they primarily operate at the instance level using segmentation masks (e.g., feature map replacement, where synthesized results are segmented and intermediate feature maps within regions of interest are swapped), often resulting in suboptimal editing quality.

This thesis addresses these limitations by introducing several novel algorithms for local image editing in the latent space of GANs, offering several key advantages. First, we perform local editing at the semantic level, enabling high-fidelity local edits through latent space manipulations. In addition, generic attribute vectors can be obtained with a single image synthesis, facilitating the rapid discovery of semantic vectors. Furthermore, our approach allows control over arbitrary local regions, at any fine granularity, without the need for precise spatial object masks, greatly improving local editing in practical use.

Despite these advancements, GANs are often considered less scalable than diffusion or autoregressive models, particularly when trained on large-scale, diverse datasets (e.g., LAION).
To address this, the thesis explores the scalability of GANs and proposes integrating a sparse Mixture of Experts (MoE) into the GAN generator to enhance model scalability while keeping computational costs manageable. Specifically, we introduce a GAN-based text-to-image generator that employs a set of experts for feature processing, coupled with a sparse router that dynamically selects the most suitable expert for each feature point. Additionally, the method adopts a two-stage pipeline, combining a base generator and an upsampler to produce images at resolutions of 64x64 and 512x512, respectively.

Date: Monday, 9 June 2025
Time: 10:00am - 12:00noon
Venue: Room 3494 (Lifts 25/26)

Committee Members:
Dr. Qifeng Chen (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Prof. Pedro Sander
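For readers unfamiliar with sparse expert routing, the mechanism mentioned in the abstract (a router dynamically selecting one expert per feature point) can be sketched roughly as follows. This is a minimal NumPy illustration of generic top-1 MoE routing, not the thesis implementation; all names, shapes, and the choice of linear experts are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

num_experts, d_in, d_out = 4, 8, 8
num_points = 16  # feature points, e.g., spatial locations in a feature map

# Hypothetical stand-ins: each "expert" is a simple linear map, and the
# router is a linear layer producing one logit per expert.
experts = [rng.standard_normal((d_in, d_out)) for _ in range(num_experts)]
router_w = rng.standard_normal((d_in, num_experts))

def sparse_moe(x):
    """Route each feature point to its top-1 expert (sparse MoE sketch)."""
    logits = x @ router_w                         # (num_points, num_experts)
    choice = logits.argmax(axis=-1)               # top-1 expert per point
    # Softmax gate value for the chosen expert, used to scale its output.
    gates = np.exp(logits - logits.max(-1, keepdims=True))
    gates = gates / gates.sum(-1, keepdims=True)
    out = np.empty((x.shape[0], d_out))
    for e in range(num_experts):
        mask = choice == e                        # points routed to expert e
        if mask.any():
            out[mask] = gates[mask, e:e+1] * (x[mask] @ experts[e])
    return out, choice

x = rng.standard_normal((num_points, d_in))
y, choice = sparse_moe(x)
print(y.shape, choice.shape)  # (16, 8) (16,)
```

Because only the selected expert runs for each feature point, compute grows with the number of routed points rather than the total number of experts, which is the usual motivation for sparse MoE scaling.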