The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
MPhil Thesis Defence
Title: "Efficient Training Strategy for Aesthetic Text-to-Image Generation
Diffusion Model"
By
Mr. Jincheng YU
Abstract:
In this thesis, we address the heavy resource consumption of training recent
large text-to-image (T2I) generative models. We propose a three-stage training
strategy with stage-specific datasets that reduces training resources and
time. i) Pixel dependency learning, where our model learns low-level pixel
dependencies from the ImageNet dataset. This stage focuses on understanding the
intrinsic pixel relationships in natural images. ii) Text-image alignment
learning, where our model learns textual concepts from the SAM dataset, whose
captions are refined by a large vision-language model. This stage aims to align
textual concepts with their visual representations. iii) High-resolution and
aesthetic image generation, where our model is fine-tuned to generate
high-resolution and aesthetic images. For this purpose, we utilize an internal
dataset similar to JourneyDB. When our three-stage training strategy is combined
with an existing parameter-efficient transformer-based diffusion model,
experimental results show that our approach achieves image quality and semantic
control comparable or even superior to those of the state-of-the-art (SOTA) T2I
model Stable Diffusion XL, while requiring only 10.8% of its training time.
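The stage ordering described in the abstract can be sketched as a simple training schedule. This is a minimal, hypothetical Python sketch: the `Stage` class, the `STAGES` list, `run_curriculum`, and the `train_stage` callback are illustrative names, not code from the thesis. It records only what the abstract states (stage order, data source, and goal of each stage) and omits model architecture, resolutions, and step counts, which the abstract does not give.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Stage:
    """One stage of the curriculum: what is learned and from which data."""
    name: str
    dataset: str
    goal: str


# Stage order and data sources as described in the abstract. The dataset
# strings are descriptive labels only, not real loader or dataset IDs.
STAGES: List[Stage] = [
    Stage("pixel_dependency", "ImageNet",
          "learn low-level pixel dependencies in natural images"),
    Stage("text_image_alignment", "SAM (captions refined by a VLM)",
          "align textual concepts with their visual representations"),
    Stage("aesthetic_high_res", "internal dataset similar to JourneyDB",
          "fine-tune to generate high-resolution, aesthetic images"),
]


def run_curriculum(model: Any,
                   train_stage: Callable[[Any, Stage], Any],
                   stages: List[Stage] = STAGES) -> Any:
    """Run the stages strictly in order; each stage starts from the weights
    produced by the previous one. `train_stage` is a hypothetical callback
    that trains `model` for one stage and returns the updated model."""
    for stage in stages:
        model = train_stage(model, stage)
    return model
```

Passing a dummy callback such as `lambda m, s: m + [s.name]` with an empty list as the "model" simply records the order in which the stages run, which is a quick way to check the schedule before plugging in a real training loop.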
Date: Tuesday, 13 August 2024
Time: 3:00pm - 5:00pm
Venue: Room 5506 (Lifts 25/26)
Chairman: Dr. Dan XU
Committee Members: Prof. James KWOK (Supervisor)
Dr. Qifeng CHEN