Efficient Training Strategy for Aesthetic Text-to-Image Generation Diffusion Model
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

MPhil Thesis Defence

Title: "Efficient Training Strategy for Aesthetic Text-to-Image Generation Diffusion Model"

By Mr. Jincheng YU

Abstract:

In this thesis, we address the high resource consumption of recent large text-to-image (T2I) generative models. We propose a three-stage training strategy, with a dedicated dataset for each stage, that reduces training resources and time.

i) Pixel dependency learning: our model learns low-level pixel dependencies from the ImageNet dataset. This stage focuses on the intrinsic relationships between pixels in natural images.

ii) Text-image alignment learning: our model learns textual concepts from the SAM dataset, whose captions are refined by a large vision-language model. This stage aligns textual concepts with their visual representations.

iii) High-resolution and aesthetic image generation: our model is fine-tuned to generate high-resolution, aesthetic images. For this stage, we use an internal dataset similar to JourneyDB.

When we combine this three-stage training strategy with an existing parameter-efficient transformer-based diffusion model, experimental results show that our approach achieves image quality and semantic control comparable or even superior to those of the SOTA T2I model Stable Diffusion XL, while requiring only 10.8% of its training time.

Date: Tuesday, 13 August 2024
Time: 3:00pm - 5:00pm
Venue: Room 5506, Lifts 25/26

Chairman: Dr. Dan XU

Committee Members:
Prof. James KWOK (Supervisor)
Dr. Qifeng CHEN