The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
MPhil Thesis Defence
Title: "TALE: Training-free Cross-domain Image Composition via Adaptive
Latent Manipulation and Energy-guided Optimization"
By
Mr. Trung Kien PHAM
Abstract:
In this thesis, we present TALE, a novel training-free framework that harnesses
the power of text-driven diffusion models to tackle the cross-domain image
composition task, which aims to seamlessly incorporate user-provided objects
into a specific visual context regardless of domain disparity. Previous methods often
involve either training auxiliary networks or finetuning diffusion models on
customized datasets, which are expensive and may undermine the robust textual
and visual priors of pretrained diffusion models. Some recent works attempt to
break the barrier by proposing training-free workarounds that rely on
manipulating attention maps to tame the denoising process implicitly. However,
composing via attention maps does not necessarily yield desired compositional
outcomes. These approaches can retain only some semantic information and
usually fall short in preserving the identity characteristics of input objects
or exhibit limited background-object style adaptation in generated images. In
contrast, TALE operates directly on the latent space to provide explicit and
effective guidance for the composition process, resolving these problems.
Specifically, we equip TALE with two mechanisms dubbed Adaptive
Latent Manipulation and Energy-guided Latent Optimization. The former
constructs noisy latents conducive to initiating and steering the composition
process by directly leveraging background and foreground latents at
corresponding timesteps. The latter uses designated energy functions to
further optimize intermediate latents so that they conform to specific
conditions, complementing the former to produce the desired final results.
Our experiments
demonstrate that TALE surpasses prior baselines and attains state-of-the-art
performance in image-guided composition across various photorealistic and
artistic domains.
Date: Tuesday, 13 August 2024
Time: 1:00pm - 3:00pm
Venue: Room 5501 (Lifts 25/26)
Chairman: Dr. Long CHEN
Committee Members: Dr. Qifeng CHEN (Supervisor)
Prof. James KWOK