TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


MPhil Thesis Defence


Title: "TALE: Training-free Cross-domain Image Composition via Adaptive 
Latent Manipulation and Energy-guided Optimization"

By

Mr. Trung Kien PHAM


Abstract:

In this thesis, we present TALE, a novel training-free framework harnessing the 
power of text-driven diffusion models to tackle the cross-domain image
composition task, which aims to seamlessly incorporate user-provided objects
into a
specific visual context regardless of domain disparity. Previous methods often 
involve either training auxiliary networks or finetuning diffusion models on 
customized datasets, which is expensive and may undermine the robust textual
and visual priors of pretrained diffusion models. Some recent works attempt to 
break the barrier by proposing training-free workarounds that rely on 
manipulating attention maps to tame the denoising process implicitly. However, 
composing via attention maps does not necessarily yield the desired
compositional outcomes. These approaches retain only some semantic information
and usually fall short in preserving the identity characteristics of input
objects or exhibit limited background-object style adaptation in generated
images. In
contrast, TALE operates directly in the latent space to provide explicit and
effective guidance for the composition process, which resolves these problems.
Specifically, we equip TALE with two mechanisms dubbed Adaptive
Latent Manipulation and Energy-guided Latent Optimization. The former
formulates noisy latents conducive to initiating and steering the composition
process by directly leveraging background and foreground latents at
corresponding timesteps. The latter exploits designated energy functions to
further optimize intermediate latents so that they conform to specific
conditions, complementing the former to generate the desired final results.
Our experiments
demonstrate that TALE surpasses prior baselines and attains state-of-the-art 
performance in image-guided composition across various photorealistic and 
artistic domains.


Date:                   Tuesday, 13 August 2024

Time:                   1:00pm - 3:00pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Dr. Long CHEN

Committee Members:      Dr. Qifeng CHEN (Supervisor)
                        Prof. James KWOK