Rendering Text in Images in the GenAI Era
PhD Thesis Proposal Defence
Title: "Rendering Text in Images in the GenAI Era"
by
Mr. Jingye CHEN
Abstract:
With the rapid advancement of generative models, text-to-image generation has
made remarkable progress, largely driven by diffusion-based approaches.
However, rendering text within images accurately and in editable form remains
a fundamental challenge, limiting practical applications in design,
advertising, and document generation. To address this, our roadmap unfolds in
three steps:
(1) TextDiffuser-1 aims to improve text rendering accuracy. It uses a
Transformer-based layout planner with a conditional diffusion backbone, and
introduces MARIO-10M, a large-scale dataset containing 10 million image–text
pairs with OCR and character-level annotations, along with MARIO-Eval, a
benchmark designed for comprehensive evaluation of visual text generation.
(2) TextDiffuser-2 enhances flexibility and stylistic diversity. It
fine-tunes a large language model to autonomously generate and refine layout
plans via conversational interactions. In addition, it adopts line-level text
conditioning instead of dense character-level guidance, enabling more natural
and diverse rendering without sacrificing legibility.
(3) TextDiffuser-3 aims to improve editability. It generalizes the
problem from single image generation to full layered graphic design
synthesis. It leverages a vision–language model to coordinate reference
generation, design planning, and layer-wise decomposition, supported by
expert modules such as SAM and element-removal models to produce harmonized,
editable designs.
Together, these three steps outline a path from rendering accurate and
legible text to generating fully editable, layered graphic designs. This
progression empowers designers and content creators to harness generative
models not only
for inspiration, but also for production-ready, customizable visual content.
Date: Wednesday, 30 July 2025
Time: 3:30pm - 5:30pm
Venue: Room 3494 (Lifts 25/26)
Committee Members: Dr. Qifeng Chen (Supervisor)
Prof. Gary Chan (Chairperson)
Prof. Tim Cheng