Rendering Text in Images in the GenAI Era

PhD Thesis Proposal Defence


Title: "Rendering Text in Images in the GenAI Era"

by

Mr. Jingye CHEN


Abstract:

With the rapid advancement of generative models, text-to-image generation has 
made remarkable progress, largely driven by diffusion-based approaches. 
However, rendering text within images accurately and in an editable form
remains a fundamental challenge, limiting practical applications in design,
advertising, and document generation. To address this, our roadmap unfolds in
three steps:

(1) TextDiffuser-1 aims to improve text rendering accuracy. It combines a
Transformer-based layout planner with a conditional diffusion backbone, and
introduces MARIO-10M, a large-scale dataset containing 10 million image–text 
pairs with OCR and character-level annotations, along with MARIO-Eval, a 
benchmark designed for comprehensive evaluation of visual text generation.

(2) TextDiffuser-2 enhances flexibility and stylistic diversity. It 
fine-tunes a large language model to autonomously generate and refine layout 
plans via conversational interactions. In addition, it adopts line-level text 
conditioning instead of dense character-level guidance, enabling more natural 
and diverse rendering without sacrificing legibility.

(3) TextDiffuser-3 aims to improve editability. It generalizes the
problem from single-image generation to full layered graphic design
synthesis. It leverages a vision–language model to coordinate reference 
generation, design planning, and layer-wise decomposition, supported by 
expert modules such as SAM and element-removal models to produce harmonized, 
editable designs.

Together, these three steps outline a path from rendering accurate and
legible text to generating fully editable, layered graphic designs. This
roadmap empowers designers and content creators to harness generative models
not only for inspiration, but also for production-ready, customizable visual
content.


Date:                   Wednesday, 30 July 2025

Time:                   3:30pm - 5:30pm

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Dr. Qifeng Chen (Supervisor)
                        Prof. Gary Chan (Chairperson)
                        Prof. Tim Cheng