Rendering Text in Images in the GenAI Era
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Rendering Text in Images in the GenAI Era"
By
Mr. Jingye CHEN
Abstract:
With the rapid advancement of generative models, text-to-image generation
has made remarkable progress, largely driven by diffusion-based
approaches. However, rendering text within images accurately and in an
editable form remains a fundamental challenge, limiting practical
applications in design, advertising, and document generation. To address
this, our roadmap unfolds
in three steps: (1) TextDiffuser-1 aims to improve text rendering
accuracy. It combines a Transformer-based layout planner with a conditional
diffusion backbone, and introduces MARIO-10M, a large-scale dataset
containing 10 million image–text pairs with OCR and character-level
annotations, along with MARIO-Eval, a benchmark designed for comprehensive
evaluation of visual text generation. (2) TextDiffuser-2 enhances
flexibility and stylistic diversity. It fine-tunes a large language model
to autonomously generate and refine layout plans via conversational
interactions. In addition, it adopts line-level text conditioning instead
of dense character-level guidance, enabling more natural and diverse
rendering without sacrificing legibility. (3) TextDiffuser-3 aims to
improve editability. It generalizes the problem from single-image
generation to full layered graphic design synthesis. It leverages a
vision–language model to coordinate reference generation, design
planning, and layer-wise decomposition, supported by expert modules such
as SAM and element-removal models to produce harmonized, editable
designs. Together, these three steps outline a path from rendering
accurate and legible text to generating fully editable, layered graphic
designs, enabling designers and content creators to use generative
models not only for inspiration but also for production-ready,
customizable visual content.
Date: Friday, 12 December 2025
Time: 3:00pm - 5:00pm
Venue: Room 2128B (Lift 22)
Chairman: Dr. Walter Zhe WANG (CIVL)
Committee Members: Dr. Qifeng CHEN (Supervisor)
Prof. Tim CHENG
Prof. Gary CHAN
Dr. Harry YANG (AMC)
Dr. Zhouhui LIAN (PKU)