Rendering Text in Images in the GenAI Era
PhD Thesis Proposal Defence

Title: "Rendering Text in Images in the GenAI Era"

by Mr. Jingye CHEN

Abstract:

With the rapid advancement of generative models, text-to-image generation has made remarkable progress, driven largely by diffusion-based approaches. However, rendering text within images accurately and in an editable form remains a fundamental challenge. This limits practical applications in design, advertising, and document generation. To address this challenge, our roadmap unfolds in three steps:

(1) TextDiffuser-1 improves text rendering accuracy. It combines a Transformer-based layout planner with a conditional diffusion backbone. It also introduces MARIO-10M, a large-scale dataset of 10 million image–text pairs with OCR and character-level annotations, together with MARIO-Eval, a benchmark for the comprehensive evaluation of visual text generation.

(2) TextDiffuser-2 improves flexibility and stylistic diversity. It fine-tunes a large language model to generate and refine layout plans autonomously through conversational interaction. In addition, it conditions on text at the line level rather than through dense character-level guidance, which enables more natural and diverse rendering without sacrificing legibility.

(3) TextDiffuser-3 improves editability. It generalizes the problem from single-image generation to the synthesis of full, layered graphic designs. A vision–language model coordinates reference generation, design planning, and layer-wise decomposition. Expert modules such as SAM and element-removal models support this process to produce harmonized, editable designs.

Together, these three steps outline a path from rendering accurate, legible text to generating fully editable, layered graphic designs. This roadmap empowers designers and content creators to harness generative models not only for inspiration but also for production-ready, customizable visual content.

Date: Wednesday, 30 July 2025
Time: 3:30pm - 5:30pm
Venue: Room 3494 (Lifts 25/26)

Committee Members:
Dr. Qifeng Chen (Supervisor)
Prof. Gary Chan (Chairperson)
Prof. Tim Cheng