From Textual Chain-of-Thought to Interleaved Image-Text CoT for Multi-modal Reasoning

PhD Qualifying Examination


Title: "From Textual Chain-of-Thought to Interleaved Image-Text CoT for Multi-modal
Reasoning"

by

Mr. Wei CHEN


Abstract:

Chain-of-thought (CoT) reasoning has emerged as a pivotal technique for 
unlocking the multi-step reasoning capabilities of large language models 
(LLMs). By decomposing complex problems into intermediate reasoning steps, 
CoT prompting and its variants have achieved remarkable success across 
mathematical, commonsense, and symbolic reasoning tasks. With the rapid 
advancement of multimodal large language models (MLLMs), researchers have 
extended CoT paradigms beyond purely textual domains to incorporate visual 
information, giving rise to multimodal CoT methods that leverage both images 
and text for enhanced reasoning. However, a fundamental limitation persists 
in existing approaches: even when visual inputs are provided, the reasoning 
chain itself remains confined to text, preventing models from 
fully exploiting visual reasoning strategies akin to human mental imagery. 
This survey provides a comprehensive analysis of the evolution from textual 
CoT to interleaved image-text CoT for multimodal reasoning. We organize our 
discussion along three principal axes: (1) textual CoT methods that establish 
foundational reasoning mechanisms in LLMs, (2) multimodal CoT approaches that 
integrate visual information into the reasoning pipeline while maintaining 
text-only reasoning chains, and (3) the emerging paradigm of interleaved 
image-text CoT, where models actively generate and reason over visual 
artifacts during the thinking process. By tracing this evolutionary 
trajectory, we offer a structured roadmap that connects foundational 
techniques to frontier research, identifying key challenges and promising 
directions for future investigation.


Date:                   Monday, 20 April 2026

Time:                   3:30pm - 5:00pm

Venue:                  Room 2132C
                        Lift 22

Committee Members:      Dr. Long Chen (Supervisor)
                        Prof. Nevin Zhang (Chairperson)
                        Dr. Qifeng Chen