PhD Thesis Proposal Defence


Title: "Designing Interactive Scaffolds for Steering VLM-Powered 
Visualization Generation"

by

Miss Liwenhan XIE


Abstract:

Visualization is a powerful means for everyday data consumption, such as 
business analytics, personal health analysis, and data-driven storytelling. 
Recent advances in vision-language models (VLMs) have shown promise in 
complex, data-intensive tasks for dynamic, personal use. However, 
VLM-powered applications remain susceptible to errors stemming from 
inaccurate intent understanding, hallucinations, and the inherent 
ambiguities of natural language. To address these issues, I investigate the 
critical stages of the visualization pipeline, namely data transformation, 
visual mapping, and visual augmentation, emphasizing the necessity of human 
oversight in steering VLM outputs. Rather than prioritizing accuracy in 
closed-ended tasks, this thesis focuses on interaction design within an 
iterative refinement process tailored to open-ended tasks that emphasize 
human agency. Through the exemplar tasks of data exploration, visualization 
example reuse, and visualization animation, I develop interfaces and 
interaction techniques that support the verification and iterative 
refinement of VLM-generated content, aiming to lower the barrier to 
creating personal visualizations and improve user engagement in the process.

My design approach is structured around three conceptual pillars: (1) 
building interactive scaffolds for an “intent first, nuances on-demand” 
workflow, (2) surfacing low-level specifications internal to VLMs through 
intuitive representations, and (3) extracting atomic controls to afford 
granular refinement. The first study introduces a visualization approach 
that transforms VLM-generated data processing code and execution results 
into a flow diagram. The diagram expands continuously as the code is 
generated in a stream; users may inspect it and leverage the operation 
blocks to refine the code at a finer granularity than regenerating the 
entire snippet. The second study presents an approach to reusing and 
adapting bespoke visualization examples by automatically extracting and 
encapsulating the data mapping scheme into an extensible template. Based on 
adaptation descriptions, new widgets are generated for direct manipulation 
of the corresponding dimensions. The third study describes an animation 
tool that brings static visualizations to life. Through a layered 
generation workflow, users may first obtain element-wise animation 
configurations from text-based descriptions, then coordinate elements in 
groups, and finally make global timeline adjustments.

Ultimately, these studies contribute to a deeper understanding of how 
interaction design can mitigate the limitations of both VLMs and human 
expression, fostering more reliable and user-centered data visualization 
practices.


Date:                   Tuesday, 20 May 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 2128A
                        Lift 19

Committee Members:      Prof. Huamin Qu (Supervisor)
                        Dr. Arpit Narechania (Chairperson)
                        Dr. Anyi Rao (AMC)