PhD Thesis Proposal Defence
Title: "Designing Interactive Scaffolds for Steering VLM-Powered
Visualization Generation"
by
Miss Liwenhan XIE
Abstract:
Visualization is a powerful means for everyday data consumption, such as
business analytics, personal health analysis, and data-driven storytelling.
Recent advances in vision-language models (VLMs) have shown promise in
complex, data-intensive tasks for dynamic, personal use. However,
VLM-powered applications remain susceptible to errors stemming from
inaccurate intent understanding, hallucinations, and the inherent
ambiguities of natural language. To address these issues, I investigate the
critical stages of the visualization pipeline (data transformation, visual
mapping, and visual augmentation), emphasizing the necessity of human
oversight in steering VLM outputs. Rather than prioritizing accuracy in
closed-ended tasks, this thesis focuses on interaction design within an
iterative refinement process tailored for open-ended tasks that emphasize
human agency. Through the exemplar tasks of data exploration, visualization
example reuse, and visualization animation, I develop interfaces and
interaction techniques that support the verification and iterative
refinement of VLM-generated content, aiming to lower the barrier to
creating personal visualizations and to improve user engagement in the
process.
My design approach is structured around three conceptual pillars: (1)
Building interactive scaffolds for an “intent first, nuances on-demand”
workflow, (2) Surfacing low-level specifications internal to VLMs
with intuitive representations, and (3) Extracting atomic controls to afford
granular refinement. The first study introduces a visualization approach
that transforms VLM-generated data-processing code and its execution
results into a flow diagram. The diagram expands continuously as the code
is streamed. Users may inspect the diagram and leverage its operation
blocks to refine the code at a finer granularity than regenerating the
entire snippet. The second study presents an approach to
reusing and adapting bespoke visualization examples by automatically
extracting and encapsulating the data mapping scheme into an extensible
template. Based on adaptation descriptions, new widgets are generated for
direct manipulation of the corresponding dimensions. The third study describes
an animation tool to vivify static visualizations. Through a layered
generation workflow, users may first obtain element-wise animation
configurations from text-based descriptions, then coordinate the elements
in groups, and finally adjust the global timeline.
Ultimately, these studies contribute to a deeper understanding of how
interaction design can mitigate the limitations of both VLMs and human
expression, fostering more reliable and user-centered data visualization
practices.
Date: Tuesday, 20 May 2025
Time: 2:00pm - 4:00pm
Venue: Room 2128A (Lift 19)
Committee Members: Prof. Huamin Qu (Supervisor)
Dr. Arpit Narechania (Chairperson)
Dr. Anyi Rao (AMC)