The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Designing Interactive Scaffolds for Steering VLM-Powered
Visualization Generation"
By
Miss Liwenhan XIE
Abstract:
Visualization is a powerful means for everyday data consumption, such as
business analytics, personal informatics, and data-driven storytelling.
Recent advances in vision language models (VLMs) have been rapidly
transforming complex, data-intensive tasks, enabling the creation of
high-fidelity visualizations from a single prompt. However, transactional VLM use
fundamentally compromises human agency in open-ended data tasks. In
functional data processing, VLMs risk hallucinations and distortion of the
analytical intent; in communicative design, they often lead to ineffective
regeneration cycles when users tweak nuanced visual specifications.
Critically, this pervasive challenge
stems from the VLM's black-box nature and the resulting coarse granularity of
control available for precise intervention.
This thesis focuses on empowering proficient creators and data practitioners
in subjective, iterative tasks where human agency is paramount. Such tasks
demand the nuanced, exploratory, and iterative process essential for
effective human-VLM collaboration. My approach is to construct interactive
layers built on top of generative models that provide semantically meaningful
and fine-grained intervention points, namely interactive scaffolds. The
design of these interactive scaffolds enforces an "intent first, nuances on
demand" paradigm, which is structured around three conceptual pillars: (1)
anchoring human intent early and deeply in the VLM's workflow, (2) surfacing
model-internal representations with semantically meaningful blocks for
proactive steering, and (3) extracting semantic controls to enable granular
refinement of the output.
I investigate the critical stages of the visualization pipeline powered by
VLMs: data transformation, visual mapping, and visual augmentation. The first
study introduces an approach (WaitGPT) that transforms VLM-generated data
processing code and execution results into a flow diagram. This scaffold
anchors analysis logic, providing intermediate entry points for users to
verify and refine analytical intent at a smaller operation granularity. The
second study presents a pipeline (DataWink) to create bespoke visualizations
from a reference example by automatically extracting and encapsulating the data
mapping scheme into an extensible template. This scaffold anchors the VLM to
a high-fidelity structural intent, allowing flexible reuse and adaptation.
The third study describes an animation tool (DataSway) to vivify static
metaphoric visualizations. This scaffold enables the refinement of
communicative intent by connecting high-level descriptions to fine-grained
animation configurations.
Ultimately, these studies contribute to a deeper understanding of how the
interactive scaffolds framework can mitigate inherent limitations of VLMs,
promoting verifiable, user-centered data visualization practices with
increased human agency.
Date: Thursday, 27 November 2025
Time: 9:00am - 11:00am
Venue: Room 5501
Lifts 25/26
Chairman: Dr. Shangyu DANG (LIFS)
Committee Members: Prof. Huamin QU (Supervisor)
Dr. Xiaojuan MA
Dr. Arpit NARECHANIA
Prof. Yunya SONG (IEDA)
Dr. Fanny CHEVALIER (UofT)