Designing Interactive Scaffolds for Steering VLM-Powered Visualization Generation
PhD Thesis Proposal Defence

Title: "Designing Interactive Scaffolds for Steering VLM-Powered Visualization Generation"

by

Miss Liwenhan XIE

Abstract:

Visualization is a powerful means for everyday data consumption, such as business analytics, personal health analysis, and data-driven storytelling. Recent advances in vision language models (VLMs) have shown promise in complex, data-intensive tasks for dynamic, personal use cases. However, VLM-powered applications remain susceptible to errors stemming from inaccurate intent understanding, hallucinations, and the inherent ambiguities of natural language. To address these issues, I investigate the critical stages of the visualization pipeline (data transformation, visual mapping, and visual augmentation), emphasizing the necessity of human oversight in steering VLM outputs. Rather than prioritizing accuracy in closed-ended tasks, this thesis focuses on interaction design within an iterative refinement process tailored for open-ended tasks that emphasize human agency. Through the exemplar tasks of data exploration, visualization example reuse, and visualization animation, I develop interfaces and interaction techniques that support the verification and iterative refinement of VLM-generated content, aiming to lower the barrier to creating personal visualizations and improve user engagement in the process.

My design approach is structured around three conceptual pillars: (1) building interactive scaffolds for an “intent first, nuances on-demand” workflow, (2) surfacing low-level specifications internal to VLMs with intuitive representations, and (3) extracting atomic controls to afford granular refinement.

The first study introduces a visualization approach that transforms VLM-generated data processing code and its execution results into a flow diagram. The visualization expands continuously as the code is generated in a stream. Users may inspect the diagram and use the operation blocks to refine the code at a finer granularity than regenerating the entire snippet. The second study presents an approach to reusing and adapting bespoke visualization examples by automatically extracting and encapsulating the data mapping scheme into an extensible template. Based on adaptation descriptions, new widgets are generated for direct manipulation of the corresponding dimensions. The third study describes an animation tool to vivify static visualizations. Through a layered generation workflow, users may first obtain element-wise animation configurations from text-based descriptions, then coordinate the elements in groups, and finally conduct global timeline adjustments.

Ultimately, these pieces contribute to a deeper understanding of how interaction design can mitigate the limitations of both VLMs and human expression, fostering more reliable and user-centered data visualization practices.

Date: Tuesday, 20 May 2025
Time: 2:00pm - 4:00pm
Venue: Room 2128A (Lift 19)

Committee Members:
Prof. Huamin Qu (Supervisor)
Dr. Arpit Narechania (Chairperson)
Dr. Anyi Rao (AMC)