Making Data Communication for Computational Notebooks Effective and Efficient
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Making Data Communication for Computational Notebooks Effective and
Efficient"
By
Miss Yanna LIN
Abstract:
In the era of big data, data science has become pivotal in extracting and
generating insights from vast amounts of structured and unstructured data.
Computational notebooks have emerged as essential tools in this field,
integrating code, outputs, and explanatory text to create a computational
narrative that enhances the exploration and communication of complex data
insights. Despite their widespread usage, significant challenges remain in
ensuring effective and efficient data communication of these notebooks at
different levels of granularity: 1) Fine-grain communication: Important content
within individual cells is often poor or missing, which hampers detailed
interpretation; 2) Medium-grain communication: Implicit relationships among
cells obscure a structured and cohesive understanding of the notebook; and 3)
Coarse-grain communication: Messy and lengthy notebooks obscure essential
high-level insights, hindering effective overview and quick comprehension.
To address these challenges, this thesis introduces novel interfaces and
algorithms designed to improve the efficiency and effectiveness of data
communication for computational notebooks. For fine-grain communication, we
designed InkSight, a mixed-initiative plugin that automatically generates
explanatory text for chart findings based on users' intents expressed
through sketches, filling gaps where important intra-cell content is missing.
Recognizing that users still face difficulties in relating the explanatory
text to the corresponding charts and code, we developed a second plugin,
InterLink, to help clarify inter-cell relationships and enhance medium-grain
communication, fostering a structured and cohesive understanding of notebooks.
Beyond delving into detailed content, some stakeholders prefer a coarse-grain
overview of notebooks to avoid the clutter of interim notes and findings. To
cater to this preference, our third work introduces DMiner, a data-driven
framework that automates the layout and interaction designs of selected
visualizations, effectively converting them into dashboards. Finally, we
discuss future research directions to further enhance the efficiency and
effectiveness of data communication in computational notebooks.
Date: Thursday, 22 August 2024
Time: 3:00pm - 5:00pm
Venue: Room 5501 (Lifts 25/26)
Chairman: Prof. Wai Ho MOW (ECE)
Committee Members: Prof. Huamin QU (Supervisor)
                   Prof. Qiong LUO
                   Prof. Pedro SANDER
                   Prof. Hongbo FU (AMC)
                   Prof. Jinwook SEO (SNU)