Making Data Communication for Computational Notebooks Effective and Efficient

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Making Data Communication for Computational Notebooks Effective and 
Efficient"

By

Miss Yanna LIN


Abstract:

In the era of big data, data science has become pivotal in extracting and 
generating insights from vast amounts of structured and unstructured data. 
Computational notebooks have emerged as essential tools in this field, 
integrating code, outputs, and explanatory text to create a computational 
narrative that enhances the exploration and communication of complex data 
insights. Despite their widespread use, significant challenges remain in 
ensuring effective and efficient data communication in these notebooks at 
different levels of granularity: 1) Fine-grain communication: Poor-quality or 
missing content within individual cells hampers detailed interpretation; 
2) Medium-grain communication: Implicit relationships among cells hinder a 
structured and cohesive understanding of the notebook; and 3) Coarse-grain 
communication: Messy and lengthy notebooks bury essential high-level 
insights, hindering effective overview and quick comprehension.

To address these challenges, this thesis introduces novel interfaces and 
algorithms designed to improve the efficiency and effectiveness of data 
communication for computational notebooks. For fine-grain communication, we 
designed InkSight, a mixed-initiative plugin that automatically generates 
explanatory text for chart findings based on users' intents expressed 
through sketches, filling in important intra-cell content that is missing. 
Recognizing that users still have difficulty relating the explanatory 
text to the corresponding charts and code, we developed a second plugin, 
InterLink, to help clarify inter-cell relationships and enhance medium-grain 
communication, fostering a structured and cohesive understanding of notebooks. 
Beyond delving into detailed content, some stakeholders prefer a coarse-grain 
overview of notebooks to avoid the clutter of interim notes and findings. To 
cater to this preference, our third work introduces DMiner, a data-driven 
framework that automates the layout and interaction designs of selected 
visualizations, effectively converting them into dashboards. Finally, we 
discuss future research directions to further enhance the efficiency and 
effectiveness of data communication in computational notebooks.


Date:                   Thursday, 22 August 2024

Time:                   3:00pm - 5:00pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Prof. Wai Ho MOW (ECE)

Committee Members:      Prof. Huamin QU (Supervisor)
                        Prof. Qiong LUO
                        Prof. Pedro SANDER
                        Prof. Hongbo FU (EMIA)
                        Prof. Jinwook SEO (SNU)