Making Data Communication for Computational Notebooks Effective and Efficient
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Making Data Communication for Computational Notebooks Effective and Efficient"

By

Miss Yanna LIN

Abstract:

In the era of big data, data science has become pivotal in extracting and generating insights from vast amounts of structured and unstructured data. Computational notebooks have emerged as essential tools in this field, integrating code, outputs, and explanatory text to create a computational narrative that enhances the exploration and communication of complex data insights. Despite their widespread use, significant challenges remain in ensuring effective and efficient data communication in these notebooks at different levels of granularity:

1) Fine-grain communication: Poor or missing content within individual cells significantly hampers detailed interpretation;
2) Medium-grain communication: Implicit relationships among cells obscure a structured and cohesive understanding of the notebook; and
3) Coarse-grain communication: Messy and lengthy notebooks obscure essential high-level insights, hindering effective overview and quick comprehension.

To address these challenges, this thesis introduces novel interfaces and algorithms designed to improve the efficiency and effectiveness of data communication for computational notebooks. For fine-grain communication, we designed InkSight, a mixed-initiative plugin that automatically generates explanatory text for chart findings based on users' intents expressed through sketches, filling gaps left by missing intra-cell content. Recognizing that users still face difficulties in relating the explanatory text to the corresponding charts and code, we developed a second plugin, InterLink, to help clarify inter-cell relationships and enhance medium-grain communication, fostering a structured and cohesive understanding of notebooks.
Beyond delving into detailed content, some stakeholders prefer a coarse-grain overview of notebooks to avoid the clutter of interim notes and findings. To cater to this preference, our third work introduces DMiner, a data-driven framework that automates the layout and interaction designs of selected visualizations, effectively converting them into dashboards. Finally, we discuss future research directions to further enhance the efficiency and effectiveness of data communication in computational notebooks.

Date: Thursday, 22 August 2024
Time: 3:00pm - 5:00pm
Venue: Room 5501, Lifts 25/26

Chairman: Prof. Wai Ho MOW (ECE)

Committee Members:
Prof. Huamin QU (Supervisor)
Prof. Qiong LUO
Prof. Pedro SANDER
Prof. Hongbo FU (EMIA)
Prof. Jinwook SEO (SNU)