On End-User Error Handling in Interactive Queries and Tables

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "On End-User Error Handling in Interactive Queries and Tables"

By

Mr. Qixu CHEN


Abstract:

Many applications designed for end-users rely on human-generated inputs, but 
their performance can degrade significantly due to the inherent unreliability 
of human operations, leading to undesirable outcomes. For instance, in an 
end-user decision-making process where the user needs to find the most 
interesting tuple in a large dataset, interactive queries engage the user 
through a series of questions, each asking him/her to compare two tuples and 
choose the preferred one, in order to elicit the user’s 
preference. The system then recommends tuples based on the learned 
preference. However, even a single erroneous response from the user can 
mislead the learning process, resulting in sub-optimal recommendations. 
Similarly, for common end-user data processing tools such as Microsoft Excel, 
which require human-generated relational tables as input for analysis, 
errors in tables can cause degraded model performance and flawed analysis. 
Unlike enterprise-level settings, where domain experts and data governance 
help detect and correct errors, end-user environments lack such resources, 
making error handling a significant challenge.

This thesis addresses the problem of error handling in interactive queries 
and end-user tables through three studies. In the first study, we tackle 
random user errors in interactive queries and propose an algorithm that 
minimizes the number of questions needed when each tuple has two attributes. 
Then, for scenarios where each tuple is described by multiple attributes, we 
propose two algorithms, both with provable performance guarantees. In the 
second study, we develop techniques to handle both random and persistent user 
errors, proposing two algorithms: one with asymptotically optimal round 
efficiency, and the other empirically requiring few rounds. In the 
third study, we focus on table data cleaning in end-user scenarios, and 
develop a framework to systematically catch errors using a novel class of 
data-quality constraints that we call Semantic-Domain Constraints, which can 
be automatically applied to any table, without requiring domain experts to 
specify them manually on a per-table basis. Collectively, these contributions 
advance error handling techniques for end-user applications, enhancing the 
robustness of interactive queries and user tables in practical applications.


Date:                   Tuesday, 27 May 2025

Time:                   3:00pm - 5:00pm

Venue:                  Room 2128A
                        Lift 19

Chairman:               Dr. Eun Soon IM (CIVL)

Committee Members:      Prof. Raymond WONG (Supervisor)
                        Prof. Dimitris PAPADIAS
                        Prof. Xiaofang ZHOU
                        Prof. Xueqing ZHANG (CIVL)
                        Prof. Kyriakos MOURATIDIS (SMU)