Analyzing Data with Regret Minimization Query

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Analyzing Data with Regret Minimization Query"

By

Miss Min XIE


Abstract

Extracting interesting tuples from a large database is an important problem in 
multi-criteria decision making. Two representative queries were proposed in the 
literature: top-k queries and skyline queries. A top-k query requires users to 
specify their utility functions beforehand and then returns k tuples to the 
users. A skyline query does not require any utility function from users but it 
puts no control on the number of tuples returned to users. Recently, a regret 
minimization query was proposed and received attention from the community 
because it does not require any utility function from users and the output size 
is controllable, and thus, it avoids those deficiencies of top-k queries and 
skyline queries. Specifically, it returns a small set of tuples such that any 
user’s favorite tuple in this subset is guaranteed to be not much worse than 
his/her favorite tuple in the whole database. In a sense, this query finds a 
small set of tuples that makes the user happy (i.e., not regretful) even if 
s/he gets the best tuple in the selected set but not the best tuple among all 
tuples in the database.

In this thesis, we first study two versions of the regret minimization query: 
(1) the min-error version: we want to minimize the regret level of each user 
while guaranteeing the output size is at most k where k is the maximum output 
size. This problem is also known as the k-regret query, and (2) the min-size 
version: we want to determine the minimum number of tuples needed to keep users 
happy at a given level. We term this problem as the a-happiness query where a 
represents a happiness threshold. In both versions, we quantify the user’s 
regret level (and happiness level) by a criterion, called the regret ratio (and 
the happiness ratio). A small regret ratio (i.e., a large happiness ratio) 
indicates the user is satisfied with the set returned. In addition, we study 
how to enhance the regret minimization query with user interactions: when 
presented with a small number of tuples (which can be artificial tuples or true 
tuples inside the database), a user is asked to indicate the tuple s/he favors 
the most among them. In particular, we are also interested in the special case 
of determining the favorite tuple for a user in the entire database with a 
small amount of interaction, measured by the number of questions we ask the 
user.

Finding the optimal solution for a regret minimization query is an NP-hard 
problem. We derive efficient approximate solutions for both versions of regret 
minimization. We also present a generic framework for interactive regret 
minimization, under which we propose algorithms that ask an asymptotically 
optimal number of questions in 2-dimensional spaces and algorithms with 
provable performance guarantees in d-dimensional spaces (d ≥ 2) where each 
dimension corresponds to a description of a tuple. We performed extensive 
experiments using both real and synthetic datasets. Our evaluations show that 
our algorithms outperform the best-known previous approaches in three ways: (i) 
they answer the regret minimization queries by guaranteeing a smaller regret 
ratio and returning fewer tuples to users, (ii) they do so much faster (up to 
two orders of magnitude times improvement), and (iii) they outperform the 
existing interactive regret minimization algorithms by locating the favorite 
tuple and guaranteeing a small regret ratio with much fewer questions.


Date:			Monday, 5 August 2019

Time:			10:00am - 12:00noon

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Shaohui Zheng (ISOM)

Committee Members:	Prof. Raymond Wong (Supervisor)
 			Prof. Shing-Chi Cheung
 			Prof. Xiaojuan Ma
 			Prof. Henry Lam (CBE)
 			Prof. Reynold Cheng (HKU)


**** ALL are Welcome ****