More about HKUST
Analyzing Data with k-regret Query and Diversification
PhD Qualifying Examination
Title: "Analyzing Data with k-regret Query and Diversification"
by
Mr. Min XIE
Abstract:
Extracting interesting tuples from a large database is an important problem in
multi-criteria decision making. Two representative queries were proposed in the
literature: top-k queries and skyline queries. A top-k query requires users to
specify their utility functions beforehand and then returns exactly k tuples to
users. A skyline query does not require any utility function from users but we
cannot predict its output size in advance.
Recently, a k-regret query, which tries to minimize a criterion called the
maximum regret ratio, was proposed to overcome the deficiencies of both top-k
queries and skyline queries. In other words, a k-regret query does not require
any utility function from users, and the output size is controllable. Various
algorithms for performing k-regret queries have been proposed in the literature
and they can be divided into two categories: the CUBE approach and the Greedy
approach. The CUBE approach was proven to have a theoretical guarantee on the
maximum regret ratio of the set of resulting tuples but its empirical quality
is not satisfiable enough. The Greedy approach was shown to return a set with a
better maximum regret ratio in practice but it does not have any theoretical
guarantee. Depending on the way that is used to determine the tuple to be
selected in the Greedy approach, the Greedy approach can be further divided
into two approaches, namely the LP-Greedy approach and the Geo- Greedy
approach. The LP-Greedy approach formulates the k-regret query as a number of
linear programs and performs the query by solving the linear programs. The
Geo-Greedy approach avoids the usage of the time-consuming linear programs and
solves the k-regret query by making use of a number of geometry properties.
Meanwhile, result diversification, which takes both relevance and diversity
into consideration, has attracted a lot of attentions. It aims at improving the
quality of the result retrieved by the user query. Different definitions for
diversification were proposed in the literature: DisC diversity, diversified
top- k set and top k-similar diversification set. DisC diversity aims at
finding a minimum set that satisfies the following two conditions: (1) any two
tuples in the set are not similar to each other; (2) for each tuple in the
database, there is a similar tuple in the selected set. In diversified top-k
set, each tuple is associated with a score. Diversified top-k set is the set of
at most k tuples, such that no two tuples are similar with each other, and the
total score of the set is maximized. Top k-similar diversification set is the
set with the maximum objective value which is defined in term of both relevance
and diversity.
In this survey, we first review the definitions of the k-regret query and
present the algorithms for solv- ing the k-regret query. We then study
different instances of the diversification problem. Besides, we discuss the
possible relationship between the k-regret query and the diversification
problem. Finally, we conclude this survey by giving some future research
directions.
Date: Thursday, 9 March 2017
Time: 4:00pm - 6:00pm
Venue: Room 3494
Lifts 25/26
Committee Members: Dr. Raymond Wong (Supervisor)
Prof. Shing-Chi Cheung (Chairperson)
Prof. Lei Chen
Prof. Frederick Lochovsky
**** ALL are Welcome ****