More about HKUST
Analyzing Data with k-regret Query and Diversification
PhD Qualifying Examination Title: "Analyzing Data with k-regret Query and Diversification" by Mr. Min XIE Abstract: Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns exactly k tuples to users. A skyline query does not require any utility function from users but we cannot predict its output size in advance. Recently, a k-regret query, which tries to minimize a criterion called the maximum regret ratio, was proposed to overcome the deficiencies of both top-k queries and skyline queries. In other words, a k-regret query does not require any utility function from users, and the output size is controllable. Various algorithms for performing k-regret queries have been proposed in the literature and they can be divided into two categories: the CUBE approach and the Greedy approach. The CUBE approach was proven to have a theoretical guarantee on the maximum regret ratio of the set of resulting tuples but its empirical quality is not satisfiable enough. The Greedy approach was shown to return a set with a better maximum regret ratio in practice but it does not have any theoretical guarantee. Depending on the way that is used to determine the tuple to be selected in the Greedy approach, the Greedy approach can be further divided into two approaches, namely the LP-Greedy approach and the Geo- Greedy approach. The LP-Greedy approach formulates the k-regret query as a number of linear programs and performs the query by solving the linear programs. The Geo-Greedy approach avoids the usage of the time-consuming linear programs and solves the k-regret query by making use of a number of geometry properties. Meanwhile, result diversification, which takes both relevance and diversity into consideration, has attracted a lot of attentions. It aims at improving the quality of the result retrieved by the user query. Different definitions for diversification were proposed in the literature: DisC diversity, diversified top- k set and top k-similar diversification set. DisC diversity aims at finding a minimum set that satisfies the following two conditions: (1) any two tuples in the set are not similar to each other; (2) for each tuple in the database, there is a similar tuple in the selected set. In diversified top-k set, each tuple is associated with a score. Diversified top-k set is the set of at most k tuples, such that no two tuples are similar with each other, and the total score of the set is maximized. Top k-similar diversification set is the set with the maximum objective value which is defined in term of both relevance and diversity. In this survey, we first review the definitions of the k-regret query and present the algorithms for solv- ing the k-regret query. We then study different instances of the diversification problem. Besides, we discuss the possible relationship between the k-regret query and the diversification problem. Finally, we conclude this survey by giving some future research directions. Date: Thursday, 9 March 2017 Time: 4:00pm - 6:00pm Venue: Room 3494 Lifts 25/26 Committee Members: Dr. Raymond Wong (Supervisor) Prof. Shing-Chi Cheung (Chairperson) Prof. Lei Chen Prof. Frederick Lochovsky **** ALL are Welcome ****