More about HKUST
Query Processing in Uncertain Graphs
The Hong Kong University of Science and Technology Department of Computer Science and Engineering PhD Thesis Defence Title: "Query Processing in Uncertain Graphs" By Mr. Panagiotis PARCHAS Abstract Data in several applications can be represented as an uncertain graph, whose edges are labeled with a probability of existence. Exact query processing on uncertain graphs is prohibitive for most applications, as it involves evaluation over an exponential number of instantiations. Thus, typical approaches employ Monte-Carlo sampling, which i) draws a number of possible graphs (samples), ii) evaluates the query on each of them, and iii) aggregates the individual answers to generate the final result. However, this approach can also be extremely time consuming for large uncertain graphs commonly found in practice. To overcome the high cost, in this thesis we propose two approaches. The first extracts deterministic representative instances that capture structural properties of the uncertain graph. The query and mining tasks can then be efficiently processed using deterministic algorithms on these representatives. The second approach sparsifies the uncertain graph (i.e., reduces the number of its edges) and redistributes its probabilities, minimizing the information loss. Then, Monte-Carlo sampling applied to the reduced graph becomes much more efficient. For the first approach, in order to maintain data utility, the representative instance should preserve structural characteristics of the uncertain graph. We start with representatives that capture the expected vertex degrees, as this is a fundamental property of the graph topology. We then generalize the notion of vertex degree to the concept of n-clique cardinality, i.e., the number of cliques of size n that contain a vertex. For the first problem, we propose two methods: Average Degree Rewiring (ADR), which is based on random edge rewiring, and Approximate B-matching (ABM), which applies graph matching techniques. For the second problem, we develop a greedy approach and a game theoretic framework. We experimentally demonstrate, with real uncertain graphs, that indeed the representative instances can be used to answer, efficiently and accurately, queries based on several metrics such as shortest path distance, clustering coefficient and betweenness centrality. For the second approach, inspired by the literature in deterministic graph sparsification, we aim at sparsified uncertain graphs that maintain the expected cut sizes of the original graph. In addition, in order to reduce the uncertainty of the graph and the variance of the performed queries, we aim at decreasing the entropy of the input graph. To achieve these goals, we propose a framework that consists of two phases: The first (backbone generation), creates a deterministic backbone graph. The second, (probability assignment), generates probabilities for the edges of the backbone graph that capture the expected cuts, while avoiding the probability values that incur high entropy. Based on this framework, we propose two methods: Gradient Decent Backbone (GDB), which assigns probabilities based on gradient decent, and Expectation Maximization Degree (EMD), which iteratively updates the backbone graph and assigns probabilities until convergence. Moreover, we modify state-of-the-art methods of deterministic graph sparsification to comply with the uncertain setting, and we experimentally demonstrate that they underperform compared to the proposed techniques in a variety of graph related queries, including shortest path distance, page rank, clustering coefficient and reliability. Date: Tuesday, 24 January 2017 Time: 2:00pm - 4:00pm Venue: Room 3494 Lifts 25/26 Chairman: Prof. Li Qiu (ECE) Committee Members: Prof. Dimitris Papadias (Supervisor) Prof. Pedro Sander Prof. Raymond Wong Prof. Xiaowei Zhang (IELM) Prof. Jeffrey Xu Yu (Sys. Engg. & Engg. Mgmt., CUHK) **** ALL are Welcome ****