Mining Similarity and Causality on Graphs

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Mining Similarity and Causality on Graphs"

By

Mr. Zijian LI


Abstract

Similarity and causality mining on graphs are common and fundamental 
requirements for graph-based applications. Specifically, the similarities on 
graphs can be separated into two categories: the similarity between graphs 
(i.e., graph similarity), and the similarity between nodes (i.e., node 
similarity) on a graph. A common graph similarity measure is the Graph Edit 
Distance (GED). In this work, we propose a probabilistic approach, GBDA, to 
approximate GED between two graphs according to the Graph Branch Distance (GBD) 
between them. On the other hand, we propose an efficient index, G*-tree, to 
compute the node similarity measured by the distance between two nodes, and we 
propose a shortcut selection algorithm to optimize the efficiency of distance 
queries on G*-tree. Moreover, we propose a multi-view learning method, TransN, 
to encode the similarity between nodes into node embeddings. In TransN, we 
propose an algorithm to capture the proximity information inside every single 
view. To transfer information across views, we propose an algorithm to 
translate the node embeddings between different views based on the 
dual-learning mechanism.

The causality on knowledge graphs is represented by the reasoning paths for 
each fact. The state-of-the-art solution of the reasoning path finding is to 
train a reinforcement learning agent which selects some reasoning paths for a 
given fact. However, the existing methods cannot distinguish the quality of 
different reasoning paths well. To address this problem, in this paper, we 
model the problem of finding high-quality reasoning paths as an optimization 
problem on the embedding space, which is NP-hard. Then, we propose a novel 
approximation method, RETINA, to search the reasoning paths of high quality. To 
enhance the efficiency of RETINA, we propose a set of lower bounds to prune 
reasoning paths in the search process.

The experiments show that our proposed methods have better efficiency, 
effectiveness, and scalability than the state-of-the-art methods on real and 
synthetic data sets.


Date:			Friday, 17 January 2020

Time:			3:00pm - 5:00pm

Venue:			Room 5510
 			Lifts 25/26

Chairman:		Prof. Francesco CIUCCI (MAE)

Committee Members:	Prof. Lei CHEN (Supervisor)
 			Prof. Bo LI
 			Prof. Qiong LUO
 			Prof. Can YANG (MATH)
 			Prof. Pei JIAN (Simon Fraser University)


**** ALL are Welcome ****