Accelerating In-Memory Subgraph Matching on a Single Machine

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Accelerating In-Memory Subgraph Matching on a Single Machine"

By

Mr. Shixuan SUN


Abstract

Graph data are ubiquitous in practice, and subgraph matching is a basic 
operation in graph analytics. As modern computers typically have 
sufficient processing power and memory resources, we study how to 
accelerate in-memory subgraph matching on a single computer. Our approach 
is through algorithmic optimization and parallelization.

First, we design PSM, a parallel algorithmic framework for 
backtracking-based subgraph matching. Specifically, we abstract the 
matching process as a depth-first search (DFS) tree, and parallelize the 
search by dividing up the tree into search regions and exploring these 
regions in parallel. We design an effective dynamic load balance strategy 
for the search. As a result, the PSM-style algorithms achieved a speedup 
of 15X-19X over their sequential counterparts on a twenty-core machine.

Second, we develop LIGHT, a parallel subgraph matching algorithm for 
unlabeled graphs. The search space in unlabeled subgraph matching, or 
called subgraph enumeration, is big due to the absence of vertex labels. 
Our approach to reducing the search space and thus redundant computation 
is to defer the materialization of intermediate results and to increase 
their reuse. Furthermore, we parallelize our algorithm with vector and 
simultaneous multithreading instructions in modern computers. Our results 
show that LIGHT significantly outperforms the state of the art.

Finally, we design vcFV, an efficient framework for finding all data 
graphs in a graph database each of which contains the given query graph. 
Existing work takes the indexing-filtering-verification approach to 
evaluate queries. However, the indexing phase suffers serious scalability 
problems. Our framework utilizes the leading subgraph matching algorithms 
to answer queries, which removes the indexing phase. The experiment 
results show that our framework is competitive in time performance and can 
scale to hundreds of thousands of data graphs and graphs of thousands of 
vertices.


Date:			Monday, 10 February 2020

Time:			5:00pm - 7:00pm

Zoom Meeting:           https://hkust.zoom.us/j/396223450

Chairman:		Prof. Beifang CHEN (MATH)

Committee Members:	Prof. Qiong LUO (Supervisor)
 			Prof. Raymond WONG
 			Prof. Ke YI
 			Prof. Weichuan YU (ECE)
 			Prof. James CHENG (CUHK)


**** ALL are Welcome ****