Spatio-textual Data Analysis via Co-Location Mining and Collective Spatial Keyword Queries

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Spatio-textual Data Analysis via Co-Location Mining and Collective 
Spatial Keyword Queries"

By

Mr. Kai Ho CHAN


Abstract

With the proliferation of geo-positioning and geo-tagging techniques, 
spatio-textual data that possess both a geographical location and a textual 
description are gaining in prevalence. This development gives prominence to 
spatio-textual data analysis, which is an emerging research field and has both 
real-world and scientific applications. The research on spatial data analysis 
consists of many different areas, such as spatial data mining (i.e., knowledge 
discovery in large spatial databases) and spatial keyword query processing. In 
the area of spatial data mining, we want to discover interesting, and 
previously unknown but potentially useful, patterns from large spatial 
databases. For example, one type of spatial data mining is spatial 
association mining, which finds patterns and rules describing how one set of 
features implies another feature or set of features in spatial databases. In 
the area of spatial keyword query processing, we want to process 
the query and return relevant objects as results. A typical query takes a 
location and a set of keywords as arguments and returns the single 
spatio-textual object that best matches the keywords and is close to the 
specified location.
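
As an illustration of the typical query described above, the following is a 
minimal Python sketch, not the method developed in the thesis. The class 
SpatioTextualObject, the functions score and top1_query, the weight alpha, and 
the normalising constant max_dist are all hypothetical names and choices 
introduced here; the ranking function (a weighted sum of normalised distance 
and keyword mismatch) is only one assumed way to combine the two criteria.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTextualObject:
    """Hypothetical record: a named point with a set of keywords."""
    name: str
    x: float
    y: float
    keywords: frozenset


def score(obj, qx, qy, q_keywords, alpha=0.5, max_dist=10.0):
    """Assumed ranking score; lower is better.

    Combines the normalised Euclidean distance to the query location with
    the fraction of query keywords that the object does NOT contain.
    """
    dist = math.hypot(obj.x - qx, obj.y - qy)
    spatial = min(dist / max_dist, 1.0)
    textual = 1.0 - len(obj.keywords & q_keywords) / len(q_keywords)
    return alpha * spatial + (1.0 - alpha) * textual


def top1_query(objects, qx, qy, q_keywords, alpha=0.5, max_dist=10.0):
    """Return the single object with the lowest score (linear scan)."""
    q_keywords = frozenset(q_keywords)
    if not q_keywords:
        raise ValueError("the query needs at least one keyword")
    return min(
        objects,
        key=lambda o: score(o, qx, qy, q_keywords, alpha, max_dist),
    )


if __name__ == "__main__":
    objects = [
        SpatioTextualObject("cafe_near", 1.0, 0.0, frozenset({"coffee"})),
        SpatioTextualObject("cafe_far", 8.0, 0.0, frozenset({"coffee", "wifi"})),
        SpatioTextualObject("gym", 0.5, 0.0, frozenset({"fitness"})),
    ]
    best = top1_query(objects, 0.0, 0.0, {"coffee", "wifi"})
    print(best.name)
```

With equal weights the nearby partial match wins; lowering alpha shifts the 
balance towards keyword coverage, so the farther object that matches every 
keyword is returned instead. A linear scan is used only for clarity; it says 
nothing about how such queries are indexed or processed in practice.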

In this thesis, we introduce co-location pattern mining, which is one type of 
spatial data mining, and the collective spatial keyword query (CoSKQ), which 
is one type of spatial keyword query. For the co-location pattern mining 
problem, we develop a new support measure called Fraction-Score that overcomes 
the weaknesses of the existing support measures for defining co-location 
patterns. 
To solve the problem based on Fraction-Score, we develop efficient algorithms 
which are significantly faster than a baseline that adapts the 
state-of-the-art.

For the CoSKQ problem, we consider two directions. First, we design a unified 
cost function which generalizes the majority of existing cost functions for 
CoSKQ, and develop a unified approach which works as well as (and sometimes 
better than) the best-known approaches based on different cost functions. Second, 
we propose a new cost function called the maximum dot size cost, which 
captures both the distances among the objects in a set and the query, as 
existing cost functions do, and the inherent costs of the objects. We present 
an exact 
algorithm and an approximate algorithm with a provable approximation bound for 
the problem. We conducted extensive experiments on both real and synthetic 
datasets, which verified all our proposed approaches and algorithms.


Date:			Monday, 22 July 2019

Time:			3:00pm - 5:00pm

Venue:			Room 2463
 			Lifts 25/26

Chairman:		Prof. Andrew Cohen (PHYS)

Committee Members:	Prof. Raymond Wong (Supervisor)
 			Prof. Dik-Lun Lee
 			Prof. Dit-Yan Yeung
 			Prof. Xueqing Zhang (CIVL)
 			Prof. Guoliang Li (Tsinghua Univ.)


**** ALL are Welcome ****