Spatio-textual Data Analysis via Co-Location Mining and Collective Spatial Keyword Queries
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Spatio-textual Data Analysis via Co-Location Mining and Collective Spatial Keyword Queries"

By

Mr. Kai Ho CHAN

Abstract

With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual data that possess both a geographical location and a textual description are gaining in prevalence. This development gives prominence to spatio-textual data analysis, which is an emerging research field and has both real-world and scientific applications. The research on spatial data analysis consists of many different areas, such as spatial data mining (i.e., knowledge discovery in large spatial databases) and spatial keyword query processing. In the area of spatial data mining, we want to discover interesting, previously unknown, and potentially useful patterns from large spatial databases. For example, one type of spatial data mining is spatial association mining, which finds the patterns and rules that describe the implication of one feature or a set of features by another set of features in spatial databases. In the area of spatial keyword query processing, we want to process the query and return relevant objects as results. A typical query takes a location and a set of keywords as arguments and returns the single spatio-textual object that best matches the keywords and is close to the specified location.

In this thesis, we introduce co-location pattern mining, which is one type of spatial data mining, and the collective spatial keyword query (CoSKQ), which is one type of spatial keyword query. For the co-location pattern mining problem, we develop a new support measure called Fraction-Score that overcomes the weaknesses of the existing support measures for defining co-location patterns.
To solve the problem based on Fraction-Score, we develop efficient algorithms which are significantly faster than a baseline that adapts the state-of-the-art. For the CoSKQ problem, we consider two directions. First, we design a unified cost function which generalizes the majority of existing cost functions for CoSKQ, and develop a unified approach which works as well as (and sometimes better than) the best-known approaches based on different cost functions. Second, we propose a new cost function called the maximum dot size cost, which captures both the distances among the objects in a set and the query, as existing cost functions do, and the inherent costs of the objects. We present an exact algorithm and an approximate algorithm with a provable approximation bound for the problem. We conducted extensive experiments on both real and synthetic datasets, which verified all our proposed approaches and algorithms.

Date: Monday, 22 July 2019

Time: 3:00pm - 5:00pm

Venue: Room 2463 Lifts 25/26

Chairman: Prof. Andrew Cohen (PHYS)

Committee Members:
Prof. Raymond Wong (Supervisor)
Prof. Dik-Lun Lee
Prof. Dit-Yan Yeung
Prof. Xueqing Zhang (CIVL)
Prof. Guoliang Li (Tsinghua Univ.)

**** ALL are Welcome ****