DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation

Speaker:        Professor Yufei TAO
                Department of Computer Science and Engineering
                Chinese University of Hong Kong

Title:          "DBSCAN Revisited: Mis-Claim, Un-Fixability, and
                 Approximation"

Date:           Monday, 2 November 2015

Time:           4:00pm - 5:00pm

Venue:          Lecture Theater F (near lifts 25/26), HKUST

Abstract:

DBSCAN is a popular method for clustering multi-dimensional objects. Just
as notable as the method's vast success is the research community's quest
for its efficient computation. The original KDD'96 paper claimed an
algorithm with O(n log n) running time, where n is the number of objects.
Unfortunately, this is a mis-claim; and that algorithm actually requires
O(n^2) time. There has been a fix in 2D space, where a genuine O(n log
n)-time algorithm has been found. Looking for a fix for dimensionality d
>= 3 is currently an important open problem.

In this talk, we will show that for d >=3, the DBSCAN problem requires
Omega(n^{4/3}) time to solve, unless very significant breakthroughs --
ones widely believed to be impossible -- could be made in theoretical
computer science. This (i) explains why the community's search for fixing
the aforementioned mis-claim has been futile for d >= 3, and (ii)
indicates (sadly) that all DBSCAN algorithms must be intolerably slow even
on moderately large n in practice. We will also show that the running time
can be dramatically brought down to O(n) in expectation regardless of the
dimensionality d, as soon as slight inaccuracy in the clustering results
is permitted.

**********************
Biography:

Yufei TAO is a full professor in the Department of Computer Science and
Engineering, Chinese University of Hong Kong. He served as an associate
editor of ACM Transactions on Database Systems (TODS) from 2008 to 2015,
and of IEEE Transactions on Knowledge and Data Engineering (TKDE) from
2012 to 2014. He served as a PC co-chair of International Conference on
Data Engineering (ICDE) 2014, and of International Symposium on Spatial
and Temporal Databases (SSTD) 2011. He received the best paper award at
SIGMOD 2013 and also at SIGMOD 2015. He also received a Hong Kong Young
Scientist Award in 2002.