More about HKUST
DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation
Speaker: Professor Yufei TAO Department of Computer Science and Engineering Chinese University of Hong Kong Title: "DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation" Date: Monday, 2 November 2015 Time: 4:00pm - 5:00pm Venue: Lecture Theater F (near lifts 25/26), HKUST Abstract: DBSCAN is a popular method for clustering multi-dimensional objects. Just as notable as the method's vast success is the research community's quest for its efficient computation. The original KDD'96 paper claimed an algorithm with O(n log n) running time, where n is the number of objects. Unfortunately, this is a mis-claim; and that algorithm actually requires O(n^2) time. There has been a fix in 2D space, where a genuine O(n log n)-time algorithm has been found. Looking for a fix for dimensionality d >= 3 is currently an important open problem. In this talk, we will show that for d >=3, the DBSCAN problem requires Omega(n^{4/3}) time to solve, unless very significant breakthroughs -- ones widely believed to be impossible -- could be made in theoretical computer science. This (i) explains why the community's search for fixing the aforementioned mis-claim has been futile for d >= 3, and (ii) indicates (sadly) that all DBSCAN algorithms must be intolerably slow even on moderately large n in practice. We will also show that the running time can be dramatically brought down to O(n) in expectation regardless of the dimensionality d, as soon as slight inaccuracy in the clustering results is permitted. ********************** Biography: Yufei TAO is a full professor in the Department of Computer Science and Engineering, Chinese University of Hong Kong. He served as an associate editor of ACM Transactions on Database Systems (TODS) from 2008 to 2015, and of IEEE Transactions on Knowledge and Data Engineering (TKDE) from 2012 to 2014. He served as a PC co-chair of International Conference on Data Engineering (ICDE) 2014, and of International Symposium on Spatial and Temporal Databases (SSTD) 2011. He received the best paper award at SIGMOD 2013 and also at SIGMOD 2015. He also received a Hong Kong Young Scientist Award in 2002.