More about HKUST
Locality Sensitive Hashing for Big Data
Speaker: Dr. Wei WANG University of New South Wales Australia Title: "Locality Sensitive Hashing for Big Data" Date: Monday, 27 October, 2014 Time: 4:00pm - 5:00pm Venue: Lecture Theater F (near lifts 25/26) HKUST Abstract: Finding (approximate) nearest neighbor is a fundamental problem in computational geometry, and plays an important role in many database, multimedia, and machine learning problems, as real-world objects are usually modelled or represented as one or several high-dimensional feature vectors. The problem is notoriously hard due to the phenomenon named "curse of dimensionality", and it becomes even more challenging in the big data era, calling for new ideas and methodologies. While there are numerous heuristic methods to solve this problem, methods based on Locality Sensitive Hashing (LSH) are arguable the most popular and promising approach as they have both rigorous theoretical guarantees on space and time complexities and excellent query performance in practice. However, to achieve such performance, LSH-based methods require huge amount of index. Recent work reduces the index size from O((nd)^{1.5}) to O(n log(n)) (n is the number of data points and d is the dimensionality), but the index is still typically an order of magnitude larger than the size of the data. In this talk, we present our recent breakthrough in this problem. We propose a novel method to solve the problem with theoretical guarantees and with a tiny index which is linear in n. Furthermore, our method supports a variety of functionalities, such as finding the exact nearest neighbors with guaranteed probabilities. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to a billion 128-dimensional points on a single commodity PC. ******************** Biography: Dr. Wei Wang is an Associate Professor in the School of Computer Science and Engineering, The University of New South Wales, Australia. His current research interests include keyword search on (semi-)structured data, similarity query processing, high dimensional indexing, and spatial databases. He has published over ninety research papers, with many in premier database journals (TODS, VLDB J, and TKDE) and conferences (SIGMOD, VLDB, ICDE, and WWW). He is currently serving on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE).