ProDA: An End-to-End Wavelet-Based OLAP System for Massive Datasets
Speaker: Professor Cyrus SHAHABI
University of Southern California

Title: "ProDA: An End-to-End Wavelet-Based OLAP System for Massive Datasets"

Date: Friday, 11 July 2008
Time: 11:00am - 12 noon
Venue: Room 3416 (via lifts 17/18), HKUST

Abstract:

Recent advancements in sensing and data acquisition technologies have enabled the collection of massive datasets that represent complex real-world events and entities in fine detail. With access to such datasets, scientists and system analysts are no longer restricted to modeling and simulation when analyzing real-world events. Instead, the preferred approach derives observations and verifies hypotheses by analytical exploration of representative real datasets that capture the corresponding event. This approach demands intelligent data storage, access, and analytical querying solutions and tools that facilitate convenient, efficient, and effective exploration of these massive datasets.

The wavelet transform has emerged as an elegant tool for online analytical queries. Most methods that use wavelets, however, share the disadvantage of providing only data-dependent approximate answers, because they compress the data. In contrast, we developed an end-to-end system, termed ProDA (for Progressive Data Analysis), that does not rely on compressing the data. Instead, ProDA applies the wavelet transform to compact incoming queries rather than the underlying data. The intuition is that queries are well-formed, with repetitive patterns that wavelets can exploit for more effective compression, leading to efficient query performance. ProDA employs wavelets to support exact, approximate, and progressive OLAP (On-Line Analytical Processing) queries on large multidimensional datasets, while keeping update costs relatively low.
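As an illustration only (this is not ProDA's code), the idea of transforming the query rather than compressing the data can be sketched for a one-dimensional range-sum query. The sketch assumes an orthonormal Haar basis, a toy 16-element dataset, and hypothetical helper names (`haar`, `range_sum_query`); none of these come from the abstract. A range sum is the dot product of the data with a 0/1 query vector, and an orthonormal transform preserves dot products, so the exact answer can be computed from the few nonzero wavelet coefficients of the query.

```python
import math


def haar(vec):
    """Orthonormal Haar wavelet transform; len(vec) must be a power of two."""
    out = [float(v) for v in vec]
    n = len(out)
    while n > 1:
        half = n // 2
        # Pairwise scaled averages and differences of the current prefix.
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def range_sum_query(n, lo, hi):
    """Query vector for SUM over positions lo..hi (inclusive): 1 inside, 0 outside."""
    return [1.0 if lo <= i <= hi else 0.0 for i in range(n)]


# Toy data (an assumption for illustration, not a real dataset).
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
query = range_sum_query(len(data), 4, 11)

data_w = haar(data)    # the data is transformed once and stored uncompressed
query_w = haar(query)  # each incoming query is transformed on arrival

# Keep only the nonzero query coefficients: the query, not the data, is compacted.
nonzero = [(i, q) for i, q in enumerate(query_w) if abs(q) > 1e-12]

exact = sum(data[4:12])
via_wavelets = sum(q * data_w[i] for i, q in nonzero)

print(len(nonzero), "of", len(query_w), "query coefficients are nonzero")
print(exact, round(via_wavelets, 9))
```

Because no data coefficients are discarded, the answer is exact. Summing the query coefficients in decreasing order of magnitude and stopping early would be one way to obtain approximate or progressive answers in this simplified setting.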
ProDA not only supports online execution of ad hoc analytical queries on massive datasets, but also extends the set of supported analytical queries to include the entire family of polynomial aggregate queries as well as the new class of group-by queries. We have verified the effectiveness of ProDA in practice by conducting extensive experiments with several real-world datasets from NASA and Chevron.

Details of the ProDA project can be found at:
http://infolab.usc.edu/News/PDF_FINAL_PRODA.pdf

*****************

Biography:

Cyrus SHAHABI is currently an Associate Professor and the Director of the Information Laboratory (InfoLAB) at the Computer Science Department, and also a Research Area Director at the NSF's Integrated Media Systems Center (IMSC), at the University of Southern California. He received his B.S. in Computer Engineering from Sharif University of Technology in 1989 and his M.S. and Ph.D. degrees in Computer Science from the University of Southern California in May 1993 and August 1996, respectively. He has two books and more than a hundred articles, book chapters, and conference papers in the areas of databases, GIS, and multimedia. Dr. SHAHABI's current research interests include Geospatial and Multidimensional Data Analysis, Peer-to-Peer Systems, and Streaming Architectures.

He is currently an associate editor of the IEEE Transactions on Parallel and Distributed Systems (TPDS) and is on the editorial board of ACM Computers in Entertainment magazine. He is also a member of the steering committee of IEEE NetDB and the general co-chair of ACM GIS 2008. He serves on many conference program committees, such as VLDB 2008, ACM SIGKDD 2006 to 2008, IEEE ICDE 2006 and 2008, SSTD 2005, and ACM SIGMOD 2004. Dr. SHAHABI is the recipient of the 2002 National Science Foundation CAREER Award and the 2003 Presidential Early Career Award for Scientists and Engineers (PECASE). In 2001, he also received an award from the Okawa Foundation.