ProDA: An End-to-End Wavelet-Based OLAP System for Massive Datasets

Speaker:	Professor Cyrus SHAHABI
		University of Southern California

Title:		"ProDA: An End-to-End Wavelet-Based OLAP System
		 for Massive Datasets"

Date:		Friday, 11 July 2008

Time:		11:00am - 12 noon

Venue:		Room 3416 (via lifts 17/18), HKUST

Abstract:

Recent advancements in sensing and data acquisition technologies have
enabled the collection of massive datasets that represent complex real-world
events and entities in fine detail. With access to such datasets,
scientists and system analysts are no longer restricted to modeling and
simulation when analyzing real-world events. Instead, the preferred
approach is to derive observations and verify hypotheses by analytical
exploration of representative real datasets that capture the corresponding
event. This approach demands intelligent data storage, access, and
analytical querying solutions and tools that facilitate convenient,
efficient, and effective exploration of these massive datasets.

Wavelet Transform has emerged as an elegant tool for online analytical
queries. Most wavelet-based methods, however, share the
disadvantage of providing only data-dependent approximate answers, because
they compress the data. In contrast, we developed an end-to-end system,
termed ProDA (for Progressive Data Analysis) that does not rely on
compressing the data. Instead, ProDA employs wavelet transformation to
compact incoming queries rather than the underlying data. The intuition
here is that queries are well-formed, with repetitive patterns that
wavelets can exploit for more effective compression, which leads to
efficient query performance. ProDA employs wavelets to support exact,
approximate, and progressive OLAP (On-Line Analytical Processing) queries
on large multidimensional datasets, while keeping update costs relatively
low. ProDA not only supports online execution of ad hoc analytical queries
on massive datasets, but also extends the set of supported analytical
queries to include the entire family of polynomial aggregate queries as
well as the new class of group-by queries. We have verified the
effectiveness of ProDA in practice by conducting extensive sets of
experiments with several real-world datasets from NASA and Chevron.
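
The idea of transforming the query rather than the data can be illustrated
with a minimal sketch. It assumes a one-dimensional dataset whose length is
a power of two, plain range-SUM queries, and the orthonormal Haar wavelet;
ProDA itself targets multidimensional data and polynomial aggregates (which
call for wavelets with more vanishing moments than Haar), and the function
names below are hypothetical, not taken from ProDA. Because the transform is
orthonormal, the answer <q, d> equals <Wq, Wd>. The data is transformed once
offline, and a range query over N cells has at most 2*log2(N) + 1 non-zero
wavelet coefficients, so only that many stored coefficients are read.
Consuming the largest query coefficients first yields progressively refined
estimates whose final value is the exact answer.

```python
import math

# Illustrative sketch only (hypothetical names, not ProDA's implementation):
# 1-D data, range-SUM queries, orthonormal Haar wavelet.


def haar(vec):
    """Orthonormal Haar transform; len(vec) must be a power of two."""
    out = [float(v) for v in vec]
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        # Scaled pairwise averages and differences; dividing by sqrt(2)
        # keeps every step orthonormal, so inner products are preserved.
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out


def range_query_vector(size, lo, hi):
    """Indicator vector of the inclusive cell range [lo, hi] (a range-SUM query)."""
    return [1.0 if lo <= i <= hi else 0.0 for i in range(size)]


def sparse_query(query_vec, eps=1e-12):
    """Wavelet-transform the query and keep only its non-zero coefficients."""
    return {i: c for i, c in enumerate(haar(query_vec)) if abs(c) > eps}


def progressive_answers(data_coeffs, q_coeffs):
    """Yield running estimates, using the largest query coefficients first.

    Since <q, d> == <Wq, Wd> for an orthonormal W, the last estimate
    equals the exact range sum.
    """
    total = 0.0
    for i, c in sorted(q_coeffs.items(), key=lambda kv: -abs(kv[1])):
        total += c * data_coeffs[i]
        yield total


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]  # toy measure values
data_coeffs = haar(data)  # transformed once and stored offline
q = sparse_query(range_query_vector(len(data), 3, 12))
estimates = list(progressive_answers(data_coeffs, q))
print(len(q), "query coefficients; estimates:", [round(e, 3) for e in estimates])
print("exact:", sum(data[3:13]))
```

Stopping after the first few estimates gives an approximate answer; reading
all of the query's non-zero coefficients gives the exact one.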

Details of the ProDA project can be found at:
http://infolab.usc.edu/News/PDF_FINAL_PRODA.pdf


*****************
Biography:

Cyrus SHAHABI is currently an Associate Professor and the Director of the
Information Laboratory (InfoLAB) at the Computer Science Department and
also a Research Area Director at the NSF's Integrated Media Systems Center
(IMSC) at the University of Southern California. He received his B.S. in
Computer Engineering from Sharif University of Technology in 1989 and then
his M.S. and Ph.D. degrees in Computer Science from the University of
Southern California in May 1993 and August 1996, respectively. He has
authored two books and more than one hundred articles, book chapters, and
conference papers in the areas of databases, GIS and multimedia. Dr. SHAHABI's current
research interests include Geospatial and Multidimensional Data Analysis,
Peer-to-Peer Systems and Streaming Architectures. He is currently an
associate editor of the IEEE Transactions on Parallel and Distributed
Systems (TPDS) and on the editorial board of ACM Computers in
Entertainment magazine. He is also a member of the steering committee of
IEEE NetDB and the general co-chair of ACM GIS 2008. He has served on many
conference program committees, such as VLDB 2008, ACM SIGKDD 2006 to 2008,
IEEE ICDE 2006 and 2008, SSTD 2005 and ACM SIGMOD 2004. Dr. SHAHABI is
the recipient of the 2002 National Science Foundation CAREER Award and the
2003 Presidential Early Career Award for Scientists and Engineers
(PECASE). In 2001, he also received an award from the Okawa Foundation.