Cross-Column Redundancy: Concept, Detection and Application
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Cross-Column Redundancy: Concept, Detection and Application"

By

Mr. Hao LIU

Abstract

Nowadays, more and more data from heterogeneous data sources are integrated into various data warehouse systems for analytical purposes. As a result, data columns in such systems often exhibit redundancy, which we term Cross-Column Redundancy (CCR). CCR indicates high similarity or correlation between columns and therefore can be exploited for data management and business intelligence. However, due to the combinatoric nature of CCR, it is computationally challenging to automatically detect CCR.

In this thesis, we define three kinds of CCR, develop efficient algorithms for CCR detection, and leverage CCR to compress data. In particular, we focus on a kind of CCR, called Soft Concatenation Mapping (SCM), where one column can be derived from another or several other columns by transformation and concatenation. We prove that SCM detection is NP-hard and propose approximate algorithms.

Furthermore, we leverage CCR for database compression and develop Cuttle, a column storage system that integrates our cross-column compression schemes into existing database systems transparently. Our experiments on real-world datasets show that Cuttle reduces the data storage by half and improves the query processing performance by 20%.

In addition, we present the design and implementation of UStore, a customized version of Cuttle tailored for UnionPay. We use UnionPay's inter-bank transaction settlement platform (ITSP) as a running example to illustrate the core components of UStore. To date, UStore has been deployed to process over 15 years' bankcard transaction data (over 3PB in plain text format) in UnionPay.

Date: Monday, 24 July 2017
Time: 5:00pm - 7:00pm
Venue: Room 2130B, Lifts 19

Chairman: Prof. Guanghao Chen (CIVL)

Committee Members:
Prof. Lionel Ni (Supervisor)
Prof. Qiong Luo (Supervisor)
Prof. Shing-Chi Cheung
Prof. Lei Chen
Prof. Jingshen Wu (MAE)
Prof. Qing Li (Comp. Sci., CityU)

**** ALL are Welcome ****