A Survey on Approaches for Large-Scale Dataset Analysis

PhD Qualifying Examination


Title: "A Survey on Approaches for Large-Scale Dataset Analysis"

by

Miss Wuman LUO


Abstract:

Industrial and scientific datasets have been growing enormously in size 
and complexity in recent years. Many scientific and industrial users already 
manage, or will soon manage, petabytes of data. An important topic addressed 
by both communities over the last several years is large-scale dataset 
analysis in a shared-nothing architecture on large clusters of 
commodity hardware. The foremost requirements of large-scale dataset 
analysis are scalability, sustained performance, flexibility, and high 
availability. This paper surveys the main approaches for large-scale 
dataset analysis: parallel RDBMSs, MapReduce, and special scientific 
databases. We first make a comparative study of parallel RDBMSs, 
MapReduce, and their hybrid approaches. We then discuss the approaches in 
scientific data analysis, highlighting the special requirements in this 
field and the corresponding solutions. Finally, we point out some open 
research problems and challenges as directions for future work.


Date:			Thursday, 25 February 2010

Time:			3:00pm - 5:00pm

Venue:			Room 4480
			lifts 25/26

Committee Members:	Prof. Lionel Ni (Supervisor)
			Dr. Qiong Luo
			Dr. Lei Chen
			Dr. Qian Zhang


**** ALL are Welcome ****