APPLICATION-AWARE NETWORKING FOR BIG DATA COMPUTING FRAMEWORK

MPhil Thesis Defence


Title: "APPLICATION-AWARE NETWORKING FOR BIG DATA COMPUTING FRAMEWORK"

By

Mr. Yang PENG


Abstract

As the world enters the era of big data, MapReduce-like data-parallel computing 
frameworks are widely adopted in clouds and data centers. My objective is to 
build a more efficient underlying network and improve the performance of such 
computing systems. Specifically, I first present my effort towards comprehensive 
traffic forecasting for big data applications using external, lightweight 
file system monitoring. The idea is motivated by the key observations that rich 
traffic demand information already exists in the log and metadata files of 
many big data applications, and that such information can be readily extracted 
through run-time file system monitoring. As an initial step, we use Hadoop as a 
concrete example to explore our methodology and develop a system called 
HadoopWatch to predict the traffic demand of Hadoop applications. Our experiments 
over a series of MapReduce applications demonstrate that HadoopWatch can 
forecast traffic demand ahead of time with almost 100% accuracy. 
Meanwhile, it requires no modification to the Hadoop framework and introduces 
little overhead to application performance.

Second, I attempt to orchestrate the network using the traffic demand 
information forecast by HadoopWatch. After studying the traffic patterns 
common in these computing frameworks, I observe that many flows are grouped 
to reach a common barrier. To reduce the average group communication time, 
I propose a task-aware flow scheduling and routing scheme. Flow scheduling 
follows the Shortest Remaining First paradigm, while task-aware flow routing 
ensures that grouped flows are evenly distributed across multiple paths. 
I design and implement a prototype system called ShuffleBoost, which 
cooperates with HadoopWatch to improve network efficiency in Hadoop 
clusters. In the best case I measured, the average group communication time 
decreased by 13.5% and the average job completion time dropped by 7.8%.
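
The scheduling and routing idea can be illustrated with a minimal sketch. 
It is not the ShuffleBoost implementation: the Flow type, the function names, 
and the greedy least-loaded path assignment are assumptions made here for 
illustration only. Groups are ordered by total remaining bytes (one possible 
reading of Shortest Remaining First at group level), and the flows of one 
group are spread over the available paths so that the byte load stays even.

```python
from dataclasses import dataclass
import heapq


@dataclass
class Flow:
    group: str       # hypothetical group (task) identifier
    remaining: int   # bytes still to send


def schedule_groups(flows):
    """Return group ids ordered by total remaining bytes, smallest first."""
    totals = {}
    for f in flows:
        totals[f.group] = totals.get(f.group, 0) + f.remaining
    # Ties are broken by group id so the order is deterministic.
    return sorted(totals, key=lambda g: (totals[g], g))


def route_group(flows, num_paths):
    """Assign each flow of one group to a path, keeping path loads even.

    Greedy heuristic (an assumption, not the thesis's method): take flows
    from largest to smallest and give each to the least-loaded path.
    Returns a dict mapping flow index -> path index.
    """
    heap = [(0, p) for p in range(num_paths)]  # (bytes assigned, path index)
    heapq.heapify(heap)
    assignment = {}
    for i, f in sorted(enumerate(flows), key=lambda t: -t[1].remaining):
        load, path = heapq.heappop(heap)
        assignment[i] = path
        heapq.heappush(heap, (load + f.remaining, path))
    return assignment
```

For example, with group "a" holding 150 bytes in total and group "b" holding 
70 bytes, schedule_groups serves "b" first. A group with flows of 100, 50 and 
50 bytes over two paths ends up with 100 bytes on each path.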


Date:			Tuesday, 3 June 2014

Time:			4:00pm - 6:00pm

Venue:			Room 3501
 			Lifts 25/26

Committee Members:	Dr. Kai Chen (Supervisor)
 			Dr. Lin Gu (Chairperson)
 			Dr. Pan Hui


**** ALL are Welcome ****