APPLICATION-AWARE NETWORKING FOR BIG DATA COMPUTING FRAMEWORK
MPhil Thesis Defence

Title: "APPLICATION-AWARE NETWORKING FOR BIG DATA COMPUTING FRAMEWORK"

By Mr. Yang PENG

Abstract

As the world enters the era of big data, MapReduce-like data-parallel computing frameworks are widely adopted in clouds and data centers. My objective is to build a more efficient underlying network and improve the performance of such computing systems.

Specifically, I first present my effort towards comprehensive traffic forecasting for big data applications using external, lightweight file system monitoring. The idea is motivated by two key observations: rich traffic demand information already exists in the log and metadata files of many big data applications, and such information can be readily extracted through run-time file system monitoring. As an initial step, we use Hadoop as a concrete example to explore our methodology and develop a system called HadoopWatch to predict the traffic demand of Hadoop applications. Our experiments over a series of MapReduce applications demonstrate that HadoopWatch can forecast the traffic demand ahead of time with almost 100% accuracy. Meanwhile, it requires no modification to the Hadoop framework and introduces little overhead to application performance.

Second, I attempt to orchestrate the network using the traffic demand information forecast by HadoopWatch. After studying the traffic patterns common in these computing frameworks, I observe that many flows are grouped to reach a common barrier. To reduce the average group communication time, I propose a task-aware flow scheduling and routing scheme. Flow scheduling is based on the Shortest Remaining First scheduling paradigm, while task-aware flow routing ensures that the grouped flows are evenly distributed across the multiple paths. I design and implement a prototype system called ShuffleBoost, which cooperates with HadoopWatch to improve network efficiency in Hadoop clusters.
In the best case I measured, the average group communication time decreased by 13.5% and the average job completion time dropped by 7.8%.

Date: Tuesday, 3 June 2014
Time: 4:00pm - 6:00pm
Venue: Room 3501, Lifts 25/26

Committee Members:
Dr. Kai Chen (Supervisor)
Dr. Lin Gu (Chairperson)
Dr. Pan Hui

**** ALL are Welcome ****