Moving Hadoop into the Cloud with Flexible Slot Management and Speculative Execution

Abstract: Load imbalance is a major source of overhead in parallel programs such as MapReduce. Due to the uneven distribution of input data, tasks with more data become stragglers and delay the overall job completion. Running Hadoop in a private cloud opens up opportunities for expediting stragglers with more resources but also introduces problems that often outweigh the performance gain: (1) performance interference from co-running jobs may create new stragglers; (2) there exists a semantic gap between Hadoop task management and resource pool-based virtual cluster management, preventing tasks from using resources efficiently. In this paper, we strive to make Hadoop more resilient to data skew and more efficient in cloud environments. We present FlexSlot, a user-transparent task slot management scheme that automatically identifies map stragglers and resizes their slots accordingly to accelerate task execution. FlexSlot adaptively changes the number of slots on each virtual node to balance the resource usage so that the pool of resources can be efficiently utilized. FlexSlot further improves mitigation of data skew with an adaptive speculative execution strategy. Experimental results show that FlexSlot effectively reduces job completion time by up to 47.2 percent compared to stock Hadoop and two recently proposed skew mitigation and speculative execution approaches.
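
To make the idea of identifying stragglers and resizing their slots concrete, here is a minimal sketch. The abstract does not say how FlexSlot detects stragglers or which resource it resizes, so everything below is an assumption for illustration: the median-based progress-rate threshold, the scaling factor, and the names find_stragglers and resize_slots are hypothetical and are not FlexSlot's actual mechanism.

```python
from statistics import median


def find_stragglers(progress_rates, threshold=0.5):
    """Return sorted IDs of tasks progressing slower than threshold * median rate.

    Assumption: a task counts as a straggler when its progress rate falls
    well below the median of its peers. This is a common heuristic, not
    necessarily the criterion FlexSlot uses.
    """
    if not progress_rates:
        return []
    med = median(progress_rates.values())
    return sorted(t for t, rate in progress_rates.items() if rate < threshold * med)


def resize_slots(slot_sizes, stragglers, factor=2.0):
    """Return a new mapping in which straggler slots are scaled by factor.

    Assumption: slot size is a single abstract resource quantity; the
    abstract does not state which resources are adjusted.
    """
    slow = set(stragglers)
    return {t: int(s * factor) if t in slow else s for t, s in slot_sizes.items()}


# Hypothetical map tasks with made-up progress rates and slot sizes.
rates = {"m1": 1.0, "m2": 0.9, "m3": 0.2, "m4": 1.1}
sizes = {"m1": 1024, "m2": 1024, "m3": 1024, "m4": 1024}
slow_tasks = find_stragglers(rates)
new_sizes = resize_slots(sizes, slow_tasks)
```

In this toy input only m3 falls below half the median rate, so only its slot grows. A real scheme would also have to rebalance the number of slots per node so that the enlarged slot fits within the resource pool.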

