MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads. School of Computer Engineering, Nanyang Technological University, 30th Aug 2013. Shanjiang Tang, Bu-Sung Lee, Bingsheng He

Dec 13, 2015

Victor Carroll

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads

School of Computer Engineering

Nanyang Technological University

30th Aug 2013

Shanjiang Tang, Bu-Sung Lee, Bingsheng He

Outline

• Background & Motivations
• MROrder
• Evaluation
• Conclusion

MapReduce Computation Model

[Figure: input data is split across map tasks (map-phase computation), each producing an intermediate result; reduce tasks (reduce-phase computation) turn the intermediate results into output results, which together form the final result.]

Hadoop Execution Model

• Hadoop is an open-source implementation of the MapReduce model.

• The cluster's computation resources are divided into map slots and reduce slots, which are configured by the Hadoop administrator in advance.

• A MapReduce job generally consists of map tasks and reduce tasks.

• Map tasks must be allocated map slots, and reduce tasks must be allocated reduce slots.

Hadoop Execution Model

[Figure: map slots and reduce slots.]

Map tasks start before reduce tasks.

Map tasks can only run on map slots; reduce tasks can only run on reduce slots.

Job Order vs. Performance

Implication: different job orders have a significant impact on performance.

[Figure: map-phase and reduce-phase timelines (horizontal axis: time) for four jobs executed in two orders: (a) J1, J2, J3, J4 and (b) J4, J3, J2, J1.]
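
The effect of job order can be reproduced with a deliberately simplified model. The sketch below is an illustration only, not HSim and not the authors' model: it assumes each job occupies all map slots for a single map-phase duration and all reduce slots for a single reduce-phase duration, that jobs run in FIFO order, and that a job's reduce phase starts only after its own map phase has finished. The job durations are made up.

```python
from typing import List, Tuple

# (map-phase time, reduce-phase time); hypothetical units and values
Job = Tuple[float, float]

def simulate(order: List[Job]) -> Tuple[float, float]:
    """Return (makespan, total completion time) for jobs run in the given order."""
    map_free = 0.0      # time at which the map slots become free
    reduce_free = 0.0   # time at which the reduce slots become free
    total = 0.0         # running sum of job completion times
    for map_time, reduce_time in order:
        map_free += map_time
        # Reduce phase waits for this job's maps and the previous job's reduces.
        reduce_free = max(map_free, reduce_free) + reduce_time
        total += reduce_free
    return reduce_free, total

jobs = [(1, 4), (2, 3), (3, 2), (4, 1)]   # J1..J4, made-up durations
forward = simulate(jobs)                   # order J1, J2, J3, J4
backward = simulate(jobs[::-1])            # order J4, J3, J2, J1
print("J1..J4:", forward)
print("J4..J1:", backward)
```

Under these assumptions the same four jobs finish noticeably later in the reversed order, for both makespan and total completion time.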

Our Goals

• Job ordering optimization is a non-trivial approach to improving the performance of MapReduce workloads (i.e., batches of MapReduce jobs).

• Our work focuses on job ordering optimization for online MapReduce workloads under the FIFO scheduler, where jobs arrive over time.

• Different performance metrics are considered, e.g., makespan and total completion time.
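
For reference, the two metrics can be written in standard scheduling notation. The symbols are assumptions, not taken from the slides: C_i denotes the completion time of job i among n jobs, measured from a common time origin.

```latex
C_{\max} = \max_{1 \le i \le n} C_i
\qquad\text{(makespan)},
\qquad
\mathrm{TCT} = \sum_{i=1}^{n} C_i
\qquad\text{(total completion time)}
```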

Outline

• Background & Motivations
• MROrder
• Evaluation
• Conclusion

Architecture Overview of MROrder

Policy Module

• Determines when and how to perform job ordering optimization for MapReduce jobs.

• We provide two alternative solutions for determining when to perform job ordering optimization:

PNJ-Dominated Solution: performs job ordering when the number of jobs n in the queue reaches a threshold n0, i.e., n ≥ n0.

TP-Dominated Solution: invokes job ordering periodically after a time interval.

Notes: PNJ = policy for the number of jobs; TP = time-based policy.

Policy Module

• The TP-Dominated solution has two variants:

TP-Dominated Solution with Fixed Time Interval (TP-FTI): performs job ordering periodically at a fixed time interval.

TP-Dominated Solution with Adaptive Time Interval (TP-ATI): performs job ordering dynamically at an adaptive time interval, based on the estimated running time of the workload.
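
The trigger policies of the Policy Module can be sketched as simple predicates. This is a minimal illustration under stated assumptions: the `QueueState` fields, the function names, and in particular the adaptive rule (interval proportional to the estimated running time, with a lower bound) are hypothetical, since the slides do not give the actual TP-ATI formula.

```python
from dataclasses import dataclass

@dataclass
class QueueState:
    num_jobs: int            # n: jobs currently waiting in the queue
    now: float               # current time, in seconds
    last_ordering: float     # time of the previous ordering pass
    est_running_time: float  # estimated running time of the queued workload

def pnj_trigger(state: QueueState, n0: int) -> bool:
    """PNJ: reorder once the queue length reaches the threshold n0."""
    return state.num_jobs >= n0

def tp_fti_trigger(state: QueueState, interval: float) -> bool:
    """TP-FTI: reorder once a fixed interval has elapsed."""
    return state.now - state.last_ordering >= interval

def tp_ati_trigger(state: QueueState, fraction: float = 0.5,
                   min_interval: float = 1.0) -> bool:
    """TP-ATI (hypothetical rule): the interval scales with the estimated
    running time of the workload instead of being fixed."""
    interval = max(min_interval, fraction * state.est_running_time)
    return state.now - state.last_ordering >= interval

state = QueueState(num_jobs=5, now=100.0, last_ordering=70.0,
                   est_running_time=80.0)
print(pnj_trigger(state, n0=5), tp_fti_trigger(state, 30.0),
      tp_ati_trigger(state))
```

With a long estimated running time the adaptive variant waits longer between ordering passes; with a short one it reorders sooner.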

TP-FTI

TP-ATI

Ordering Engine

• Responsible for performing job ordering optimization.

• Two types of job ordering approaches:

Simulation-based Ordering Approach (SIM): we developed a Hadoop simulator, HSim, to search for the optimal order. It is a brute-force method.

Algorithm-based Ordering Approach (ALG): we provide efficient heuristic job ordering algorithms for different performance metrics, e.g., makespan and total completion time.
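
The trade-off between the two approaches can be illustrated with a toy example. The sketch below is not the authors' code: `makespan` is a simplified stand-in for a simulator such as HSim (one map phase and one reduce phase per job, FIFO, a reduce phase starts after the job's own map phase), and `johnson_order` applies Johnson's rule, a classical two-stage flow-shop heuristic used here only as a stand-in, since the slides do not show the algorithms behind ALG.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Job = Tuple[float, float]  # (map-phase time, reduce-phase time), made-up units

def makespan(order: Sequence[Job]) -> float:
    """Toy two-phase model: finish time of the last reduce phase."""
    map_free = reduce_free = 0.0
    for map_time, reduce_time in order:
        map_free += map_time
        reduce_free = max(map_free, reduce_free) + reduce_time
    return reduce_free

def brute_force_order(jobs: Sequence[Job]) -> Tuple[Job, ...]:
    """SIM-style search: evaluate every permutation, O(n!) evaluations."""
    return min(permutations(jobs), key=makespan)

def johnson_order(jobs: Sequence[Job]) -> List[Job]:
    """ALG-style heuristic (Johnson's rule): map-light jobs first by
    increasing map time, then map-heavy jobs by decreasing reduce time."""
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return first + last

jobs = [(4, 1), (3, 2), (2, 3), (1, 4)]
print("submitted order:", makespan(jobs))
print("brute force    :", makespan(brute_force_order(jobs)))
print("Johnson's rule :", makespan(johnson_order(jobs)))
```

The exhaustive search costs n! evaluations, which becomes impractical as the number of jobs grows, whereas the heuristic only needs a sort. In this toy model the two coincide; on a real cluster with many slots and overlapping phases a heuristic gives no such guarantee.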

ALG for Makespan

ALG for Total Completion Time

Outline

• Background & Motivations
• MROrder
• Evaluation
• Conclusion

Experiment Setup

• Environment

A Hadoop cluster consisting of 10 nodes, each with two Intel X5675 CPUs, 24GB memory and 56GB hard disks.

• Workloads

Synthetic Facebook Workload: we generated it based on previous related work. Most of its jobs are small, and we use it to evaluate total completion time.

Tested Workload: most of its jobs are large, and we use it to evaluate makespan.

TP-FTI vs. TP-ATI

TP-ATI is smarter and performs better than TP-FTI.

Δt: the suitable threshold of the time period for the time-based policy.

PITCT: performance improvement of total completion time.

ALG vs. SIM

SIM performs better than ALG but takes more time, especially when the number of jobs is large.

Performance Improvement by MROrder (Simulation Result)

Total completion time is sensitive to workloads dominated by small jobs.

Performance Improvement by MROrder (Real Experiment Result)

Makespan is sensitive to workloads dominated by large jobs.

Outline

• Background & Motivations
• MROrder
• Evaluation
• Conclusion

Conclusion

• Job ordering optimization is a non-trivial method for improving the efficiency of slot resource utilization and the performance of MapReduce workloads.

• MROrder is a prototype system for online MapReduce workloads that is flexible with respect to various performance metrics.

• Experimental results show that MROrder improves the performance of MapReduce workloads significantly.

• The source code of MROrder is available at:

http://sourceforge.net/projects/mrorder/

Ongoing and Future Work

• Integrating MROrder into the Hadoop system.

• Considering the performance improvement for other schedulers, e.g., Hadoop Fair Scheduler, Capacity Scheduler.

• Exploring other alternative approaches to improve the cluster utilization and performance of MapReduce workloads.

Acknowledgement

• This work is supported by the "User and Domain driven data analytics as a Service framework" project under the A*STAR Thematic Strategic Research Programme (SERC Grant No. 1021580034).

Accuracy Evaluation of HSim

Impact of Inaccuracy in Estimated Map/Reduce Tasks Time
