Energy Efficiency for MapReduce Workloads: An In-depth Study, Boliang Feng, Renmin University of China, fengliangcc@gmail.com, Dec 19 2012 @ ADC 2012

Jan 02, 2016

Page 1

Energy Efficiency for MapReduce Workloads: An In-depth Study

Boliang Feng

Renmin University of China

fengliangcc@gmail.com, Dec 19 2012

@ ADC 2012

Page 2

Outline

Why Energy?

Factors Affecting Energy Efficiency of MapReduce

Experimental Design

Analysis of Results

Key Findings and Recommendations

Conclusion and Future Work

Page 3

Why energy?

Cooling

Cost

Environmental Effect

Performance

Page 4

Data Center: some numbers

The data center in The Dalles, Oregon: ~50 MW

Average electricity consumption in the USA: ~900 kWh/month/family, or about 1.25 kW

Power consumption is the major cost and constraint of data centers

About 7000 data centers in the USA

In the US, data centers accounted for roughly 61 billion kWh (1.5% of total U.S. electricity consumption) in 2006 (EPA 2007)

The number was expected to double by 2011
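
A quick check of the numbers on this slide, as a minimal sketch. The 30-day month and all variable names are assumptions made for illustration; the input figures are the ones quoted above.

```python
# Back-of-the-envelope check of the data center figures quoted on this slide.
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def average_power_kw(energy_kwh_per_month: float) -> float:
    """Convert a monthly energy figure (kWh) into an average power draw (kW)."""
    return energy_kwh_per_month / HOURS_PER_MONTH


household_kw = average_power_kw(900)   # ~900 kWh/month/family
datacenter_kw = 50 * 1000              # ~50 MW expressed in kW
households_equivalent = datacenter_kw / household_kw

# 61 billion kWh is stated to be 1.5% of total U.S. consumption in 2006,
# which implies a national total of roughly 4 trillion kWh.
implied_us_total_kwh = 61e9 / 0.015

print(f"household average power: {household_kw:.2f} kW")
print(f"50 MW is roughly {households_equivalent:,.0f} average households")
print(f"implied U.S. total for 2006: {implied_us_total_kwh:.2e} kWh")
```

Under these assumptions, one 50 MW data center draws about as much power as tens of thousands of average households.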

Page 5

Green Computing in Cloud

Physical construction & chip level

System software: Virtual Datacenter OS; IBM: Power-Aware Request Distribution

Cluster-level view: dynamic resource configuration, workload distribution, …

Green applications: DBMS, MapReduce

Industry standards: Green Grid, PUE (Power Usage Effectiveness), DCiE
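
For readers unfamiliar with the Green Grid metrics, here is a minimal sketch of how PUE and its reciprocal DCiE are computed. The function names and the example power figures are hypothetical and do not come from the talk.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center infrastructure Efficiency: the reciprocal of PUE."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return it_equipment_kw / total_facility_kw


# Hypothetical facility: 1500 kW in total, of which 1000 kW reaches IT equipment.
example_pue = pue(1500.0, 1000.0)
example_dcie = dcie(1500.0, 1000.0)
print(f"PUE = {example_pue:.2f}, DCiE = {example_dcie:.1%}")
```

A PUE closer to 1.0 means less power is spent on cooling and power distribution relative to the IT load.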

Page 6

Why MapReduce?

MapReduce & Hadoop

MapReduce is a popular distributed processing model for parallel computing in data centers

Hadoop is an open-source implementation of MapReduce

New Challenges

Little attention has been paid to energy in the design of MapReduce platforms

MapReduce performs automatic parallelization and distribution of computations

MapReduce incorporates mechanisms to be resilient to failures

Page 7

Aim to Answer 2 Questions

We aim to address the following two questions:

Which factors affect the cluster-wide energy efficiency of a MapReduce platform?

Is there any opportunity to perform a tradeoff?

Page 8

Question 1

Factors Affecting Energy Efficiency

Identify four factors that affect the energy efficiency of MapReduce:

CPU intensiveness, I/O intensiveness

Factors of the underlying distributed file system: the replica factor and the file block size

Page 9

Question 2

Is there any opportunity?

Identify four typical MapReduce workloads that represent different kinds of application scenarios: TextWrite, WordCount, GrepSearch, TeraSort

Measure the energy consumption with varied cluster scales and other related factors

Page 10

Factors Affecting Energy Efficiency of MapReduce

Energy Consumption Model for MapReduce

CPU intensiveness, I/O intensiveness

Factors of the underlying distributed file system: the replica factor and the file block size

Page 11

Experiment Design

Metrics: power, time, energy, energy efficiency (EE)

Cluster setup: 2.4 GHz Intel Core Duo processor, 4 GB RAM, 1000 Mbps network card, Hadoop 0.20.2
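
The slide lists the metrics without formulas. The sketch below shows one common way they relate: energy is average power times runtime. The EE definition used here (data processed per watt-hour) is an assumption for illustration and is not necessarily the definition used in the study; the run figures are hypothetical.

```python
def energy_wh(avg_power_w: float, runtime_s: float) -> float:
    """Energy in watt-hours: average power (W) times runtime (s), divided by 3600."""
    return avg_power_w * runtime_s / 3600.0


def energy_efficiency(data_mb: float, energy: float) -> float:
    """Assumed EE metric: megabytes processed per watt-hour."""
    if energy <= 0:
        raise ValueError("energy must be positive")
    return data_mb / energy


# Hypothetical run: 6 nodes drawing 100 W each for 30 minutes on 20 GB of input.
cluster_power_w = 6 * 100.0
job_energy_wh = energy_wh(cluster_power_w, runtime_s=1800)
ee = energy_efficiency(20 * 1024, job_energy_wh)
print(f"energy = {job_energy_wh:.1f} Wh, EE = {ee:.1f} MB/Wh")
```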

Page 12

Experiment Design

Workloads:

TextWrite: Writes a large unsorted random sequence of words from a word list. Network-intensive, map-only job

WordCount: Map-only CPU-intensive job. Matching regular expressions from input files. High CPU utilization in map stage

GrepSearch: Balance between CPU-intensive (map stage) and I/O-intensive (reduce stage) work. High map/reduce ratio

TeraSort: Sorts the official input datasets. CPU bound in map stage and I/O bound in reduce stage. Low map/reduce ratio

Page 13

Experiment Design

Varied cluster parameters:

Cluster size: 2~6 nodes

Replica factor: 1~5 replicas

Block size: 16MB~1GB

Data size: 5~20GB
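
The parameter ranges above can be enumerated as a sweep, sketched below. Only the ranges come from the slide: the intermediate steps (power-of-two block sizes, 5 GB data increments) and the full cross product are assumptions, and the study may well have varied one parameter at a time. `dfs.replication` and `dfs.block.size` are the HDFS property names used in the Hadoop 0.20 line; the helper names are hypothetical.

```python
from itertools import product

MB = 1024 * 1024

cluster_sizes = range(2, 7)                          # 2~6 nodes
replica_factors = range(1, 6)                        # 1~5 replicas
block_sizes_mb = [16, 32, 64, 128, 256, 512, 1024]   # 16MB~1GB, assumed steps
data_sizes_gb = [5, 10, 15, 20]                      # 5~20GB, assumed steps


def hdfs_properties(replicas: int, block_mb: int) -> dict:
    """Map one configuration onto HDFS properties (block size is given in bytes)."""
    return {
        "dfs.replication": str(replicas),
        "dfs.block.size": str(block_mb * MB),
    }


# Full cross product of the assumed steps, one dict per experiment configuration.
grid = [
    {"nodes": n, "data_gb": d, "hdfs": hdfs_properties(r, b)}
    for n, r, b, d in product(cluster_sizes, replica_factors,
                              block_sizes_mb, data_sizes_gb)
]

print(len(grid), "configurations; first:", grid[0])
```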

Page 14

Analysis of Results

We run these four workloads with varied workload size, cluster scale, replica factor, and block size

Page 15

TextWrite(1)

Both latency and energy consumption grow almost linearly as the replica factor increases

Page 16

TextWrite(2)

With more nodes, performance improves greatly (51.3%) and energy decreases from 71 Wh to 49 Wh

More nodes mean more power, but the reduction in response time can offset the extra energy consumption
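
The tradeoff on this slide can be worked through with energy = average power × time. This is a sketch under two assumptions: that the 51.3% improvement is a reduction in runtime, and that it refers to the same pair of configurations as the 71 Wh and 49 Wh figures.

```python
# Figures quoted on the slide.
energy_before_wh = 71.0
energy_after_wh = 49.0
runtime_reduction = 0.513  # assumption: "51.3%" is a runtime reduction

energy_ratio = energy_after_wh / energy_before_wh   # energy after / energy before
energy_saving = 1.0 - energy_ratio
time_ratio = 1.0 - runtime_reduction                # runtime after / runtime before

# Since energy = average power * time, the implied change in average power is:
power_ratio = energy_ratio / time_ratio

print(f"energy saving: {energy_saving:.1%}")
print(f"implied average cluster power: {power_ratio:.2f}x the smaller cluster")
```

Under these assumptions the larger cluster draws about 1.4 times the power, yet finishes early enough to use roughly 31% less energy.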

Page 17

TextWrite(3)

A small block size enables more tasks to be processed in parallel.

When the block size is larger than 64 MB, the parallelism of the system is reduced, so energy consumption increases significantly

Page 18

GrepSearch(1)

A higher replica factor means more choices for task assignment, improving the load balancing of the system

The HDFS replica placement policy not only improves data reliability and availability, but also improves the parallelism of the GrepSearch workload
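
Why more replicas give the scheduler more choices can be illustrated with a simplified model. The sketch assumes replicas of a block are placed uniformly at random on distinct nodes, which ignores the rack awareness of the real HDFS placement policy; the function name and the 6-node, 2-free-node scenario are hypothetical.

```python
from math import comb


def p_local_choice(n_nodes: int, replicas: int, free_nodes: int) -> float:
    """Probability that at least one of `free_nodes` idle nodes holds a replica
    of a given block, under uniform random placement on distinct nodes."""
    if not (0 < replicas <= n_nodes and 0 < free_nodes <= n_nodes):
        raise ValueError("need 0 < replicas, free_nodes <= n_nodes")
    # Complement: every idle node was drawn from the nodes without a replica.
    return 1.0 - comb(n_nodes - replicas, free_nodes) / comb(n_nodes, free_nodes)


# 6-node cluster with 2 idle nodes, for the replica factors used in the study.
for r in range(1, 6):
    print(f"replica factor {r}: P(data-local choice) = {p_local_choice(6, r, 2):.2f}")
```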

Page 19

GrepSearch(2)

More nodes mean more resources and better performance

When the workload size is small, the initial cost cannot be amortized and resources are already sufficient, so there is no obvious energy saving as the cluster size increases

Page 20

GrepSearch(3)

As the workload size increases, the EE value decreases

A small block size means large overhead for job initialization

A well-tuned block size can save as much as 36.8% of energy

Page 21

TeraSort(1)

No significant change in performance is observed with varied replica factors

More replicas would improve the load balance of map tasks

But there would be a large data transfer in the shuffle stage, affecting the progress of the whole job

Page 22

TeraSort(2)

With varied cluster size and workload size

Larger workloads increase the job runtime, and more IT components lead to more energy waste

Adding more servers provides higher I/O throughput and reduces energy consumption by 20.2%

Page 23

TeraSort(3)

Fig. 3(c) shows the EE value with varied cluster size and workload size

A small block size means fine-grained input splits, which improve the performance of the reduce sort

A big block size means less data shuffle

Page 24

WordCount(1)

It uses almost 95% of the available CPU

Increasing the replica factor improves performance and parallelism

Larger replica factors imply more opportunities for load balancing

Page 25

WordCount(2)

With more nodes added to the cluster, response time and energy consumption decrease by 56.8% and 18.2%, respectively, with the 20 GB data set

Page 26

WordCount(3)

A small block size incurs a high cost for task initialization

A large block size such as 1 GB has a negative effect on the parallelism of the system
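
Both effects can be seen by counting map tasks, as in the sketch below. It assumes one map task per HDFS block, a 5 GB input on 6 nodes, and two map slots per node; the slot count and the helper names are assumptions for illustration, not values stated in the talk.

```python
import math


def num_map_tasks(data_mb: int, block_mb: int) -> int:
    """One map task per block (assumed): number of blocks the input occupies."""
    return math.ceil(data_mb / block_mb)


def num_waves(tasks: int, slots: int) -> int:
    """Rounds of task execution needed when `slots` tasks can run at once."""
    return math.ceil(tasks / slots)


data_mb = 5 * 1024      # 5 GB input
map_slots = 6 * 2       # 6 nodes, 2 map slots each (assumed)

for block_mb in (16, 64, 1024):
    tasks = num_map_tasks(data_mb, block_mb)
    busy = min(tasks, map_slots)
    print(f"block {block_mb:>4} MB: {tasks:>3} tasks, "
          f"{num_waves(tasks, map_slots):>2} waves, "
          f"{busy}/{map_slots} slots busy in the first wave")
```

With 16 MB blocks there are hundreds of short tasks, each paying its own startup cost; with 1 GB blocks there are fewer tasks than slots, so part of the cluster sits idle.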

Page 27

Key Findings

With well-tuned system parameters and adaptive resource configurations, a MapReduce cluster can achieve both performance improvement and good energy saving simultaneously in some instances

This is in surprising contrast to previous work on cluster-level energy conservation.

Page 28

Recommendations

For CPU-intensive and high map/reduce ratio workloads, an appropriate number of servers should be provided based on the workload size, ensuring load balance and adequate CPU resources

For I/O-intensive workloads, fine-grained input splits are effective for the shuffle and reduce stages, which is more energy efficient provided that the initialization cost can be amortized

Improved data partitioning algorithms in the map stage and content-aware reduce task scheduling strategies are key areas for energy efficiency, where refinements and improvements are needed.

Page 29

Conclusion

We identified four factors that affect the energy efficiency of MapReduce on a cluster

We chose four typical MapReduce workloads and measured the energy consumption with varied cluster scales and related factors

A MapReduce cluster can achieve both significant energy saving and performance improvement simultaneously in some instances

Page 30

Future Work

Verify the results of this paper on larger clusters

Introduce more MapReduce benchmarks

Investigate the effects of changing other parameters: parallel reduce copies, memory limit, and file buffer size

Page 31

Thank you

Q&A