Where to Deploy Hadoop: Bare-metal or Cloud? Michael Wendt, Sewook Wee Data Insights R&D Group

Where to Deploy Hadoop: Bare Metal or Cloud?

Nov 28, 2014


Technology

Hadoop_Summit

Choosing a deployment model is a critical decision when enterprises adopt Hadoop. Initially, the bare-metal model (an on-premise cluster of physical servers) was popular because it avoids the I/O overhead of virtualized environments. Today, however, the cloud is a serious contender thanks to its compelling cost savings and ease of operation. To help assess the deployment options, Accenture Technology Labs developed the Accenture Data Platform Benchmark suite and a total cost of ownership (TCO) model, and tuned and compared the performance of bare-metal Hadoop clusters and a Hadoop cloud service. Interestingly, the study found that the price/performance ratio is not a critical factor in making a Hadoop deployment decision. Employing empirical and systemic analyses, the study found comparable price/performance ratios for bare-metal Hadoop clusters and Hadoop-as-a-Service. Moreover, in many cases cheaper cloud purchasing options (e.g., long-term contracts) provide a better ratio than bare metal. This result debunks the idea that the cloud is unsuitable for Hadoop MapReduce workloads because of their heavy I/O requirements. Furthermore, the study finds that the default Hadoop configuration leaves ample headroom for performance tuning, and that cloud infrastructure opens up even further tuning opportunities.
Transcript
Page 1: Where to Deploy Hadoop: Bare Metal or Cloud?

Where to Deploy Hadoop: Bare-metal or Cloud?

Michael Wendt, Sewook Wee

Data Insights R&D Group

Page 2: Where to Deploy Hadoop: Bare Metal or Cloud?

Copyright © 2013 Accenture. All rights reserved.

Big Data: Bare-metal vs. Cloud

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Hadoop Appliance

Hadoop Hosting

Page 3: Where to Deploy Hadoop: Bare Metal or Cloud?


Big Data: Bare-metal vs. Cloud

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Hadoop Appliance

Hadoop Hosting

Data Privacy

Data Gravity

Price-Performance Ratio

Productivity of Developers & Data Scientists

Data Enrichment

Page 5: Where to Deploy Hadoop: Bare Metal or Cloud?

Servers designed by Daniel Campos from The Noun Project

Price-Performance Ratio Views

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Cloud? Virtualized? Slow!

Who cares! I’m cheap, just throw more in!

Price-Performance Ratio

Page 6: Where to Deploy Hadoop: Bare Metal or Cloud?


Hadoop Deployment Comparison Study

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Accenture Data Platform Benchmark

+TCO analysis

Price-Performance Ratio

Price-Performance Ratio

Page 7: Where to Deploy Hadoop: Bare Metal or Cloud?


Hadoop Deployment Comparison Study: TCO Analysis

Price-Performance Ratio

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Accenture Data Platform Benchmark

+TCO analysis

Page 8: Where to Deploy Hadoop: Bare Metal or Cloud?


TCO of Bare-metal Hadoop Cluster

On-premise full custom

Server hardware

Staff for operation

Data center facility and electricity

Technical support

24 server nodes and 50 TB of HDFS capacity*

small-scale initial production deployment

Cost components: $3,000.00, $2,914.58, $6,656.00, $9,274.46

Total: $21,845.04


Page 9: Where to Deploy Hadoop: Bare Metal or Cloud?


TCO of Hadoop-as-a-Service

Hadoop-as-a-Service

Hadoop service

Staff for operation

Storage services

Technical support

Used bare-metal TCO for budget

Calculated the number of affordable instances

Cost components: $15,318.28, $2,063.00, $1,372.27, $3,091.49

Total: $21,845.04
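The two TCO slides rest on a simple piece of arithmetic: the Hadoop-as-a-Service line items are sized so that they add up to the same total as the bare-metal line items. The sketch below only checks that sum. The amounts are copied from the slides in the order they appear; the transcript does not say which amount belongs to which cost component, so no such mapping is assumed, and the names `BARE_METAL_ITEMS`, `HAAS_ITEMS`, and `total` are illustrative.

```python
from decimal import Decimal

# Line items as they appear on the two TCO slides (dollars).
# The pairing of amounts with cost components is not recoverable from the
# transcript, so the values are kept as an unlabeled list.
BARE_METAL_ITEMS = ["3000.00", "2914.58", "6656.00", "9274.46"]
HAAS_ITEMS = ["15318.28", "2063.00", "1372.27", "3091.49"]


def total(items):
    """Sum dollar amounts exactly, avoiding binary floating-point rounding."""
    return sum((Decimal(x) for x in items), Decimal("0"))


bare_metal_total = total(BARE_METAL_ITEMS)
haas_total = total(HAAS_ITEMS)

# Both deployment options are compared under the same overall budget.
print(bare_metal_total, haas_total, bare_metal_total == haas_total)
```

Using `Decimal` rather than `float` is a deliberate choice here, since currency sums compared for equality should not depend on floating-point rounding.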

Page 10: Where to Deploy Hadoop: Bare Metal or Cloud?


TCO of Hadoop-as-a-Service – Instances

Hadoop service

14 instance types

3 pricing models

42 combinations

Hadoop-as-a-Service

Page 11: Where to Deploy Hadoop: Bare Metal or Cloud?


TCO of Hadoop-as-a-Service – Instances

Hadoop service

m1.xl

m2.4xl

cc2.8xl

Selected three representative instance types: m1.xlarge, m2.4xlarge, cc2.8xlarge

Hadoop-as-a-Service

Page 12: Where to Deploy Hadoop: Bare Metal or Cloud?


TCO of Hadoop-as-a-Service – Affordable Instances

Hadoop service

50% cluster utilization assumed

1/3 of budget allocated for Spot instances

Hadoop service: $15,318.28

Number of affordable instances:

Instance type    On-demand instances (ODI)    Reserved instances (RI)    Reserved + Spot instances (RI + SI)
m1.xlarge        68                           112                        192
m2.4xlarge       20                           41                         77
cc2.8xlarge      13                           28                         53

Hadoop-as-a-Service
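The idea of turning a fixed budget into a number of affordable instances can be sketched as follows. This is a simplified illustration, not the study's actual pricing model: the hourly prices, the 730-hour billing period, and the function names are all assumptions made for the example, and upfront fees for reserved instances are ignored, so the output is not expected to reproduce the instance counts on the slide. Only the 50% utilization and the 1/3 Spot share come from the slide.

```python
import math

# Assumption: the budget covers one billing period of this many hours.
HOURS_PER_PERIOD = 730


def affordable_instances(budget, hourly_price, utilization=0.5,
                         hours=HOURS_PER_PERIOD):
    """Largest whole number of instances the budget covers.

    Each instance is billed only for the fraction of the period in which
    the cluster is in use (50% utilization, as assumed on the slide).
    """
    cost_per_instance = hourly_price * hours * utilization
    return math.floor(budget / cost_per_instance)


def affordable_reserved_plus_spot(budget, reserved_price, spot_price,
                                  spot_share=1 / 3):
    """Split the budget: spot_share goes to Spot, the rest to reserved."""
    reserved = affordable_instances(budget * (1 - spot_share), reserved_price)
    spot = affordable_instances(budget * spot_share, spot_price)
    return reserved + spot


# Hypothetical prices in dollars per instance-hour, for illustration only.
budget = 15318.28  # the "Hadoop service" amount from the TCO slide
print(affordable_instances(budget, hourly_price=1.00))
print(affordable_reserved_plus_spot(budget, reserved_price=0.50,
                                    spot_price=0.25))
```

The pattern the slide's table shows (cheaper pricing models buy more instances for the same budget) falls out of this directly: lowering the hourly price raises the instance count.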

Page 13: Where to Deploy Hadoop: Bare Metal or Cloud?


Hadoop Deployment Comparison Study: Accenture Data Platform Benchmark

Price-Performance Ratio

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

+TCO analysis

Accenture Data Platform Benchmark

Page 14: Where to Deploy Hadoop: Bare Metal or Cloud?


Accenture Data Platform Benchmark

Use case                          Workload
Log management                    Sessionization
Customer preference prediction    Recommendation engine
Text analytics                    Document clustering

Suite of real-world Hadoop MapReduce applications

From client experience, internal roadmap, public literature

Open-source libraries & public datasets

Categorized & selected common use cases

Page 15: Where to Deploy Hadoop: Bare Metal or Cloud?


Accenture Data Platform Benchmark: Sessionization

Log data → Bucketing → Sorting → Slicing → Sessions

A session is a sequence of related interactions, useful to analyze as a group

~150 billion log entries, ~24 TB

1 million users, 1.1 billion sessions
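The bucketing, sorting, and slicing steps named on this slide can be sketched in a few lines of single-machine Python. This is an illustration of the general technique, not the benchmark's MapReduce code. The record layout `(user_id, timestamp, action)` and the 30-minute inactivity gap used to slice sessions are assumptions; the slide does not state how session boundaries were defined.

```python
from collections import defaultdict

# Assumption: a gap of more than 30 minutes of inactivity starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(log_entries, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp_seconds, action) records into sessions."""
    # Bucketing: collect each user's events together.
    buckets = defaultdict(list)
    for user_id, ts, action in log_entries:
        buckets[user_id].append((ts, action))

    sessions = []
    for user_id, events in buckets.items():
        # Sorting: order each user's events by time.
        events.sort(key=lambda e: e[0])
        # Slicing: cut the sorted stream wherever the gap is too large.
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > gap:
                sessions.append((user_id, current))
                current = []
            current.append(event)
        sessions.append((user_id, current))
    return sessions


logs = [
    ("u1", 0, "login"),
    ("u2", 10, "search"),
    ("u1", 100, "view"),
    ("u1", 5000, "login"),  # more than 30 minutes after the previous u1 event
]
for user, events in sessionize(logs):
    print(user, events)
```

In a MapReduce setting the bucketing and sorting would typically be handled by the shuffle phase (keyed by user), with the slicing done in the reducer.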

Page 16: Where to Deploy Hadoop: Bare Metal or Cloud?


Accenture Data Platform Benchmark: Recommendation Engine

Ratings data: Who rated what item?

Co-occurrence matrix: How many people rated the pair of items?

Recommendation: Given the way the person rated these items, he/she is likely to be interested in these other items.

Used item-based collaborative filtering algorithm

Mahout example library used as foundation

Generated 300 million ratings

3 million population, 50,000 items
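The three stages on this slide (ratings data, co-occurrence matrix, recommendation) can be illustrated with a small in-memory sketch. It is a simplified version of co-occurrence-based item-to-item collaborative filtering, written for clarity; it is not Mahout's implementation, and the scoring rule (co-occurrence count weighted by the user's rating) and all names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(ratings):
    """Count, for each pair of items, how many users rated both.

    ratings: dict mapping user -> dict mapping item -> rating.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in ratings.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, ratings, counts, top_n=3):
    """Score unseen items by co-occurrence with items the user rated."""
    seen = ratings[user]
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n * rating
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 4, "B": 2, "C": 5},
    "carol": {"B": 4, "C": 3},
}
counts = cooccurrence(ratings)
print(recommend("alice", ratings, counts))
```

At the scale quoted on the slide, the co-occurrence counting and the scoring step would each be expressed as MapReduce jobs rather than in-memory loops.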

Page 17: Where to Deploy Hadoop: Bare Metal or Cloud?


Accenture Data Platform Benchmark: Document Clustering

Corpus of crawled web pages → Filtered and tokenized documents → Term dictionary → TF vectors → TF-IDF vectors → K-means → Clustered documents

Groups similar documents

Application components used in many areas (e.g., search engines, e-commerce site optimization)

CommonCrawl dataset, 10 TB corpus*

~31,000 ARC files or ~300 million HTML pages
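The pipeline on this slide (tokenize, build a term dictionary, compute TF and TF-IDF vectors, then run K-means) can be sketched on a toy corpus. This is a minimal single-machine illustration using only the standard library, not the benchmark's implementation. The whitespace tokenizer, the unsmoothed `log(N / df)` IDF formula, and seeding the centroids with the first `k` documents are all simplifying assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(docs):
    """Return (term dictionary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    dictionary = sorted({t for doc in tokenized for t in doc})
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)  # raw term counts, i.e. the TF vector
        vectors.append([tf[t] * math.log(n / df[t]) for t in dictionary])
    return dictionary, vectors


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (naive choice)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its assigned documents.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


docs = [
    "hadoop cluster server",
    "movie rating user",
    "hadoop cluster node",
    "movie rating item",
]
dictionary, vectors = tfidf_vectors(docs)
clusters = kmeans(vectors, k=2)
print(clusters)
```

K-means results depend on the initial centroids, so a production run would normally use a better seeding strategy than taking the first `k` documents.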

Page 18: Where to Deploy Hadoop: Bare Metal or Cloud?


Hadoop Deployment Comparison Study: Experiment Setup/Results

Price-Performance Ratio

Bare-metal Cloud

On-premise full custom

Hadoop-as-a-Service

Accenture Data Platform Benchmark

+TCO analysis

Page 19: Where to Deploy Hadoop: Bare Metal or Cloud?


Experiment Setup: Price-Performance Ratio Comparison

Bare-metal Hadoop Cluster

Amazon EMR Clusters

1 bare-metal cluster vs. 9 Amazon EMR clusters

Manual and automated tuning

Fixed budget for cluster size

Measure execution time of benchmark

Price-Performance Ratio

Page 20: Where to Deploy Hadoop: Bare Metal or Cloud?


Experiment Setup: Starfish Automated Performance Tuning Tool

Starfish (now Unravel) is an automated performance tuning tool for MapReduce jobs

Profile phase

Optimize phase

For the experiment we ran each benchmark twice using Starfish

Manual and automated tuning

Measure execution time of optimize phase

Speedometer designed by Filippo Camedda from The Noun Project

Page 21: Where to Deploy Hadoop: Bare Metal or Cloud?


Experiment Results: Starfish Automated Performance Tuning Tool

Manual and automated tuning

Starfish tuned Recommendation Engine workload w/ 11 cascaded MapReduce jobs

Manually tuned Sessionization workload

2+ weeks of manual tuning, ½ - 1 day iterations

8x improvement in one tuning cycle

Achieve performance increases with less cost using Starfish

Page 22: Where to Deploy Hadoop: Bare Metal or Cloud?


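Experiment Results: Sessionization

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration. Instance types: cc2.8xlarge, m2.4xlarge, m1.xlarge. Pricing models: ODI, RI, RI+SI.]

Bare-metal: 533

Bar values, in extraction order: 408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35

Instance counts, in extraction order: 13, 20, 68, 28, 41, 112, 53, 77, 192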

Page 23: Where to Deploy Hadoop: Bare Metal or Cloud?


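Experiment Results: Recommendation Engine

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration. Instance types: cc2.8xlarge, m2.4xlarge, m1.xlarge. Pricing models: ODI, RI, RI+SI.]

Bare-metal: 21.59

Bar values, in extraction order: 23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08

Instance counts, in extraction order: 13, 20, 68, 28, 41, 112, 53, 77, 192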

Page 24: Where to Deploy Hadoop: Bare Metal or Cloud?


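Experiment Results: Document Clustering

[Figure: bar chart of execution time (minutes) by Amazon EMR configuration. Instance types: cc2.8xlarge, m2.4xlarge, m1.xlarge. Pricing models: ODI, RI, RI+SI.]

Bare-metal: 1186.37

Bar values, in extraction order: 1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38

Instance counts, in extraction order: 13, 20, 68, 28, 41, 112, 53, 77, 192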
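Because every cluster in the experiment was sized to the same budget, comparing execution times is equivalent to comparing price-performance ratios. The sketch below summarizes the three result slides under that reading: for each workload it counts how many of the nine Amazon EMR configurations finished faster than the bare-metal cluster and computes the speedup of the fastest one. The times are copied from the slides; the transcript does not preserve which bar belongs to which instance type and pricing model, so the code treats each workload's EMR times as an unlabeled list. The names `RESULTS` and `summarize` are illustrative.

```python
# Execution times in minutes, copied from the three result slides.
# Each entry: (bare-metal time, list of the nine Amazon EMR times).
RESULTS = {
    "Sessionization": (
        533.0,
        [408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35],
    ),
    "Recommendation Engine": (
        21.59,
        [23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08],
    ),
    "Document Clustering": (
        1186.37,
        [1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38],
    ),
}


def summarize(bare_metal, emr_times):
    """Return (number of EMR configs faster than bare-metal, best speedup)."""
    faster = sum(1 for t in emr_times if t < bare_metal)
    best_speedup = bare_metal / min(emr_times)
    return faster, best_speedup


for name, (bare_metal, emr_times) in RESULTS.items():
    faster, speedup = summarize(bare_metal, emr_times)
    print(f"{name}: {faster}/{len(emr_times)} EMR configurations faster, "
          f"best speedup {speedup:.2f}x")
```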

Page 25: Where to Deploy Hadoop: Bare Metal or Cloud?


Key Takeaways

Hadoop-as-a-Service offers a better price-performance ratio

Cloud expands the performance tuning opportunities

Automated performance tuning tools are a necessity


Page 26: Where to Deploy Hadoop: Bare Metal or Cloud?


Acknowledgement

Page 27: Where to Deploy Hadoop: Bare Metal or Cloud?


More details

Contact us for the full white paper: Hadoop Deployment Comparison Study

Michael Wendt

R&D Developer

Data Insights R&D

Accenture Technology Labs

(408) 817-2190

[email protected]

Scott Kurth

Group Lead

Data Insights R&D

Accenture Technology Labs

(408) 817-2775

[email protected]