© 2009 VMware Inc. All rights reserved Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services

Is your cloud ready for Big Data? Strata NY 2013

Nov 10, 2014

Transcript
Page 1: Is your cloud ready for Big Data? Strata NY 2013

© 2009 VMware Inc. All rights reserved

Is Your Cloud Ready for Big Data?

Richard McDougall

CTO, Storage and Application Services

Page 2:

Not Just for the Web Giants – The Intelligent Enterprise

Page 3:

Real-time analysis allows instant understanding of market dynamics.

Retailers can have an intimate understanding of their customers' needs and use direct targeted marketing.

Market Segment Analysis → Personalized Customer Targeting

Page 4:

The Emerging Pattern of Big Data Systems: Retail Example

Real-Time Streams

Exa-scale Data Store

Parallel Data Processing

Real-Time Processing

Machine Learning

Data Science

Cloud Infrastructure

Page 5:

Storage: Plan for Peta-scale Data Storage and Processing

[Chart: PB of data (log scale, 0.01 to 1000) by year (2000 to 2015) for two series: Online Apps and Analytics]

Analytics Rapidly Outgrows Traditional Data Size by 100x

Page 6:

Unprecedented Scale

“Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data

We are creating an Exabyte of data every minute in 2013

Yottabyte by 2030

Page 7:

A single GE jet engine produces 10 Terabytes of data in one hour, or 90 Petabytes per year.

This enables early detection of faults and common mode failures, and provides product engineering feedback.

Post Mortem → Proactively Maintained Connected Product
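The yearly figure on this slide follows from the hourly rate if the engine is assumed to run around the clock. That assumption, the decimal units (1 PB = 1,000 TB), and the names below are illustrative and not from the talk. A quick back-of-envelope check:

```python
# Back-of-envelope check of the jet engine numbers.
# Assumptions (not from the talk): continuous operation, decimal units.
TB_PER_HOUR = 10            # hourly rate quoted on the slide
HOURS_PER_YEAR = 24 * 365   # assumes the engine produces data all year
TB_PER_PB = 1000            # decimal petabyte


def petabytes_per_year(tb_per_hour: float) -> float:
    """Convert an hourly data rate in TB into a yearly volume in PB."""
    return tb_per_hour * HOURS_PER_YEAR / TB_PER_PB


pb_per_year = petabytes_per_year(TB_PER_HOUR)
print(f"{pb_per_year:.1f} PB per year")
```

Under these assumptions the result is about 87.6 PB, which the slide rounds to 90 Petabytes.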

Page 8:

The Emerging Pattern of Big Data Systems: Manufacturing

Exa-scale Data Store

Parallel Data Processing

Real-Time Processing

Machine Learning

Data Science

Cloud Infrastructure

Real-Time Sensor

Analytics Support Product

Engineering

Page 9:

Cloud Platform

Page 10:

Cloud Platform: Supporting Mixed Big Data Workloads


Cloud Infrastructure

Machine Learning

Hadoop

Real-Time Analytics

Management

Network/Security

Storage/Availability

Compute

Page 11:

Cloud Platform: Supporting Multiple Tenants


Cloud Infrastructure

Management

Network/Security

Storage/Availability

Compute

Web User Analytics

Financial Analysis

Historical Customer Behavior

Page 12:

What if you can…

Experimentation

Production recommendation engine

Production Ad Targeting

Test/Dev

Production

Test

Production

Test

Experimentation

Recommendation engine Ad targeting

Experimentation

One physical platform to support multiple virtual big data clusters

Page 13:

Values of a Cloud Platform for Big Data

1. Agility / Rapid deployment

2. Lower Capex

3. Isolation for resource control and security

4. Operational efficiency

Management

Network/Security

Storage/Availability

Compute

Page 14:

Hadoop as a Service

Elasticity & Multi-tenancy
• Shrink and expand cluster on demand
• Independent scaling of compute and data
• Strong multi-tenancy

High Availability
• High availability for entire Hadoop stack
• One click to set up
• Battle-tested

Operational Simplicity
• Rapid deployment
• One-stop command center
• Easy to configure/reconfigure

Page 15:

Self Service Access to Big Data Environments

Templates for Different Cloud Users

Developer
• 3 Hadoop nodes
• Cloudera, Pivotal, MapR
• Small VM
• Local storage
• No HA
• …

Data Scientist
• 5 Hadoop nodes
• Cloudera, Pivotal
• Hive, Pig
• Medium VM
• HA
• …

High priority
• 50 Hadoop nodes
• Cloudera
• Hive, Pig
• Large VM
• HA
• …

…
• …
• …
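One way to picture these templates is as a small catalog keyed by user type. The sketch below is only an illustration: the class, field names, and lookup function are hypothetical and not from any VMware or Serengeti API, and only the values stated on the slide are filled in (items the slide elides with "…" are left out).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical record for one self-service template."""
    hadoop_nodes: int
    distributions: Tuple[str, ...]
    tools: Tuple[str, ...]
    vm_size: str
    high_availability: bool
    storage: Optional[str] = None  # None where the slide does not say


# Values copied from the slide; keys are illustrative names.
TEMPLATES = {
    "developer": ClusterTemplate(
        3, ("Cloudera", "Pivotal", "MapR"), (), "small", False, "local"),
    "data_scientist": ClusterTemplate(
        5, ("Cloudera", "Pivotal"), ("Hive", "Pig"), "medium", True),
    "high_priority": ClusterTemplate(
        50, ("Cloudera",), ("Hive", "Pig"), "large", True),
}


def template_for(user_type: str) -> ClusterTemplate:
    """Look up the template for a user type, failing loudly if unknown."""
    try:
        return TEMPLATES[user_type]
    except KeyError:
        raise ValueError(f"no template defined for {user_type!r}") from None


print(template_for("data_scientist"))
```

Keeping the templates as immutable data makes it easy to add another user type without touching the lookup logic.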

Page 16:

Big Data needs a Mix of Workloads

Compute layer
• Hadoop batch analysis
• HBase real-time queries
• NoSQL: Cassandra, Mongo, etc.
• Big SQL: Impala, Pivotal HAWQ
• Other: Spark, Shark, Solr, Platfora, etc.

File System/Data Store

Platform Virtualization Technology

[Diagram: a row of hosts underneath the layers above]

Page 17:

Strong Isolation between Workloads is Key

Hungry Workload 1

Reckless Workload 2

Nosy Workload 3

Virtualization Platform

Page 18:

Community activity in Isolation and Resource Management

YARN
• Goal: Support workloads other than M-R on Hadoop
• Initial need is for MPI/M-R from Yahoo
• Non-POSIX file system self-selects workload types

Mesos
• Distributed Resource Broker
• Mixed workloads with some RM
• Active project, in use at Twitter
• Leverages OS virtualization, e.g. cgroups

Virtualization
• Virtual machine as the primary isolation, resource management and versioned deployment container
• Basis for Project Serengeti

Page 19:

Use case: Elastic Hadoop with Tiered SLA

• Production workloads have high priority
• Experimentation workloads have lower priority

[Diagram labels: Experimentation, Production recommendation engine, Dynamic resource pool, Data layer, Compute layer with 16 Compute VMs, Experimentation MapReduce, Production MapReduce, vSphere]
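The tiered SLA on this slide can be pictured as a strict-priority split of a shared pool of compute VMs. The following is a minimal sketch under that assumption; the function name, the tuple layout, and the request sizes are hypothetical, and it is not how vSphere resource pools are actually configured.

```python
from typing import Dict, List, Tuple


def allocate_compute_vms(
    total_vms: int,
    demands: List[Tuple[str, int, int]],
) -> Dict[str, int]:
    """Grant compute VMs in priority order until the pool is empty.

    Each demand is (name, priority, requested_vms); a lower priority
    number means a more important workload.
    """
    remaining = total_vms
    allocation: Dict[str, int] = {}
    for name, _priority, requested in sorted(demands, key=lambda d: d[1]):
        granted = min(requested, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


# Hypothetical demand: production asks for 10 VMs, experimentation for 12,
# and the pool holds 16 compute VMs.
allocation = allocate_compute_vms(
    16,
    [("experimentation", 2, 12), ("production", 1, 10)],
)
print(allocation)
```

With these illustrative numbers, production receives all 10 VMs it asks for and experimentation gets the remaining 6. If production demand drops, rerunning the allocation hands the freed VMs back to experimentation, which is the elastic behavior the use case describes.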

Page 20:

Cloud Enabled Auto-elastic Hadoop

[Diagram: three ESX hosts. Labels: Hadoop Compute VMs (TT), Hadoop HDFS VMs (NN, DATA VM), Non-Hadoop VMs, JT, VHM, Local Disks, SAN/NAS]

JT: JobTracker, TT: TaskTracker, NN: NameNode, VHM: Virtual Hadoop Manager

Page 21:

Hadoop Performance with Virtualization

[http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013]

(lower is better)

32 hosts/3.6GHz 8 cores/15K RPM 146GB SAS disks/10GbE/72-96GB RAM

Page 22:

Network Platform

Page 23:

A Typical Network Architecture

[Diagram labels: Hosts, Top of Rack Switches, L2 Switches, Aggregating Switches]

Page 24:

Traditional Networks: Core Switch is the Choke Point

[Diagram: network topology (Core, Aggregation, Rack, Hosts) next to its modeled bandwidth, which is non-uniform. Bandwidth labels: 100s of Gbits, 10s of Gbits]
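A common way to quantify this kind of choke point is the oversubscription ratio: the bandwidth hosts can offer divided by the uplink bandwidth available to carry it. The sketch below uses hypothetical port counts and speeds chosen only to land in the "100s of Gbits" versus "10s of Gbits" range; none of the numbers come from the talk.

```python
def oversubscription_ratio(
    hosts: int,
    host_gbps: float,
    uplinks: int,
    uplink_gbps: float,
) -> float:
    """Ratio of host-facing bandwidth to uplink bandwidth for one switch."""
    downstream = hosts * host_gbps      # what the hosts could send
    upstream = uplinks * uplink_gbps    # what the uplinks can carry
    if upstream <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downstream / upstream


# Hypothetical rack: 40 hosts at 10 Gbit/s behind 4 uplinks at 10 Gbit/s.
ratio = oversubscription_ratio(hosts=40, host_gbps=10, uplinks=4, uplink_gbps=10)
print(f"{ratio:.0f}:1 oversubscribed")
```

A ratio well above 1 means traffic that leaves the rack competes for far less bandwidth than traffic that stays inside it, which is the non-uniformity the slide points at.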

Page 25:

Modern Networks: Great for Big Data

[Diagram: network topology (Spine, Leaf, Hosts) next to its modeled bandwidth, which is uniform]

Page 26:

Flat Networks Allow for New Infrastructure Models

[Diagram: racks of Hosts and Storage nodes behind Top of Rack Switches. Labels: Converged, Storage, Compute, Separated Storage]

Page 27:

Storage Platform

Page 28:

Use Local Disk where it’s Needed

SAN Storage
• $2 - $10/Gigabyte
• $1M gets: 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec

NAS Filers
• $1 - $5/Gigabyte
• $1M gets: 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec

Local Storage
• $0.05/Gigabyte
• $1M gets: 10 Petabytes, 400,000 IOPS, 250 Gbytes/sec
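The "$1M gets" capacities can be compared with a simple division of budget by price per gigabyte. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and counts raw capacity only; the function and dictionary names are illustrative.

```python
GB_PER_PB = 1_000_000   # decimal units, an assumption
BUDGET_USD = 1_000_000  # the $1M budget from the slide


def raw_petabytes(budget_usd: float, usd_per_gb: float) -> float:
    """Raw capacity in PB that a budget buys at a given price per GB."""
    return budget_usd / usd_per_gb / GB_PER_PB


# (low, high) price per GB as quoted on the slide.
PRICES_USD_PER_GB = {
    "SAN Storage": (2.0, 10.0),
    "NAS Filers": (1.0, 5.0),
    "Local Storage": (0.05, 0.05),
}

for name, (low, high) in PRICES_USD_PER_GB.items():
    most = raw_petabytes(BUDGET_USD, low)    # cheapest price, most capacity
    least = raw_petabytes(BUDGET_USD, high)  # dearest price, least capacity
    print(f"{name}: {least:.1f} to {most:.1f} PB for $1M")
```

The SAN and NAS capacities on the slide match the low end of each price range (0.5 PB and 1 PB). For local storage the plain division gives about 20 PB, while the slide quotes 10 Petabytes; the slide does not say what accounts for the difference.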

Page 29:

Storage Economics: Traditional vs. Scale-out

[Chart: cost per GB ($0 to $5.50) versus petabytes deployed (0.5 to 128). Series and labels: Traditional SAN/NAS; Distributed Object Storage (HDFS, MAPR, CEPH, Gluster, Scality); Scale-out NAS (Isilon)]

Page 30:

Big Data Storage Architectures

External SAN With HDFS

Local Disks With HDFS or Other

External Scale-out NAS

HDFS, CEPH, MAPR, Gluster, Scality,

Page 31:

Features from New Storage Solutions

• Snapshots
• Clones
• Erasure Coding
• NFS Access
• Universal File Store
• Geo Replication
• POSIX Support
• SSD Capability
• QoS Controls

Page 32:

Summary

Page 33:

Customers Winning from Consolidated Big Data Platforms

“Dedicated hardware makes no sense”

“Software-defined Datacenter enables rapid deployment multiple tenants and labs”

“Our mixed workloads include Hadoop, Database, ETL and App-servers”

“Any performance penalties are minor”

Management

Network/Security

Storage/Availability

Compute

Page 34:

Cloud Infrastructure is Ready for Big Data – Are you?

Cloud Infrastructure

Page 35:

Is Your Cloud Ready for Big Data?

Richard McDougall

CTO, Storage and Application Services

@richardmcdougll