Hadoop on Virtual Machines

Hadoop in Virtual Machines

Richard McDougall, VMware
Sanjay Radia, Hortonworks

Hadoop Summit, 2012

Part 1

Say What?

• VMs will just add overhead, due to I/O virtualization
• VMs run on SAN, we’re all about local disks
• Hadoop does its own cluster management
• It’ll do resource management in 2.0
• And even HA is coming to Hadoop

• And… what is the point, anyway?

But you’ve been asking…

• Can I virtualize my Hadoop, so that it is easier and quicker to get a cluster up and running?

• Is it possible to run Hadoop on those spare machine cycles I have on hundreds/thousands of nodes?

• Can I make my system more available by using some of the standard HA features?

And the savvy are asking…

• Can I avoid having to install special hardware for the master services, like name-node, job-tracker?

• Can I dynamically change the size of the cluster to use more resources?

• Can I use VM isolation to increase security or guard against resource-intensive neighbors?

• Is it feasible to provision virtual-clusters, giving out one each to a business unit?

Ok, so first what about the concerns?

SAN Storage

$2 - $10/Gigabyte

$1M gets: 0.5 Petabytes

1,000,000 IOPS, 1 Gbyte/sec

NAS Filers

$1 - $5/Gigabyte

$1M gets: 1 Petabyte

400,000 IOPS, 2 Gbytes/sec

• Use your SAN? … if you want to.

Local Storage

$0.05/Gigabyte

$1M gets: 20 Petabytes

10,000,000 IOPS, 800 Gbytes/sec
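
The capacity figures for the three storage types follow from dividing the budget by the price per gigabyte. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 1,000,000 GB) and the low end of each quoted price range; the function and variable names are illustrative:

```python
GB_PER_PB = 1_000_000  # decimal units (assumption)

def petabytes_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Raw capacity in petabytes that a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB

# Low end of each price range quoted on the slides, in USD per gigabyte.
prices = {"SAN": 2.00, "NAS": 1.00, "Local": 0.05}
capacity = {name: petabytes_for_budget(1_000_000, price) for name, price in prices.items()}

for name, pb in capacity.items():
    print(f"{name}: {pb:g} PB per $1M")
```

At the high end of the SAN and NAS ranges the same budget buys proportionally less, which widens the gap to local disks further.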

Hadoop Using Local Disks

[Diagram: a virtualization host running a Hadoop virtual machine (Datanode, Task Tracker) next to another workload. Labels: three VMDKs on three Ext4 file systems; shared storage holding the OS image VMDK.]

Hadoop Perf in a VM (Ratio is elapsed time to physical, Lower Is Better)

[Bar chart: ratio to native (y-axis, 0 to 1.2) for 1 VM and 2 VMs across Pi, TestDFSIO-write, TestDFSIO-read, TeraGen 1 TB, TeraSort 1 TB, TeraValidate 1 TB, TeraGen 3.5 TB, TeraSort 3.5 TB, and TeraValidate 3.5 TB.]

Evolution of Hadoop on VMs

[Diagram labels: Current Hadoop: Combined Storage/Compute; Storage; Compute; tenants T1 and T2; VMs.]

Hadoop in VM
- VM lifecycle determined by Datanode
- NOT elastic
- Limited to Hadoop multi-tenancy

Separate Storage
- Separate compute from data
- Elastic compute
- Enable shared workloads
- Raise utilization

Separate Compute Clusters
- Separate virtual clusters per tenant
- Stronger VM-grade security and resource isolation
- Enable deployment of multiple Hadoop runtime versions

1. Hadoop Task Tracker and Data Node in a VM

[Diagram: a virtualization host with one virtual Hadoop node (Datanode, Task Tracker with slots, VMDK) next to another workload.]

Add/remove slots?

Grow/shrink by tens of GB?

Grow/shrink of a VM is one approach

2. Add/remove Virtual Nodes

[Diagram: a virtualization host with two virtual Hadoop nodes, each with a Datanode, a Task Tracker with two slots, and a VMDK, next to another workload.]

Just add/remove more virtual nodes?

But State makes it hard to power-off a node

[Diagram: a virtualization host with one virtual Hadoop node (Datanode, Task Tracker with two slots, VMDK) next to another workload.]

Powering off the Hadoop VM would in effect fail the datanode

Adding a node needs data…

[Diagram: a virtualization host with a virtual Hadoop node (Datanode, Task Tracker with two slots, VMDK) and another workload, plus additional virtual Hadoop nodes with their own Datanodes.]

Adding a node would require TBs of data replication
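
To get a feel for why terabytes of replication make node additions slow, the copy time can be estimated from the data volume and the available network bandwidth. The sketch below is back-of-the-envelope arithmetic only; the 2 TB volume and the 1 Gbit/s link are illustrative assumptions, not figures from the talk.

```python
def replication_hours(data_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy data_tb terabytes over a link of the given speed.

    efficiency is the assumed fraction of the link usable for replication.
    """
    bits = data_tb * 1e12 * 8                              # decimal terabytes to bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600

# Hypothetical example: 2 TB onto a new node over a dedicated 1 Gbit/s link.
hours = replication_hours(data_tb=2, link_gbit_per_s=1)
print(f"{hours:.1f} hours")
```

Even under these generous assumptions the copy takes hours, which is far too slow for a node that is meant to come and go with load.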

2. Separated Compute and Data

[Diagram: a virtualization host with several virtual Hadoop nodes that each run only a Task Tracker with two slots, next to another workload.]

Truly Elastic Hadoop: Scalable through virtual nodes

Dataflow with separated Compute/Data

[Diagram: on one virtualization host, a virtual Hadoop node running the Datanode (with VMDK) and a virtual Hadoop node running the NodeManager (with two slots). Labels: two virtual NICs, a virtual switch, and the NIC drivers.]

Performance Analysis of Split

1 Datanode VM, 1 compute node VM per host

[Diagram: Datanode and NodeManager VMs laid out across the hosts for each configuration.]

1 combined compute/Datanode VM per host

Workload: Teragen, Terasort, Teravalidate
HW configuration: 8 cores, 96GB RAM, 16 disks per host x 2 nodes

Performance Analysis of Split (Elapsed time: ratio to combined)

[Bar chart: elapsed-time ratio (y-axis, 0 to 1.2) for the Combined and Split configurations across Teragen, Terasort, and Teravalidate.]

Tying it together: Elastic Hadoop

[Diagram: six hosts. Data layer: a distributed file system (HDFS, KFS, GPFS, MAPR, Isilon,…) with three namespaces labeled Public, Public, and Secret. Runtime layer: a virtual Hadoop queue and three virtual Hadoop clusters, with tenants labeled Coke and Pepsi.]

Demo: Shrink/Expand Cluster


[Diagram: physical hosts, each running Datanode, NodeManager, and Web Server VMs.]

Setup: 1 Datanode, 2 NodeManagers and 2 web servers on each physical host.

[Diagram: the same hosts with a different mix of NodeManager and Web Server VMs.]

When web load is high in the daytime, we can suspend some NodeManagers and power on more web servers.
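
The day/night trade between web servers and NodeManagers can be sketched as a small per-host policy. This is a hypothetical illustration, not the demo's actual controller: the threshold, the VM counts, and every name below are assumptions, and no real vSphere or Hadoop API is called.

```python
from dataclasses import dataclass

@dataclass
class HostPlan:
    web_servers: int
    node_managers: int

def plan_host(web_load: float, elastic_vms: int = 4, web_low: int = 2,
              web_high: int = 3, high_load: float = 0.7) -> HostPlan:
    """Split a host's elastic VMs between web servers and NodeManagers.

    web_load is the web tier's utilization in [0, 1]. The Datanode VM is
    not part of the plan: it stays powered on so the data remains available
    while compute capacity shrinks and grows.
    """
    if not 0.0 <= web_load <= 1.0:
        raise ValueError("web_load must be between 0 and 1")
    web = web_high if web_load >= high_load else web_low
    return HostPlan(web_servers=web, node_managers=elastic_vms - web)

print(plan_host(0.9))  # daytime: more web servers, fewer NodeManagers
print(plan_host(0.2))  # night: back to the 2 + 2 setup
```

Because the Datanode runs in its own VM, suspending a NodeManager here only removes task slots; no blocks need to be re-replicated.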

Demo

Part 2

Expand Hadoop Ecosystem

• Hortonworks goal
  – Expand Hadoop ecosystem
  – Provide first class support of various platforms

• Hadoop should run well on VMs
• VMs offer several advantages as presented earlier

• Take advantage of vSphere for HA


VMware-Hortonworks Joint Engineering

• First class support for VMs
  – Topology plugins (HADOOP-8468)
    • 2 VMs can be on same host
      – Pick closer data
      – Schedule tasks closer
      – Don’t put two replicas on same host
  – MR-tmp on HDFS using block pools
    • Elastic Compute-VMs will not need local disk
  – Fast communications within VMs
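
The rule "don't put two replicas on same host" can be illustrated with a small placement sketch. This is not Hadoop's block placement policy or the HADOOP-8468 code, which also weigh racks and load; it only shows the host-level constraint, and the VM and host names are made up for the example.

```python
from typing import Dict, List

def choose_replica_targets(vm_to_host: Dict[str, str], replicas: int = 3) -> List[str]:
    """Pick datanode VMs for a block so that no two replicas share a physical host."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    chosen: List[str] = []
    used_hosts = set()
    for vm in sorted(vm_to_host):            # deterministic order for the example
        host = vm_to_host[vm]
        if host in used_hosts:
            continue                         # a replica already lives on this host
        chosen.append(vm)
        used_hosts.add(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct physical hosts for the requested replicas")

# Hypothetical topology: two datanode VMs share host-a.
vm_to_host = {"dn-vm1": "host-a", "dn-vm2": "host-a", "dn-vm3": "host-b", "dn-vm4": "host-c"}
targets = choose_replica_targets(vm_to_host)
print(targets)
```

Without the VM-to-host mapping, dn-vm1 and dn-vm2 look like independent nodes, and a single host failure could take out two replicas at once.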


Hadoop Total System Availability Architecture

[Diagram: an HA cluster of servers hosts the master daemons (NN, JT) with N+K failover. Labels: apps running outside pause/retry; pause/retry, JT into safemode; jobs running on the slave nodes of the Hadoop cluster.]

© Hortonworks Inc. 2011

HA is coming in 1.0 Using Total System Availability Architecture


HA in Hadoop 1 with HDP1

• Total System Availability Architecture
  – Namenode
    • Clients pause automatically
    • JobTracker pauses automatically
  – Other Hadoop master services (JT, …) coming

• Use industry proven HA framework
  – VMware vSphere-HA
    • Failover, fencing, …
    • Corner cases are tricky – if not addressed, corruption
  – Additional benefits:
    • N-N & N+K failover
    • Migration for maintenance
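
The "clients pause automatically" behavior amounts to retrying a call until the master service is back. The sketch below shows that general pattern only; it is not Hadoop's client code, `ConnectionError` stands in for whatever error a client sees during failover, and the timing defaults are illustrative assumptions.

```python
import time

def call_with_retry(operation, max_wait_s=900.0, pause_s=5.0, sleep=time.sleep):
    """Retry operation until it succeeds or the pauses would exceed max_wait_s."""
    waited = 0.0
    while True:
        try:
            return operation()
        except ConnectionError:
            if waited + pause_s > max_wait_s:
                raise                        # give up: the service did not come back in time
            sleep(pause_s)
            waited += pause_s

# Usage with a stand-in service that fails twice before recovering.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("namenode unavailable")
    return "ok"

pauses = []                                   # record pauses instead of really sleeping
result = call_with_retry(flaky, sleep=pauses.append)
print(result, pauses)
```

The maximum wait should comfortably exceed the expected failover time; otherwise jobs fail even though the failover itself succeeds.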

Hadoop NN/JT HA with vSphere


NameNode HA – Failover Times

• NameNode Failover times with vSphere and LinuxHA
  – Failure detection + Failover – 0.5 to 2 minutes
  – OS bootup needed for vSphere – 10-20 seconds
  – Namenode Startup (exit safemode)
    • Small/Medium clusters – 1 to 2 minutes
    • Large cluster – 5 to 15 minutes

• NameNode startup time measurements
  – 60 Nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
  – 180 Nodes, 200K files, 18 million blocks, 900 TB raw storage – 120 sec

Cold Failover is good enough for small/medium clusters
Failure Detection and Automatic Failover Dominates
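
Adding up the phases gives the end-to-end failover window. The sketch below only sums the ranges quoted on the slide and compares the two startup measurements; treating the phases as strictly sequential is an assumption.

```python
def total_failover_seconds(*phases):
    """Add (min, max) ranges, in seconds, for sequential failover phases."""
    return sum(p[0] for p in phases), sum(p[1] for p in phases)

DETECT = (30, 120)    # failure detection + failover: 0.5 to 2 minutes
OS_BOOT = (10, 20)    # OS bootup needed for vSphere

small_medium = total_failover_seconds(DETECT, OS_BOOT, (60, 120))   # startup 1 to 2 minutes
large = total_failover_seconds(DETECT, OS_BOOT, (300, 900))         # startup 5 to 15 minutes
print(small_medium, large)

# Startup measurements from the slide: (million blocks, seconds to start).
measurements = [(6, 40), (18, 120)]
seconds_per_million_blocks = [secs / blocks for blocks, secs in measurements]
print(seconds_per_million_blocks)
```

Both measurements work out to the same startup cost per million blocks, which suggests startup time grows roughly linearly with block count over this range.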


Demo

Summary

• Advantages of Hadoop on VMs
  – Cluster Management
  – Cluster consolidation
  – Greater Elasticity in mixed environment
  – Alternate multi-tenancy to capacity scheduler’s offerings

• HA for Hadoop Master Daemons
  – vSphere based HA for NN, JT, … in Hadoop 1
  – Total System Availability Architecture
