Dockerized Hadoop Platform and Recent Updates in Apache Bigtop Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver
Dockerized Hadoop Platform and Recent Updates in …schd.ws/hosted_files/apachebigdata2016/4e/Dockerized Hadoop... · Dockerized Hadoop Platform and Recent Updates in ... Packaging

May 15, 2018

buidat
Dockerized Hadoop Platform and Recent Updates in Apache Bigtop

Yu-hsin Yeh (Evans Ye)

Apache Big Data NA 2016, Vancouver

Outline

• Bigtop Provisioner
• Integrating Docker Compose
• Integrating Docker Machine & Swarm
• Image Pre-build
• Project updates and PPC porting

Who am I

• Apache Bigtop PMC member
• Software Engineer @ Trend Micro
• Develop big data apps & infra

Bigtop Provisioner

Bigtop Provisioner

• A tool to demonstrate the full life cycle of Bigtop

Bigtop life cycle: Packaging, Virtualization, Deployment, Testing

Bigtop Provisioner: Create resources, Run Bigtop Puppet, Run Bigtop Tests

Goal

• Fast iterative development
  • Test your code in the cluster, on your laptop, without human intervention
• Flexibility
  • Choose any combination of components you want
• Responsive CI
  • Integration tests that get you the result in minutes
• A Big Data Stack playground
  • Spark + Tachyon, Spark + Ignite, Apex, etc.

Bigtop Provisioner = Vagrant + Automation Code + Bigtop Puppet

One-click Hadoop Provisioning

Vagrant

• We use Vagrant as an abstraction layer to support different kinds of resource providers

Providers

One click Hadoop provisioning

./docker-hadoop.sh -c 3

One click Hadoop provisioning

bigtop/deploy image on Docker Hub

./docker-hadoop.sh -c 3

One click Hadoop provisioning

bigtop/deploy image on Docker Hub

./docker-hadoop.sh -c 3

puppet apply (run in each of the 3 containers)

Bigtop/deploy Images

Dockerhub official images

install Puppet

bigtop/deploy

install Vagrant ssh key

Bigtop Provisioner

• Supported providers in Bigtop 1.1.0 release
  • VirtualBox VM
  • Docker container
  • OpenStack

Use Cases

• For Hadoop app developers, cluster admins, users
  • Run a Hadoop cluster to test your code on
  • Try & test configurations before applying to Production
  • Play around with Bigtop Big Data Stack
• For contributors
  • Easy to test your packaging, deployment, testing code
• For vendors
  • CI out of the box → patching upstream code made easier

Integrating Docker Compose

What’s the problem with Vagrant’s Docker Provider?

• Need to add the Vagrant public key to Docker images
• Too many issues with the auto-created boot2docker hosting VM
• A Docker provider bug has stayed open for almost 2 years
  • 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all Docker options are supported in the Vagrantfile
• Does not support Docker Swarm
• Slow

Docker Compose

Docker Compose

kafka:
  build: .
  ports:
    - "9092:9092"
spark:
  image: spark
  ports:
    - "8080:8080"
...

Web Portal

API Server

Commands: up / stop / kill / rm

Integration Details

• Create docker containers:
  • docker-compose scale bigtop=3
• Volumes:
  • Bigtop Puppet configurations
  • Bigtop Puppet code
  • /etc/hosts
• privileged: true
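
The settings on this slide could be captured in a Compose file roughly as follows. This is only a sketch: the service name `bigtop` comes from the scale command above, while the image tag, the host paths and the container-side mount points are placeholders chosen for illustration and are not taken from the Bigtop repository.

```shell
#!/usr/bin/env bash
# Sketch: print a Compose (v1 format) service definition with the settings
# listed on the slide. All paths and the image tag are assumptions.

render_compose() {
  local image="$1" config_dir="$2" code_dir="$3"
  cat <<EOF
bigtop:
  image: ${image}
  privileged: true
  volumes:
    - ${config_dir}:/etc/puppet/hieradata
    - ${code_dir}:/bigtop-home
    - ./hosts:/etc/hosts
EOF
}

# Hypothetical arguments: image tag, Puppet config dir, Puppet code dir.
render_compose "bigtop/deploy:example-tag" "./config" "../.."
```

Saving the output as docker-compose.yml and running `docker-compose scale bigtop=3` would then start three containers from the same definition.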

Integrating with Docker Compose

./docker-hadoop.sh --create 3

Supported OS image on Docker hub

$ docker-compose scale bigtop_local=3

Supported OS image on Docker Hub

$ docker inspect --format "{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}.{{.Config.Domainname}}" $node

/etc/hosts (written to each of the three containers)
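
A minimal sketch of how the inspect output could be collected into one hosts file. The format string is the one from the slide; the function name, the `DOCKER` override and the container names in the usage comment are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build one /etc/hosts line per container from `docker inspect` output.

# Format string from the slide: "<ip> <hostname>.<domainname>".
INSPECT_FORMAT='{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}.{{.Config.Domainname}}'

# Print one hosts entry for each container name or ID passed as an argument.
# DOCKER can be overridden with a stub to try the loop without a Docker daemon.
build_hosts() {
  local docker_cmd="${DOCKER:-docker}"
  local node
  for node in "$@"; do
    "$docker_cmd" inspect --format "$INSPECT_FORMAT" "$node"
  done
}

# Example (needs running containers; the names are hypothetical):
#   build_hosts bigtop_1 bigtop_2 bigtop_3 > ./hosts
```

The resulting file can then be shared with every container so that all nodes resolve each other by fully qualified name.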

Supported OS image on Docker Hub

$ docker exec $node bash -c "./puppetize.sh"

Supported OS image on Docker hub

$ puppet apply bigtop-deploy/puppet/manifests/site.pp

Supported OS image on Docker hub

Finished

Advantages

• No need to create a customised image beforehand
• Better compatibility with Docker’s native solutions
• Clear, simple YAML file for orchestration settings
• Has the scale option to easily scale up/down
• Native support for Docker Swarm
• Supports new features such as overlay network and named volume
• Fast → better user experience

Integrating Docker Machine & Swarm

Docker Machine

Lets you create Docker hosts

• On a local machine
  • virtualbox, vmwarefusion
• In the cloud
  • amazonec2, azure, digitalocean, exoscale, google, rackspace, softlayer
• Inside your own datacenter
  • generic, openstack, vmwarevcloudair, vmwarevsphere

Key Features

• The following are auto-configured:
  • Docker engine install/upgrade
  • TLS encryption & authentication
  • Certificate generation & key signing
  • Swarm

Docker Swarm

Docker Swarm

Resource Management

• Filters
  • Memory, CPU, Network
  • docker run -m 1g -c 1 -p 80:80
• Scheduling strategies
  • spread, binpack, random

High Availability

[Diagram: one Primary and two Replicas; arrow label: Forward]

Integration Issues

• Swarm with an overlay network has basic service discovery:
  • IP-to-container-name mappings are auto-generated and maintained in each container’s /etc/hosts
• However, the generated names contain underscores, which makes them invalid hostnames:
  • On the system, the hostname is still a hash
  • The hostname can be set in the config file, but that does not work with scale → every scaled container gets the same hostname

Integration details

• Use Docker Machine to create a Swarm cluster
• Instead of sharing volumes, use docker cp to copy files:
  • Bigtop Puppet configurations
  • Bigtop Puppet code
  • /etc/hosts
• privileged: true
• Use overlay network
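
The copy step might look something like the sketch below. The source and destination paths and the node names are hypothetical, and the `DRY_RUN` switch is added here only so the loop can be inspected without a running cluster; the real script may organise this differently.

```shell
#!/usr/bin/env bash
# Sketch: push files into each container with `docker cp` instead of volumes.
# All paths and node names are placeholders.

# Copy one local path into a container. With DRY_RUN=1 the command is printed
# instead of executed.
copy_to_node() {
  local node="$1" src="$2" dest="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker cp $src $node:$dest"
  else
    docker cp "$src" "$node:$dest"
  fi
}

# Copy the three items named on the slide to every node given as an argument.
push_files() {
  local node
  for node in "$@"; do
    copy_to_node "$node" ./config /etc/puppet/hieradata   # Puppet configurations
    copy_to_node "$node" ./bigtop-puppet /bigtop-puppet   # Puppet code
    copy_to_node "$node" ./hosts /etc/hosts               # generated hosts file
  done
}

# Example: DRY_RUN=1 push_files node1 node2
```

Copying rather than mounting matters on Swarm because the containers may land on different hosts, where a path on the machine running the script is not available as a volume.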

Swarm on VirtualBox

• docker-machine create -d virtualbox kvstore
• eval $(docker-machine env kvstore)
• docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap

Swarm on VirtualBox

• docker-machine create -d virtualbox --swarm --swarm-master \
    --swarm-discovery="consul://$(docker-machine ip kvstore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip kvstore):8500" \
    --engine-opt="cluster-advertise=eth1:2376" \
    swarm-master

Swarm on VirtualBox

• docker-machine create -d virtualbox --swarm \
    --swarm-discovery="consul://$(docker-machine ip kvstore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip kvstore):8500" \
    --engine-opt="cluster-advertise=eth1:2376" \
    swarm-slave

Integrating with Docker Machine & Swarm

./docker-hadoop.sh --swarm --create 3

$ ./docker-hadoop.sh --swarm

Key-Value Store

Swarm Master

Swarm Slave

$ docker-compose scale bigtop_swarm=3

Key-Value Store

Swarm Master

Swarm Slave

Supported OS image on Docker hub

$ docker inspect --format "{{.NetworkSettings.Networks.$OVERLAY.IPAddress}} {{.Config.Hostname}}.{{.Config.Domainname}}" $node

Key-Value Store

Swarm Master

Swarm Slave

/etc/hosts (written to each of the three containers)

$ docker exec $node bash -c "./puppetize.sh"

Key-Value Store

Swarm Master

Swarm Slave

$ puppet apply bigtop-deploy/puppet/manifests/site.pp

Key-Value Store

Swarm Master

Swarm Slave

Advantages

• Run Hadoop cluster on Docker anywhere
• Provision a fully distributed, multi-host Docker based Hadoop cluster

Not recommended for Production

Image Pre-build

You’re damned right. 5 mins provisioning time is too long!

An Apache guy Bigtoper

Idea

• Users don’t want to build/test packages, they just want a cluster
• Docker images are immutable
• RPM/DEB packages are also immutable
• Let’s build an image that preloads all the packages

Image Pre-build

bigtop/deploy:prebuild

Seriously?

Second Thought

• Each company has its own Big Data Stack
  • TM Hadoop = Hadoop + HBase + Pig + Oozie

Second Thought

• Within the same company, there might be multiple stacks serving different purposes
  • Product Specific Platform = Spark + Docker + Akka + Cassandra + Kafka

Image Pre-build

Dockerhub official images

bigtop/deploy:prebuild

components: [hadoop, yarn]

Yarn packages

Hadoop packages

Provisioning an HDFS cluster took

2m43s → 0m50s

226% Faster
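
The 226% figure follows from the two timings on the previous slide (2m43s before, 0m50s after). A small worked calculation, with helper names chosen here for illustration:

```shell
#!/usr/bin/env bash
# Worked arithmetic for the "226% faster" figure.

# Convert a duration such as "2m43s" to seconds.
# Assumes the "<minutes>m<seconds>s" form without zero-padded numbers.
to_seconds() {
  local t="$1"
  local minutes seconds
  minutes="${t%%m*}"        # text before the "m"
  seconds="${t#*m}"         # text after the "m", e.g. "43s"
  seconds="${seconds%s}"    # drop the trailing "s"
  echo $(( minutes * 60 + seconds ))
}

# Percentage by which the new time is faster: (old - new) / new * 100,
# truncated to an integer.
percent_faster() {
  local old="$1" new="$2"
  echo $(( (old - new) * 100 / new ))
}

before=$(to_seconds 2m43s)   # 163 seconds
after=$(to_seconds 0m50s)    # 50 seconds
echo "${before}s -> ${after}s: $(percent_faster "$before" "$after")% faster"
```

Put differently, 163 s / 50 s is about 3.26 times the original speed, which is the same thing as 226% faster.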

Updates

Better CI

New Components

• New in Bigtop 1.1 release (Feb 2016)
  • Apache Hama 0.7.0
  • Apache Tajo 0.11.1
  • Apache Zeppelin 0.5.6
• New in Bigtop master branch
  • Apache Apex 3.3.0
  • QFS 1.1.4
  • Apache Flink: BIGTOP-1927, PR available

© 2016 OpenPOWER Foundation

Amir Sanjar OpenPower Foundation Member Senior Software Engineer, IBM Power Systems Software & Solutions

What is OpenPOWER Foundation?

OpenPOWER is an open development community, using the POWER Architecture to serve the evolving needs of customers.

OpenPOWER Open Innovation

OpenPOWER Foundation reach

• 200+ members
• 24 countries
• 6 continents
• 100s of innovations under way
• 60+ technologies revealed

Why OpenPower Foundation?

PERFORMANCE WALL

• Moore’s law no longer delivers the needed performance gains
• Growing workloads demand more performance

Solution?

Accelerated Technology

Accelerated Technology roadmap

IBM CPUs
• 2015: POWER8 (OpenPOWER CAPI Interface)
• 2016: POWER8 with NVLink
• 2017: POWER9 (Enhanced CAPI & NVLink)

Mellanox Interconnect
• 2015: Connect-IB, FDR Infiniband, PCIe Gen3
• 2016: ConnectX-4, EDR Infiniband, CAPI over PCIe Gen3
• 2017: ConnectX-5, Next-Gen Infiniband, Enhanced CAPI over PCIe Gen4

NVIDIA GPUs
• 2015: Kepler, PCIe Gen3
• 2016: Pascal, NVLink
• 2017: Volta, Enhanced NVLink

IBM Systems

OpenPOWER Technology: 2.5x Faster CPU-GPU Connection via NVLink

• GPUs bottlenecked by PCIe bandwidth from CPU-system memory (PCIe: 16 GB/s)
• NVLink enables fast unified memory access between CPU & GPU memories (NVLink: 40 GB/s)

[Diagram: POWER8 CPU with system memory connected to GPUs with graphics memory, once over PCIe (marked as the system bottleneck) and once over NVLink]

System Performance of Apache Bigtop Spark 1.5.1 on POWER

[Bar chart: relative system performance (y axis, 0 to 3) per Spark workload, with E5-2620 v3 as the baseline. Workloads: 100GB Mat. Fact., 100GB (in mem) LR, 1TB (in mem) LR, 1TB (50/50) LR, 1TB SVM, 10TB LR, 1TB 5 query, 2TB 5 query, 130GB Page Rank, 1TB Triangle Cnt, 1TB SVD++, AVERAGE. Workload groups: Machine Learning, SQL, Graph. Average: 1.7X]

Price Performance of Apache Bigtop Spark on POWER

[Bar chart: relative price performance (y axis, 0 to 3) per Spark workload, with E5-2620 v3 as the baseline. Workloads: 100GB Mat. Fact., 100GB (in mem) LR, 1TB (in mem) LR, 1TB (50/50) LR, 1TB SVM, 10TB LR, 1TB 5 query, 2TB 5 query, 130GB Page Rank, 1TB Triangle Cnt, 1TB SVD++, AVERAGE. Workload groups: Machine Learning, SQL, Graph. Average: 1.5X]

• Spend 33% less on infrastructure supporting the same amount of workload
• Spend the same on infrastructure but host 50% more workload

* Based on preliminary SoftLayer pricing targets, subject to change
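
Both percentages on this slide can be derived from the 1.5X average price-performance ratio. A short worked calculation, assuming the ratio means "1.5 times the workload per unit of spend" (the function names are illustrative):

```shell
#!/usr/bin/env bash
# Worked arithmetic for the two claims derived from a price-performance ratio r.

# Same workload at ratio r costs 1/r of the baseline: savings = (1 - 1/r) * 100.
savings_percent() {
  awk -v r="$1" 'BEGIN { printf "%.0f\n", (1 - 1 / r) * 100 }'
}

# Same spend at ratio r hosts r times the workload: extra = (r - 1) * 100.
extra_workload_percent() {
  awk -v r="$1" 'BEGIN { printf "%.0f\n", (r - 1) * 100 }'
}

echo "Savings for the same workload: $(savings_percent 1.5)%"
echo "Extra workload for the same spend: $(extra_workload_percent 1.5)%"
```

With r = 1.5 this gives 33% and 50%, matching the slide.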

Accelerated Spark Demo

Adverse Drug Reaction Prediction

Bigtop Spark 1.5.1 NVidia GPU

• Apache Bigtop distribution for Power
  • Port Bigtop stack to Power
  • Build Bigtop stack for Power

Port to Power8

• Porting to Power has become effortless.
  • Advent of OpenJDK for POWER
    • No porting required
    • 100% compatible with Oracle Java
  • Power8 with little endian
    • No porting, just recompile native (C/C++) libraries
• Ported 22 out of 24 Apache Bigtop stacks to POWER in two weeks.

Build Bigtop Distribution for Power

• Building Apache big data projects is not for the faint-hearted.
  • Many build dependencies
  • Various development tools
  • Many Linux distributions to support

Preparing for Build Hadoop

What does Apache Bigtop offer?

• Apache Bigtop has Dockerized the entire build environment.
  • The entire build environment is in Docker images
  • Available on Docker Hub
• Porting Bigtop Docker images to Power was achieved without great effort.
  • Made possible with only five patches
  • Three Dockerfiles and three scripts
    • /bigtop/docker/bigtop-puppet/ubuntu-15.04-ppc64le/Dockerfile
    • /bigtop/docker/bigtop-slaves/ubuntu-15.04-ppc64le/Dockerfile
    • /bigtop/docker/bigtop-deploy/ubuntu-15.04-ppc64le/Dockerfile
    • /bigtop/bigtop_toolchain/manifests/env.pp
    • /bigtop/bigtop_toolchain/manifests/protobuf.pp
    • /bigtop/bigtop_toolchain/bin/puppetize.sh

Example

...
FROM ppc64le/ubuntu:15.04
MAINTAINER Amir Sanjar

COPY puppetize.sh /tmp/puppetize.sh

RUN bash /tmp/puppetize.sh

One click build

$ git clone https://github.com/apache/bigtop.git
$ cd bigtop
$ docker run -v `pwd`:/ws bigtop/slaves:trunk-ubuntu-15.04-ppc64le \
    bash -l -c 'cd /ws ; ./gradlew hadoop-deb'

One click install

$ sudo apt-get install spark-master

Apache Bigtop Release 1.1.0 for OpenPOWER

bigtop-groovy 2.4.4
bigtop-jsvc 1.0.15
bigtop-tomcat 6.0.36
bigtop-utils 1.1.0
crunch 0.12.0
datafu 1.0.0
flume 1.6.0
giraph 1.1.0
hadoop 2.7.1
hama 0.7.0
hbase 0.98.12
kafka 0.8.1.1
kite 1.1.0
mahout 0.11.0
oozie 4.2.0
phoenix 4.6.0
pig 0.15.0
solr 4.9.0
spark 1.5.1
sqoop 1.4.5
sqoop2 1.99.4
tachyon 0.6.0

Apache Bigtop CI

• CentOS
• Fedora
• Ubuntu
• Debian
• OpenSuSE

Thank you.