Flintrock: A faster, better spark-ec2
Nicholas Chammas, Spark Summit East 2016

Jan 08, 2017

Motivation

Common developer problem:
- Give me a working cluster
- Don't bother me too much with the details
- Make it quick

spark-ec2

Single-purpose command-line tool
- Launch and manage Spark clusters on EC2

Common use cases:
- Prototyping
- Spark performance testing (spark-perf)

Problems:
- Slow launch times: ~9 minutes to launch 2-node cluster
- Poor UX

e.g. Having to type this out over and over again...

./spark-ec2 launch my-cluster \
    --identity-file ~/.ssh/identity.pem \
    --key-pair my-key \
    --instance-type m3.medium \
    --region us-east-1

- Internals difficult to refactor
  - Much time has already been spent trying to make spark-ec2 faster: SPARK-4325, SPARK-5189

- spark-ec2 was created as a convenience side-tool
  - Not originally intended to stand as its own project

Why a new tool?

Plenty of great options out there:
- Databricks
- Spark on EMR
- Apache Bigtop, Ubuntu Juju, Terraform, Ansible, etc.

It's fun (for me to build)

Perhaps you don't want a framework
- You want a single-purpose tool

Perhaps you don't want to be tied to something proprietary

Flintrock

Features

Obsessive focus on speed
- e.g. Launching a cluster with 100 slaves:
  - spark-ec2 takes more than 1 hour
  - Flintrock can do it in under 5 minutes
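The slides don't say where this speedup comes from. One widely used technique, shown here only as an illustration, is to run per-node setup concurrently rather than one node at a time. Wall-clock time then tracks the slowest node instead of the sum over all nodes. In this minimal sketch, `fake_setup_node` is a hypothetical stand-in that only sleeps; it is not Flintrock code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# HYPOTHETICAL stand-in for per-node setup work (SSH in, install Spark, ...).
# It only sleeps, to simulate time spent waiting on a remote host.
def fake_setup_node(host: str, seconds: float = 0.05) -> str:
    time.sleep(seconds)
    return host

def setup_sequential(hosts: List[str]) -> Tuple[List[str], float]:
    """One node after another: wall-clock time is roughly the SUM of per-node times."""
    start = time.perf_counter()
    done = [fake_setup_node(h) for h in hosts]
    return done, time.perf_counter() - start

def setup_parallel(hosts: List[str]) -> Tuple[List[str], float]:
    """All nodes at once: wall-clock time is roughly the SLOWEST single node."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        done = list(pool.map(fake_setup_node, hosts))  # map keeps input order
    return done, time.perf_counter() - start

if __name__ == "__main__":
    hosts = [f"slave-{i}" for i in range(20)]
    _, t_seq = setup_sequential(hosts)
    _, t_par = setup_parallel(hosts)
    print(f"sequential: {t_seq:.2f}s  parallel: {t_par:.2f}s")
```

Real launches are also bounded by things this toy ignores, such as instance boot time and EC2 API limits. It only conveys the "sum vs. slowest" intuition.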

Empathy for user
- Persist your configuration to a file. Then, all you need to launch a cluster is:

flintrock launch test-cluster
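Persisting configuration works because stored values act as defaults, and anything passed on the command line overrides them. Flintrock's own config file format and key names are not shown in these slides. This sketch uses JSON from the standard library and hypothetical keys that mirror the spark-ec2 flags shown earlier.

```python
import json
from typing import Any, Dict, Optional

# HYPOTHETICAL persisted settings; key names mirror the spark-ec2 flags shown
# earlier and are NOT taken from Flintrock's real configuration schema.
SAVED_CONFIG = """
{
  "identity-file": "~/.ssh/identity.pem",
  "key-pair": "my-key",
  "instance-type": "m3.medium",
  "region": "us-east-1"
}
"""

def effective_settings(saved: str, cli_args: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Merge saved defaults with command-line values; a CLI value wins when set."""
    settings = json.loads(saved)
    for key, value in cli_args.items():
        if value is not None:  # None means "flag not given on the command line"
            settings[key] = value
    return settings

if __name__ == "__main__":
    # Like a bare launch command: every setting comes from the saved file.
    print(effective_settings(SAVED_CONFIG, {"instance-type": None}))
    # A one-off override of a single setting; the rest still come from the file.
    print(effective_settings(SAVED_CONFIG, {"instance-type": "m3.large"}))
```

Keeping `None` as "not given" lets the merge tell an omitted flag apart from one that was explicitly set.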

Accessibility
- Install via pip (Python 3.5+ required):

pip install flintrock

- Standalone packages (Python not required!): https://github.com/nchammas/flintrock/releases
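The pip route needs Python 3.5 or newer, per the slide. Checking the interpreter you are about to install into can head off a confusing failure. A minimal sketch; the helper name `meets_requirement` is hypothetical.

```python
import sys
from typing import Optional, Sequence

REQUIRED = (3, 5)  # from the slide: "Python 3.5+ required"

def meets_requirement(version: Optional[Sequence[int]] = None) -> bool:
    """Compare (major, minor) of `version` (default: this interpreter) to REQUIRED."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= REQUIRED

if __name__ == "__main__":
    current = ".".join(str(n) for n in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} is new enough for `pip install flintrock`.")
    else:
        print(f"Python {current} is too old; a standalone package avoids the requirement.")
```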

Commands: launch, login, describe, stop, start, destroy, run-command, copy-file

Examples:
- run-command:

flintrock run-command cluster 'sudo yum install -y expect'

- copy-file:

flintrock copy-file cluster small-file.json /tmp/
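Conceptually, `run-command` and `copy-file` repeat one operation on every node in the cluster. This sketch shows that fan-out using the standard `ssh` and `scp` clients. The host addresses, user name, and key path are hypothetical, and Flintrock's actual transport and options may differ. The main block only prints the commands; `run_on_all` shows how they could be executed concurrently.

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List

# HYPOTHETICAL cluster details; a real tool would discover these from EC2.
HOSTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
USER = "ec2-user"
# Expand "~" ourselves because subprocess does not go through a shell.
IDENTITY_FILE = os.path.expanduser("~/.ssh/identity.pem")

def run_command_args(host: str, command: str) -> List[str]:
    """ssh argument list that runs `command` on one host."""
    return ["ssh", "-i", IDENTITY_FILE, f"{USER}@{host}", command]

def copy_file_args(host: str, local_path: str, remote_dir: str) -> List[str]:
    """scp argument list that copies `local_path` into `remote_dir` on one host."""
    return ["scp", "-i", IDENTITY_FILE, local_path, f"{USER}@{host}:{remote_dir}"]

def run_on_all(arg_lists: List[List[str]]) -> List[int]:
    """Run every argument list concurrently and return their exit codes, in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(arg_lists))) as pool:
        return list(pool.map(lambda args: subprocess.run(args).returncode, arg_lists))

if __name__ == "__main__":
    cmds = [run_command_args(h, "sudo yum install -y expect") for h in HOSTS]
    cmds += [copy_file_args(h, "small-file.json", "/tmp/") for h in HOSTS]
    for args in cmds:
        print(" ".join(shlex.quote(a) for a in args))  # dry run: show, don't execute
    # run_on_all(cmds)  # uncomment to actually execute against real hosts
```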

Warning: Flintrock is not a drop-in replacement for spark-ec2
- Scope is more limited
- Commands and options are clearly different

Beta-phase of development -- currently at version 0.3
- Already used to launch 200+ node clusters

Support for OS X and Linux
- Windows support possible in the future

Architecture supports multiple providers
- Perhaps support for Google Compute Engine will be added later this year
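The slides don't describe Flintrock's internals. Multi-provider support in tools like this usually means commands are written against a provider-neutral interface, with one implementation per cloud. This is a minimal sketch of that pattern. Every class, method, and registry name is hypothetical rather than Flintrock's actual code, and the EC2 class is an in-memory stub that makes no API calls.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

class ClusterProvider(ABC):
    """Provider-neutral operations that commands like launch/destroy rely on."""

    @abstractmethod
    def launch(self, cluster_name: str, num_slaves: int) -> List[str]:
        """Start instances and return their host names."""

    @abstractmethod
    def destroy(self, cluster_name: str) -> None:
        """Terminate every instance belonging to the cluster."""

class FakeEC2Provider(ClusterProvider):
    """HYPOTHETICAL stub standing in for an EC2 implementation; state is in memory."""

    def __init__(self) -> None:
        self.clusters: Dict[str, List[str]] = {}

    def launch(self, cluster_name: str, num_slaves: int) -> List[str]:
        hosts = [f"{cluster_name}-master"] + [
            f"{cluster_name}-slave-{i}" for i in range(num_slaves)
        ]
        self.clusters[cluster_name] = hosts
        return hosts

    def destroy(self, cluster_name: str) -> None:
        self.clusters.pop(cluster_name, None)

# Supporting another cloud would mean one more subclass and one more entry here.
PROVIDERS: Dict[str, Type[ClusterProvider]] = {"ec2": FakeEC2Provider}

def get_provider(name: str) -> ClusterProvider:
    """Look up and instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unsupported provider: {name!r}") from None
```

Commands only ever call `launch` and `destroy` on a `ClusterProvider`, so they don't change when a new provider is registered.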

100% open source; Apache 2.0 licensed
- Not company-backed

Contribute!
- We have unit and acceptance tests
- Development done entirely on GitHub

Demo

Nicholas Chammas

https://github.com/nchammas/flintrock

Slideshow created using remark / gistdeck.
