Transcript

Page 1

Spark on Kubernetes
Advanced Spark and TensorFlow Meetup (19 Jan 2017)

Anirudh Ramanathan (Google), GitHub: foxish

Page 2

What is Kubernetes?

● Open-source cluster manager originally developed by Google

● Based on a decade and a half of experience running containers at scale

● Has over 1,000 contributors and 30,000+ commits on GitHub

● Container-centric infrastructure

● Deploy and manage applications declaratively

Page 3

High-level overview

[Diagram: users reach the cluster through the CLI, API, and UI; the master runs the apiserver and scheduler; each node runs a kubelet]

Page 4

Concepts

0. Container: A sealed application package (Docker)

1. Pod: A small group of tightly coupled Containers; example: content syncer & web server

2. Controller: A loop that drives current state towards desired state; example: replication controller

3. Service: A set of running pods that work together; example: load-balanced backends

Page 5

Concept: Pod

[Diagram: two pods on a node, each with its containers and an attached volume; containers within a pod share its ports (8080 shown)]

● Pods are the atom of scheduling and scaling

● Pods may contain one or more containers and attached volumes

● Each pod has its own IP address
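
To make the Pod concept concrete, here is a minimal sketch using the Fabric8 Kubernetes client that the deck introduces later for Scala/Java code. It builds the content syncer & web server example from the previous slide; the image names and mount paths are hypothetical.

    import io.fabric8.kubernetes.api.model.PodBuilder

    // One pod, two tightly coupled containers, one shared emptyDir volume:
    // the syncer writes content that the web server serves.
    val pod = new PodBuilder()
      .withNewMetadata().withName("web-with-syncer").endMetadata()
      .withNewSpec()
        .addNewContainer()
          .withName("web-server")
          .withImage("nginx:1.11") // hypothetical image
          .addNewVolumeMount().withName("content").withMountPath("/usr/share/nginx/html").endVolumeMount()
        .endContainer()
        .addNewContainer()
          .withName("content-syncer")
          .withImage("example/content-syncer:latest") // hypothetical image
          .addNewVolumeMount().withName("content").withMountPath("/data").endVolumeMount()
        .endContainer()
        .addNewVolume().withName("content").withNewEmptyDir().endEmptyDir().endVolume()
      .endSpec()
      .build()

Both containers land on the same node, share the pod's IP address, and see the same volume, which is exactly what makes the pairing on this slide work.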

Page 6

Why Spark?

● Spark is used for processing many kinds of workloads
  ○ Batch
  ○ Interactive
  ○ Streaming

● Lots of organizations already run their serving workloads in Kubernetes

● Better resource sharing and management when all workloads run on a single cluster manager

Page 7

Spark Standalone on Kubernetes

Set up one master controller and worker pods as a standalone cluster on top of Kubernetes: https://github.com/kubernetes/kubernetes/tree/master/examples/spark

● Resource negotiation tied to Spark standalone and Kubernetes configuration

● No easy way to dynamically scale the number of workers when there are idle resources

● Lacks robust authentication and authorization mechanisms

● FIFO scheduling only

Pages 8-9

Spark External Cluster Backends

● Standalone Mode

● YARN client/cluster mode

● Mesos client/cluster mode

● Kubernetes client/cluster mode (the proposed addition)

Page 10

Kubernetes as a Cluster Scheduler Backend

● Cluster mode support

● The driver shall run within the cluster

● Coarse-grained mode

● Spark talks to Kubernetes clusters directly

spark-submit --master=k8s://<IP>

[Diagram: spark-submit targets the Kubernetes cluster, which runs the driver & executors]
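
The slide only shows the spark-submit flag; as a purely hypothetical sketch of what the same target could look like from application code (address and app name invented), the master URL uses the k8s:// scheme introduced above:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical: point Spark at a Kubernetes apiserver, mirroring
    // `spark-submit --master=k8s://<IP>`. Real cluster-mode submissions
    // go through spark-submit rather than a hand-built SparkContext.
    val conf = new SparkConf()
      .setMaster("k8s://https://10.0.0.1:443") // invented apiserver address
      .setAppName("k8s-backend-demo")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 1000).sum())
    sc.stop()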

Page 11

Spark Cluster Mode

http://spark.apache.org/docs/latest/cluster-overview.html

● Each application gets its own executor processes

● Tasks from different applications run in different JVMs

● Executors talk back to the Driver and run tasks in multiple threads

Page 12

Roadmap

● Phase 1 design complete; implementation in progress

● Phase 2 & 3 design in progress

● https://github.com/apache-spark-on-k8s/spark

● https://issues.apache.org/jira/browse/SPARK-18278

Page 13

Communication

● Kubernetes provides a REST API

● Fabric8's Kubernetes Java client is used to make REST calls

● Allows us to create, watch, and delete Pods and higher-level controllers from Scala/Java code

[Diagram: REST API calls flow to the apiserver, which works with the scheduler]
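
A minimal Scala sketch of those operations against the Fabric8 client (pod name, image, and namespace are made up; the Watcher signature varies slightly across client versions):

    import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
    import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException, Watcher}

    val client = new DefaultKubernetesClient() // picks up kubeconfig or in-cluster config

    // Create a pod.
    val pod = new PodBuilder()
      .withNewMetadata().withName("demo-pod").endMetadata()
      .withNewSpec()
        .addNewContainer().withName("main").withImage("busybox:1.25").withCommand("sleep", "3600").endContainer()
      .endSpec()
      .build()
    client.pods().inNamespace("default").create(pod)

    // Watch its lifecycle events.
    client.pods().inNamespace("default").withName("demo-pod").watch(new Watcher[Pod] {
      override def eventReceived(action: Watcher.Action, p: Pod): Unit =
        println(s"$action ${p.getMetadata.getName}")
      override def onClose(cause: KubernetesClientException): Unit = ()
    })

    // Delete it when done.
    client.pods().inNamespace("default").withName("demo-pod").delete()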

Page 15

Dynamic Executor Scaling

Hypothesis 1

● The set of executors can be adequately represented by a ReplicaSet

[Diagram: a ReplicaSet is created and runs 3 executor pods]

Page 16

Dynamic Executor Scaling

Hypothesis 1

● The set of executors can be adequately represented by a ReplicaSet

● Which one do we kill? Spark knows how to scale down intelligently, but the ReplicaSet does not

[Diagram: scaling the ReplicaSet down to 2 kills an arbitrary executor pod]
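
The mismatch shows up directly in the client API: scaling a ReplicaSet only sets a replica count; there is no way to name the victim. A hypothetical Fabric8 sketch (ReplicaSet name invented; the accessor path differs across client versions):

    import io.fabric8.kubernetes.client.DefaultKubernetesClient

    val client = new DefaultKubernetesClient()

    // All we can express is "2 replicas, please". The ReplicaSet controller
    // picks which executor pod dies, knowing nothing about which executor
    // holds the most cached data.
    client.apps().replicaSets()
      .inNamespace("default")
      .withName("spark-executors") // hypothetical ReplicaSet name
      .scale(2)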

Pages 17-22

Solution: Driver pod as controller

● Let the Spark driver pod launch executor pods

● Scale up/down can be done such that we lose the least amount of cached data

[Diagram sequence across the slides: spark-submit sends the job to the Kubernetes cluster's apiserver; the scheduler schedules the Spark driver pod; the driver pod creates executor pods through the apiserver; the scheduler schedules the executor pods; when the Spark job completes, the user gets output/logs back]
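
A hypothetical sketch of why driver-side control helps (names and bookkeeping invented, not the fork's actual code): since the driver owns its executor pods, a scale-down can delete exactly the executor whose loss costs the least.

    import io.fabric8.kubernetes.client.DefaultKubernetesClient

    val client = new DefaultKubernetesClient()

    // Hypothetical bookkeeping: executor pod name -> bytes of cached data.
    val cachedBytesByExecutor = Map(
      "executor-1" -> 2048L,
      "executor-2" -> 0L,
      "executor-3" -> 512L
    )

    // The driver, acting as the controller, picks the victim itself
    // instead of letting a ReplicaSet choose arbitrarily.
    val victim = cachedBytesByExecutor.minBy(_._2)._1
    client.pods().inNamespace("default").withName(victim).delete()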

Page 23

Demo

Page 24

Shuffle Service

● The shuffle service is a component that persists files written by executors beyond the lifetime of the executors

● Important (and required) for dynamic allocation of executors

● Typically one per node or instance, shared by different executors

● Can kill executors without fear of losing data and triggering recomputation

● Considering two possible designs of the shuffle service
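
For context, these are the standard Spark settings that tie dynamic allocation to an external shuffle service (the config keys are real; the executor bounds are arbitrary):

    import org.apache.spark.SparkConf

    // Dynamic allocation requires the external shuffle service so that
    // shuffle files outlive the executors that wrote them.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")  // arbitrary bounds
      .set("spark.dynamicAllocation.maxExecutors", "10")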

Page 25

Shuffle Service: DaemonSet

● One shuffle service per node

● Idiomatic and similar to other cluster schedulers

● Requires disk sharing between a DaemonSet pod and each executor pod

● Difficult to enforce ACLs

[Diagram: one shuffle service per node, shared by the executors on that node (foo-1 and bar-1 on one node, foo-2 and bar-2 on another); drivers foo and bar run alongside]
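
The disk-sharing requirement could be met with a hostPath volume mounted by both the DaemonSet pod and every executor pod on the node. A hypothetical sketch of the executor side (path and image invented):

    import io.fabric8.kubernetes.api.model.PodBuilder

    // Executor pod writing shuffle files to a hostPath directory that the
    // node's shuffle-service pod also mounts. The files survive the executor,
    // but per-application ACLs on the shared directory are hard to enforce.
    val executorPod = new PodBuilder()
      .withNewMetadata().withName("foo-1").endMetadata()
      .withNewSpec()
        .addNewContainer()
          .withName("executor")
          .withImage("spark-executor:2.1") // hypothetical image
          .addNewVolumeMount().withName("shuffle-dir").withMountPath("/var/spark/shuffle").endVolumeMount()
        .endContainer()
        .addNewVolume()
          .withName("shuffle-dir")
          .withNewHostPath().withPath("/var/spark/shuffle").endHostPath()
        .endVolume()
      .endSpec()
      .build()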

Page 26

Shuffle Service: Per Executor

● Strong isolation possible between shuffle files

● Resource wastage in having multiple shuffle services per node

● Disk sharing between containers in a Pod is trivial (e.g., a shared emptyDir volume)

● Can expose the shuffle service on the Pod IP

[Diagram: each executor pod (foo-1, foo-2, bar-1, bar-2) bundles its own shuffle service; drivers foo and bar run alongside]

Page 27

Resource Allocation

● Kubernetes lets us specify soft and hard limits on resources (CPU, memory, etc.)

● Pods may be in one of 3 QoS levels
  ○ Guaranteed
  ○ Burstable
  ○ Best Effort

● Scheduling and pre-emption are based on QoS
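
The QoS level falls out of each container's resource requests and limits: requests equal to limits gives Guaranteed, requests below limits gives Burstable, and neither gives Best Effort. A hypothetical Fabric8 sketch of a Guaranteed executor container:

    import io.fabric8.kubernetes.api.model.{ContainerBuilder, Quantity}

    // Requests == limits on every resource, so the pod is classed Guaranteed.
    // Lowering the requests below the limits would make it Burstable instead.
    val executorContainer = new ContainerBuilder()
      .withName("executor")
      .withImage("spark-executor:2.1") // hypothetical image
      .withNewResources()
        .addToRequests("cpu", new Quantity("1"))
        .addToRequests("memory", new Quantity("2Gi"))
        .addToLimits("cpu", new Quantity("1"))
        .addToLimits("memory", new Quantity("2Gi"))
      .endResources()
      .build()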

Page 28

Resource Allocation

● Today, we launch Drivers and Executors with guaranteed resources

● In the near future:
  ○ QoS level of executors should be decided based on a notion of priority
  ○ Must be able to overcommit cluster resources for Spark batch jobs and pre-empt/scale down when higher-priority jobs come in

● Schedule and execute Spark Jobs launched by the same and different tenants fairly

Page 29

Extending the Kubernetes API

● Use ThirdPartyResources to extend the API dynamically

● A SparkJob resource can be added to the API

● The SparkJob object can be written to by the Spark Driver to record job parameters

● Enables better cluster-level aggregation and decisions

Page 31

Questions?

Thank You