Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Page 1
July 2015
Scaling Spark Workloads on YARN
Boulder/Denver Big Data – Shane Kumpf & Mac Moore, Solutions Engineers, Hortonworks. July 2015
Page 2
Agenda
§ Introduction – Why we love Spark, Spark Strategy, What's Next
§ YARN: The Data Operating System
§ Spark: Processing Internals Review
§ Spark: on YARN
§ Demo: Scaling Spark on YARN in the cloud
§ Q & A
Page 3
Made for Data Science – All apps need to get predictive at scale and fine granularity
Democratizes Machine Learning – Spark is doing for ML on Hadoop what Hive did for SQL on Hadoop
Elegant Developer APIs – DataFrames, Machine Learning, and SQL
Realize Value of Data Operating System – A key tool in the Hadoop toolbox
Community – Broad developer, customer and partner interest
Why We Love Spark at Hortonworks
[Diagram: YARN – Data Operating System, coordinating Storage, Governance, Security, Operations, and Resource Management]
Page 4
Hadoop/YARN Powered data operating system
• 100% open source, multi-tenant data platform for any application, any dataset, anywhere
• Built on a centralized architecture of shared enterprise services:
  – Scalable tiered storage
  – Resource and workload management
  – Trusted data governance and metadata management
  – Consistent operations
  – Comprehensive security
  – Developer APIs and tools
Data Operating System: Open Enterprise Hadoop
Page 5
Themes for Spark Strategy
Spark is made for Data Science
• Lead in the community for ML optimization
• Data Science theme of Spark Summit / Hadoop Summit
Provide Notebooks for data exploration & visualization
• iPython Ambari Stack
• Zeppelin – we're very excited about this project
Process more Hadoop data efficiently in Spark
• Hive/ORC data delivered, HBase work in progress
Innovate at the core
• Security, Spark on YARN improvements and more
Page 6
Current State of Security in Spark
Only Spark on YARN supports Kerberos today • Leverage Kerberos for authentication
Spark reads data from HDFS & ORC • HDFS file permissions (& Ranger integration) applicable to Spark jobs
Spark submits job to YARN queue • YARN queue ACL (& Ranger integration) applicable to Spark jobs
Wire Encryption • Spark has some coverage, not all channels are covered
LDAP Authentication • No Authentication in Spark UI OOB, supports filter for hooking in LDAP
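A minimal sketch of hooking such a filter into the Spark UI via the standard spark.ui.filters setting; the filter class name here is hypothetical (any javax.servlet.Filter implementation on the driver classpath would do):

import org.apache.spark.SparkConf

// spark.ui.filters takes a comma-separated list of servlet filter classes that
// every Spark UI request is routed through; com.example.LdapAuthFilter is a
// hypothetical filter that authenticates users against LDAP.
val conf = new SparkConf()
  .setAppName("secured-ui-demo") // illustrative app name
  .set("spark.ui.filters", "com.example.LdapAuthFilter")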
Page 7
What about ORC support?
ORC – Optimized Row Columnar format. ORC is an Apache TLP providing columnar storage for Hadoop.
Spark ORC Support • ORC support in HDP/Spark since 1.2.x – (Alpha) • ORC support merged into Apache Spark in 1.4
• Joint blog with Databricks @ hortonworks.com • Changes between ORC 1.3.1 and Spark 1.4.1
• ORC now uses standard API to read/write.
orc.apache.org
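A minimal sketch of reading and writing ORC through the Spark 1.4 DataFrame API, assuming an existing SparkContext sc and illustrative HDFS paths:

import org.apache.spark.sql.hive.HiveContext

// ORC support ships with Spark's Hive module, so a HiveContext is needed.
val hiveContext = new HiveContext(sc)

// Read an ORC file into a DataFrame, then write it back out as ORC.
// Both paths are hypothetical.
val people = hiveContext.read.format("orc").load("/data/people.orc")
people.write.format("orc").save("/data/people_out.orc")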
Page 8
Introducing Apache Zeppelin…
Page 9
Apache Zeppelin
Features
• A web-based notebook for interactive analytics
• Ad-hoc experimentation with Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc
• Deeply integrated with Spark and Hadoop
• Can be managed via Ambari Stacks
• Supports multiple language backends
• Pluggable "Interpreters"
• Incubating at Apache
• 100% open source and open community
Use Cases
• Data exploration & discovery
• Visualization - tables, graphs, charts
• Interactive snippet-at-a-time experience
• Collaboration and publishing
• “Modern Data Science Studio”
Page 10
Where can I find more?
• Arun Murthy’s Keynote at Hadoop Summit & SparkSummit – Hadoop Summit (http://bit.ly/1IC1BEG) – Spark Summit (http://bit.ly/1M7qw47)
• Data Science with Spark & Zeppelin Session at Hadoop Summit – http://bit.ly/1DdKeTs
• Data Science with Spark + Zeppelin Blog – http://bit.ly/1HFd545
• ORC Support in Spark Blog – http://bit.ly/1OkA1uU
Page 11
YARN: The Data Operating System
2015
Page 12
YARN Introduction
The Architectural Center
• YARN moved Hadoop "beyond batch": run batch, interactive, and real-time applications simultaneously on shared hardware.
• Intelligently places workloads on cluster members based on resource requirements, labels, and data locality.
• Runs user code in containers, providing isolation and lifecycle management.
Hortonworks Data Platform 2.2
YARN: Data Operating System (Cluster Resource Management)
[Diagram: the HDP 2.2 stack. YARN sits over HDFS (Hadoop Distributed File System). Batch, interactive & real-time data access: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm, Apache Sqoop, Apache Flume, Apache Kafka. Governance: Apache Falcon. Security: Apache Ranger, Apache Knox, Apache Falcon. Operations: Apache Ambari, Apache Zookeeper, Apache Oozie.]
Page 13
YARN Architecture - Overview
Resource Manager
• Global resource scheduler
Node Manager
• Per-machine agent
• Manages the life-cycle of containers & resource monitoring
Container
• Basic unit of allocation
• Fine-grained resource allocation across multiple resource types (memory, cpu; future: disk, network, gpu, etc.)
Application Master
• Per-application master that manages application scheduling and task execution
• E.g. the MapReduce Application Master
Page 14
YARN Concepts
• Application – a job or a long-running service submitted to YARN. Examples:
  – Job: MapReduce job
  – Service: HBase cluster
• Container – basic unit of allocation
  – e.g. a MapReduce map or reduce task, an HBase HMaster or RegionServer
  – Fine-grained resource allocations: container_0 = 2GB, 1 CPU; container_1 = 1GB, 6 CPU
  – Replaces the fixed map/reduce slots from Hadoop 1
Page 15
YARN Resource Request
Resource Model
• Ask for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack.
• Capabilities define how much memory and CPU are requested.
• Set relaxLocality = false to force containers onto subsets of machines, aka YARN node labels.
A ResourceRequest carries: priority, resourceName, capability, numContainers, relaxLocality.
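A minimal sketch of building such a request with the Hadoop 2.x client API (all values are illustrative):

import org.apache.hadoop.yarn.api.records.{Priority, Resource, ResourceRequest}

// Ask for 2 containers of 2048 MB / 1 vcore anywhere on the cluster.
// ResourceRequest.ANY ("*") means any host or rack; relaxLocality = true
// lets the scheduler fall back to other nodes if the preferred one is busy.
val capability = Resource.newInstance(2048, 1)
val request = ResourceRequest.newInstance(
  Priority.newInstance(1),  // priority
  ResourceRequest.ANY,      // resourceName
  capability,               // capability
  2,                        // numContainers
  true)                     // relaxLocality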
Page 16
YARN Capacity Scheduler
Function: Capacity Sharing
• Elasticity
• Queues to subdivide resources
• Job submission Access Control Lists
Function: Capacity Enforcement
• Max capacity per queue
• User limits within queue
• Preemption
Function: Administration
• Ambari Capacity Scheduler View
Page 17
Hierarchical Queues
[Diagram: an example hierarchy of parent and leaf queues]
root
  Adhoc 10%
    Dev 10%
    Reserved 20%
    Prod 70%
  DW 70%
    Prod 80%
    Dev 20%
      P0 70%
      P1 30%
  Mrktng 20%
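Spark jobs target one of these leaf queues at submission time. A minimal sketch with an illustrative queue name (the CLI equivalent is spark-submit --queue prod):

import org.apache.spark.{SparkConf, SparkContext}

// spark.yarn.queue selects the Capacity Scheduler leaf queue that the
// application's containers are drawn from; "prod" is illustrative.
val conf = new SparkConf()
  .setAppName("queue-demo") // illustrative app name
  .set("spark.yarn.queue", "prod")
val sc = new SparkContext(conf)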
Page 18
YARN capacity scheduler helps manage resources across the cluster
Page 19
YARN Application Submission - Walkthrough
[Diagram: a client submits an application to the ResourceManager; the Scheduler allocates a container for the Application Master (AM 1, AM 2) on a NodeManager; each AM then requests further containers (1.1-1.3, 2.1-2.4), which run on NodeManagers across the cluster.]
Page 20
Spark: Processing Internals Review
2015
Page 21
First, a bit of review - What is Spark?
• Distributed runtime engine for fast large scale data processing.
• Designed for iterative computations and interactive data mining.
• Provides an API framework to support in-memory cluster computing.
• Multi-language support – Scala, Java, Python
Page 22
So what makes Spark fast? Data access methods are not equal!
Page 23
MapReduce vs Spark
• MapReduce – On disk
• Spark – In memory
Page 24
RDD – The main programming abstraction
Resilient Distributed Datasets
• Collections of objects spread across a cluster, cached or stored in RAM or on disk
• Built through parallel transformations
• Automatically rebuilt on failure
• Immutable; each transformation creates a new RDD
Operations
• Lazy transformations (e.g. map, filter, groupBy)
• Actions (e.g. count, collect, save)
Page 25
RDD In Action
[Diagram: transformations chain RDD → RDD → RDD → RDD; an action then produces a value on the driver.]

textFile = sc.textFile("SomeFile.txt")
linesWithSpark = textFile.filter(lambda line: "Spark" in line)
linesWithSpark.count()   # 74
linesWithSpark.first()   # Apache Spark
Page 26
RDD Graph
textFile → flatMap → map → reduceByKey → collect

sc.textFile("input.txt")              // RDD[String]
  .flatMap(line => line.split(" "))   // RDD[String] – one element per word
  .map(word => (word, 1))             // RDD[(String, Int)]
  .reduceByKey(_ + _, 3)              // RDD[(String, Int)] – 3 partitions
  .collect()                          // Array[(String, Int)] on the driver
Page 27
DAG Scheduler
[Diagram: the lineage textFile → map → map → reduceByKey → collect is cut at the shuffle boundary, producing Stage 1 and Stage 2.]
Goals
• Split the graph into stages based on the types of transformations
• Pipeline narrow transformations (transformations without data movement) into a single stage
Page 28
DAG Scheduler - Double Click
[Diagram: Stage 1 covers textFile and both maps; Stage 2 covers reduceByKey and collect.]
Stage 1: 1. Read HDFS split 2. Apply both maps 3. Write shuffle data
Stage 2: 1. Read shuffle data 2. Final reduce 3. Send result to driver
Page 29
Tasks – How work gets done
The fundamental unit of work in Spark:
1. Fetch input based on the InputFormat or a shuffle.
2. Execute the task.
3. Materialize task output via shuffle, write, or a result to the driver.
Page 30
Input Formats control task input
• Hadoop InputFormats control how data on HDFS is read into each task.
  – Controls splits – how data is split up; each task (by default) gets one split, which is typically a single HDFS block.
  – Controls the concept of a record – is a record a whole line? A single word? An XML element?
• Spark can use both the old and new API InputFormats for creating RDDs.
  – newAPIHadoopRDD and hadoopRDD
  – Save time: use Hadoop InputFormats rather than writing a custom RDD (see the sketch below).
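A minimal sketch of creating an RDD through the new-API TextInputFormat, assuming an existing SparkContext sc and a hypothetical path:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Each record is (byte offset, line); we keep only the line text.
val lines = sc
  .newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/events.log")
  .map { case (_, text) => text.toString }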
Page 31
Executor – The Spark Worker
Isolation for tasks
1. Each application gets its own executors.
2. Executors run tasks in threads and cache data.
3. Executors run in separate processes for isolation.
4. An executor lives for the duration of the application.
Page 32
Executor – The Spark Worker
[Diagram: one executor process with three cores; each core runs a task that fetches input, executes, and writes output, with further tasks queued behind it.]
Page 33
The gang's all here
[Diagram: the Application Master and Spark Driver coordinate executors running on worker nodes; each executor holds a cache and runs tasks, one task per RDD partition.]
Page 34
Spark: on YARN
2015
Page 35
Spark on YARN
Modus Operandi
• 1 executor = 1 YARN container
• 2 modes: yarn-client or yarn-cluster
• yarn-client = driver on the client side – good for the REPL
• yarn-cluster = driver inside the YARN Application Master – good for batch and automated jobs
[Diagram: the YARN ResourceManager launches the App Master; a monitoring UI observes the application.]
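A minimal sketch of selecting yarn-client mode programmatically (yarn-cluster mode is normally chosen via spark-submit --master yarn-cluster instead, since the driver must then be launched inside the Application Master):

import org.apache.spark.{SparkConf, SparkContext}

// yarn-client keeps the driver in this JVM; executors run in YARN containers.
val conf = new SparkConf()
  .setAppName("on-yarn-demo") // illustrative app name
  .setMaster("yarn-client")
val sc = new SparkContext(conf)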
Page 36
Why Spark on YARN
Core Features • Run other workloads along with Spark • Leverage Spark Dynamic Resource Allocation • Currently the only way to run in a kerberized environment • Ability to provide capacity guarantees via Capacity Scheduler
[Diagram: the HDP 2.2 stack again, as on Page 12 – YARN over HDFS with the batch, interactive & real-time engines, governance, security, and operations components.]
Page 37
Executor Allocations on YARN
Static Allocation
• Static number of executors started on the cluster.
• Executors live for the duration of the application, even when idle.
Dynamic Allocation
• Minimal number of executors started initially.
• Executors added exponentially based on pending tasks.
• After an idle period, executors are stopped and resources are returned to the resource pool.
Page 38
Static Allocation Details
Static Allocation
• Traditional means of starting executors on nodes.

spark-shell --master yarn-client \
  --driver-memory 3686m \
  --executor-memory 17g \
  --executor-cores 7 \
  --num-executors 7

• Static number of executors specified by the submitter.
• Size and count of executors is key for good performance.
Page 39
Dynamic Allocation Details
Dynamic Allocation
• Scale executor count based on pending tasks.

spark-shell --master yarn-client \
  --driver-memory 3686m \
  --executor-memory 3686m \
  --executor-cores 1 \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.dynamicAllocation.minExecutors=1" \
  --conf "spark.dynamicAllocation.maxExecutors=100" \
  --conf "spark.shuffle.service.enabled=true"

• Minimum and maximum number of executors specified.
• Exclusive to running Spark on YARN.
Page 40
Enabling Dynamic Allocation
spark_shuffle YARN aux service – dynamic allocation is not enabled OOTB.

--conf "spark.dynamicAllocation.enabled=true" \
--conf "spark.shuffle.service.enabled=true"

1. Copy the spark-shuffle jar onto the NodeManager classpath.
2. Configure the YARN aux service for spark_shuffle:
   – Add spark_shuffle to yarn.nodemanager.aux-services
   – Set yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
3. Restart the NodeManagers to pick up the spark-shuffle jar.
4. Run the Spark job with the dynamic allocation configs.
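The same configs can also be set programmatically; a minimal sketch (app name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Enables dynamic allocation with the external shuffle service, so executors
// can come and go without losing their shuffle output.
val conf = new SparkConf()
  .setAppName("dynamic-allocation-demo") // illustrative app name
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "100")
val sc = new SparkContext(conf)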
Page 41
Dynamic Allocation Configuration Options
spark.dynamicAllocation.minExecutors
Minimum number of executors, and also the initial number spawned at job submission (the initial count can be overridden with initialExecutors).
--conf "spark.dynamicAllocation.minExecutors=1"

spark.dynamicAllocation.maxExecutors
Maximum number of executors; executors will be added based on pending tasks up to this maximum.
--conf "spark.dynamicAllocation.maxExecutors=100"
Page 42
Dynamic Allocation Configuration Options
spark.dynamicAllocation.schedulerBacklogTimeout
Initial delay to wait before allocating additional executors. Default: 5 seconds.
--conf "spark.dynamicAllocation.schedulerBacklogTimeout=10"

spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
After the initial round of executors is scheduled, how long until the next round of scheduling? Default: 5 seconds.
--conf "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=10"
[Chart: executors started over time – rounds of executor launches grow exponentially while tasks remain pending.]
Page 43
Dynamic Allocation – Good citizenship in a shared environment
spark.dynamicAllocation.executorIdleTimeout
Amount of idle time in seconds before an executor container is killed and its resources returned to YARN. Default: 10 minutes.
--conf "spark.dynamicAllocation.executorIdleTimeout=60"

spark.dynamicAllocation.cachedExecutorIdleTimeout
Because caching RDDs is key to performance, this setting was introduced to keep executors holding cached data around longer.
--conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=1800"
Page 44
Sizing your Spark job
Difficult Landscape
• Conflicting recommendations often found online.
• Requires knowledge of the data set, task distribution, cluster topology, RDD cache churn, hardware profile…
Common questions – 1 executor per core? 1 executor per node? 3-5 executors if I/O bound? yarn.nodemanager.resource.memory-mb? 18 GB max heap? It depends.
Page 45
Common Suggestions to improve performance
Do these things
1. Cache RDDs in memory*
2. Don't spill to disk if possible
3. Use a better serializer
4. Consider compression
5. Limit GC activity
6. Get parallelism right* … or scale elastically
* New considerations with Spark on YARN
Page 46
Sizing Spark Executors on YARN
Relationship
1. Setting the executor memory size sets the JVM heap, NOT the container size.
2. Executor memory + the greater of (10% or 384 MB) = container size.
3. YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so to avoid wasted resources, size executors so that executor memory + memoryOverhead lands on (or just under) one of those increments.
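A worked example with illustrative numbers: --executor-memory 17g gives an overhead of max(0.10 × 17408 MB, 384 MB) ≈ 1741 MB, so the container request is ≈ 19149 MB. With yarn.scheduler.minimum-allocation-mb = 2048, YARN rounds that up to 20480 MB, wasting ≈ 1.3 GB per executor unless the heap is resized to align.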
Page 47
Sizing Spark Executors on YARN
Relevant YARN Container Settings
• yarn.nodemanager.resource.cpu-vcores
  – Number of vcores available for YARN containers per NodeManager
• yarn.nodemanager.resource.memory-mb
  – Total memory available for YARN containers per NodeManager
• yarn.scheduler.minimum-allocation-mb
  – Minimum resource request allowed per allocation, in megabytes
  – Smallest container available for an executor
• yarn.scheduler.maximum-allocation-mb
  – Maximum resource request allowed per allocation, in megabytes
  – Largest container available for an executor
  – Typically equal to yarn.nodemanager.resource.memory-mb
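Continuing the illustrative numbers above: a node with yarn.nodemanager.resource.memory-mb = 65536 and yarn.nodemanager.resource.cpu-vcores = 16 has memory for three 20480 MB containers (61440 MB ≤ 65536 MB), but at --executor-cores 7 three executors would need 21 vcores, so the vcore limit caps the node at two executors. Whichever of memory or vcores runs out first sets executors per node.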
Page 48
Tuning Advice
How do we get it right?
• Test, gather, and test some more
• Define an SLA
• Tune the job, not the cluster
• Tune the job to meet the SLA
• Don't tune prematurely; it's the root of all evil

Starting Points
• Keep your heap reasonable, but large enough to handle your dataset.
  – Recall that we only get about 60% of the heap for RDD caching.
  – Measure GC and ensure the percentage of time spent in GC is low.
• For jobs that depend heavily on cached RDDs, limit executors per machine to one where possible.
  – Per the first point: if RDD cache churn or GC are a problem, make smaller executors and run multiple per machine.
• On high-memory hardware, run multiple executors per machine.
  – Keep the heap reasonable.
• For CPU-bound tasks with limited data needs, more executors can be better.
  – Run with 2-4 GB executors with a single vcore and measure performance.
• Tune task parallelism.
  – As a rule of thumb, increase the task count by 1.5x each round of testing and measure the results.
Page 49
Avoid spilling or caching to disk
Caching strategies
• Use the default .cache() or .persist(), which stores data as deserialized Java objects (MEMORY_ONLY).
  – Trade-off: lower CPU usage versus size of data in memory.
• Don't use disk persistence.
  – It's typically faster to recompute the partition, and there is a good chance many of the blocks are still in the operating system page cache.
• If the default strategy results in the data not fitting in memory, use MEMORY_ONLY_SER, which stores the data as serialized objects.
  – Trade-off: higher CPU usage, but the data set is typically around 50% smaller in memory.
  – Can significantly affect job run time for larger data sets; use with caution.

import org.apache.spark.storage.StorageLevel._
theRdd.persist(MEMORY_ONLY_SER)
Page 50
Data Access with Spark on YARN
Gotchas
• Don't cache base RDDs – poor distribution.
  – Do cache intermediate data sets – good distribution across dynamically allocated executors.
• Ensure executors remain running until you are done with the cached data.
  – Cached data goes away when the executors do and is costly to recompute.
• Data locality is getting better, but isn't great.
  – SPARK-1767 introduced locality waits for cached data.
• computePreferredLocations is pretty broken.
  – Only use it if necessary; it gets overwritten in some scenarios, and better approaches are in the works.

val locData = InputFormatInfo.computePreferredLocations(Seq(
  new InputFormatInfo(conf, classOf[TextInputFormat], new Path("myfile.txt"))))
val sc = new SparkContext(conf, locData)
Page 51
Future Improvements for Spark on YARN
RDD Sharing
– Short term: keep executors with RDD cache around longer
– HDFS memory tier for RDD caching
– Experimental off-heap caching in Tachyon (lower overhead than persist())
– Cache rebalancing
Data Locality for Dynamic Allocation
– No more preferredLocations; discover locality from the RDD lineage.
Container/Executor Sizing
– Make it easier… automatically determine the appropriate size.
– Long term: specify task size only; memory, cores, and overhead are determined automatically.
Secure All The Things!
– SASL for shuffle data
– SSL for the HTTP endpoints
– Encrypted shuffle – SPARK-5682
Page 52
DEMO: Scaling Spark workloads on YARN
2015
Page 53
Scaling compute independent of storage
HDP 2.3 Hadoop Cluster
[Diagram: storage nodes each run a NodeManager plus HDFS; compute nodes run a NodeManager only; an edge node hosts the clients; management and master nodes host Ambari and the master services.]
Overview
1. Pattern that is gaining popularity in the cloud.
2. Save costs and leverage the elasticity of the cloud.
3. Scale NodeManagers (compute only) independent of traditional NodeManager/DataNode (compute + storage) workers.
Page 54
How it works
Overview
1. Leverage Spark dynamic allocation on YARN to scale the number of executors based on pending work.
2. If additional capacity is still needed, provision additional compute nodes, add them to the cluster, and continue to scale executors onto the new nodes.
HDP 2.3 Hadoop Cluster
[Diagram: the same cluster as before, now with eight compute nodes alongside the three storage nodes, edge node, and management/master nodes.]
Page 55
Process Overview
[Diagram: Cloudbreak orchestrates the HDP/Spark cluster over a REST API, with Ambari managing the nodes; the Spark client submits jobs, executors run in containers on the compute nodes, and metrics drive scaling, which adds more compute nodes and executors.]
1. Deploy cluster
2. Set alerts
3. Submit job
4. Executors increase
5. Capacity reached; alerts trigger
6. Scaling policy adds compute nodes
Page 56
DEMO – Leveraging Dynamic Allocation
Page 57
Scenarios
Promising Use Cases
1. CPU-bound workloads
2. Bursty usage
3. Zeppelin/ad-hoc data exploration
4. Multi-tenant, multi-use, centralized clusters
5. Dev/QA clusters
Page 58
Cloudbreak
• Developed by SequenceIQ
• Open source, with options to extend with a custom UI
• Launches Ambari and deploys the selected distribution via Blueprints in Docker containers
• Customer registers, delegates access to cloud credentials, and runs Hadoop on their own cloud account (Azure, AWS, etc.)
• Elastic – spin up any number of nodes, scale up/down on the fly
"Cloud-agnostic Hadoop-as-a-Service API"
Page 59
Launch HDP on Any Cloud for Any Application
[Diagram: example workloads around Cloudbreak – BI / Analytics (Hive), IoT Apps (Storm, HBase, Hive), Dev / Test (all HDP services), Data Science (Spark).]
Cloudbreak: 1. Pick a Blueprint 2. Choose a Cloud 3. Launch HDP!
Example Ambari Blueprints: IoT Apps, BI / Analytics, Data Science, Dev / Test
Page 60
Step 1: Sign up for a free Cloudbreak account
URL to sign up for a free account: https://accounts.sequenceiq.com/
General Cloudbreak documentation: http://sequenceiq.com/cloudbreak/#cloudbreak
Page 61
Step 2: Create or add credentials
• Varies by cloud, but typically only a couple of steps.
Page 62
Step 3: Note the blueprint for your use case
• An Ambari blueprint describes components of the HDP stack to include in the cloud deployment
• Cloudbreak comes with some default blueprints, such as a Spark cluster or a streaming architecture
• Pick the appropriate blueprint, or create your own!
Page 63
Step 4: Create Cluster
• Ensure your credential is selected by clicking on “select a credential”
• Click Create cluster, give it a name, choose a region, choose a network
• Choose desired blueprint
• Set the instance type and number of nodes.
• Click create and start cluster
Page 64
Step 5: Wait for cluster install to complete
• Depending on instance types and blueprint chosen, cluster install should complete in 10-35 mins
• Once cluster install is complete, click on the Ambari server address link (highlighted on screenshot) and login to Ambari with admin/admin
• Your HDP cluster is ready to use
Page 65
Periscope: Auto up and down scaling
• Define alerts for the number of pending YARN containers.
Page 66
Periscope: Auto up and down scaling
• Define scaling policies for how Periscope should react to the defined alerts.
Page 67
Periscope: Auto up and down scaling
• Define the min/max cluster size and “cooldown” period (how long to wait between scaling events).
• The number of compute nodes will automatically scale when out of capacity for containers.
Page 68
Benefits
Why do I care?
• Less contention between jobs
  – Less waiting for your neighbor's job to finish; elastic scale gives us all compute time.
• Improved job run times
  – Testing has shown a 30%+ decrease in job run times for moderate-duration CPU-bound jobs.
• Decreased costs over persistent IaaS clusters
  – Spin down resources not in use.
  – If time = money, improved job run times will decrease costs.
• Capacity planning hack!
  – Scaling up a lot? You should probably add more capacity…
  – Never scaling up? You probably overbuilt…
Page 69
DEMO – Auto Scaling IaaS
Page 70
Q & A