Efficient cluster resource management using Mesos and Cook
Li Jin
Jan 06, 2017
About Me
• Software Engineer @ Two Sigma
Outline
• Introduction: Mesos and Cook
What is Mesos
• Open Source Apache Project
• 2010: AMPLab, University of California, Berkeley
• 2012: Twitter, Airbnb
• 2015: Twitter, Airbnb, Apple, Bloomberg, Cisco, eBay, Yelp…
What is Mesos
• Tool to build distributed applications
  – Hadoop, Spark…
  – Cassandra, Kafka, Riak…
What is Mesos
• Distributed applications commonality:
  – Manages resources (cpu, memory, disk…) on worker hosts
  – Manages life cycle of remote processes
  – Manages communication between masters and workers
Mesos Primitives
Mesos @ Two Sigma
[Diagram labels: Cook, Mesos]
What is Cook
• Two Sigma’s Simulation Platform
• Manages tens of thousands of simulations
• Shares compute resources among users
What is Simulation
• Idempotent, distributed, resource-intensive computations
• Simulation set
  – A handful to thousands of simulations
• Simulation
  – Multiple Mesos tasks
What is Simulation
• Simulation task footprint
  – 10 ~ 100 GB RAM
  – 1 ~ 20 CPUs
  – 15 minutes ~ a few hours
• Simulation use cases
  – Interactive
  – Batch processing
Problem
• High resource demand
  – 5x capacity during peak hours
• Optimize
  – Utilization: process workloads as fast as possible
  – Fairness: allocate resources fairly to users
What is Fairness
• FIFO
• Time sharing
• Roll a die
• …
What is Fairness
• A story…
What is Fairness
[Figure: Resource Allocation]
What is Fairness, Really
• Fairness is not about ‘fair’
• Fairness is about user experience
• Users should get their share of the cluster whenever they need it
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
Static Quota
• Quota = maximum percentage of the cluster allowed for a single user
• Static
  – 100% / maximum number of concurrent users
• Pros:
  – Fairness
• Cons:
  – Poor utilization
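
The static quota rule can be sketched in a few lines. This is only an illustration of the arithmetic on the slide; the function names and the "best case" utilization helper are hypothetical and are not taken from Cook.

```python
def static_quota(max_concurrent_users: int) -> float:
    """Fraction of the cluster (0.0 to 1.0) a single user may hold."""
    if max_concurrent_users <= 0:
        raise ValueError("max_concurrent_users must be positive")
    return 1.0 / max_concurrent_users


def best_case_utilization(active_users: int, max_concurrent_users: int) -> float:
    """Upper bound on utilization when every active user fills their quota."""
    quota = static_quota(max_concurrent_users)
    return min(1.0, active_users * quota)


# A quota sized for 10 concurrent users, with only 3 users active.
quota = static_quota(10)
utilization = best_case_utilization(active_users=3, max_concurrent_users=10)
```

With a quota sized for 10 concurrent users, three active users can fill at most about 30% of the cluster, no matter how much work they have queued. That is the poor-utilization drawback listed above.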
Dynamic Quota
• Dynamic
  – Quota * Utilization Adjustment
• Pros:
  – Higher Utilization
• Cons:
  – Poor Fairness
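
The slide does not say how the utilization adjustment is computed. The sketch below assumes, purely for illustration, a factor of 1 / utilization capped at a hypothetical `max_factor`, so that the quota grows when the cluster is mostly idle and falls back to the static value when it is full.

```python
def dynamic_quota(static_quota: float, utilization: float, max_factor: float = 4.0) -> float:
    """Per-user quota scaled by an assumed utilization adjustment.

    static_quota and utilization are fractions of the cluster in [0, 1].
    The adjustment formula is an assumption, not Cook's actual rule.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization == 0.0:
        adjustment = max_factor
    else:
        adjustment = min(max_factor, 1.0 / utilization)
    # A user can never be granted more than the whole cluster.
    return min(1.0, static_quota * adjustment)


idle_cluster_quota = dynamic_quota(static_quota=0.1, utilization=0.25)
busy_cluster_quota = dynamic_quota(static_quota=0.1, utilization=1.0)
```

Under this assumption a user may hold 40% of a mostly idle cluster but only 10% of a full one. Tasks started under the larger quota keep running after the quota shrinks, which is presumably where the fairness problem comes from.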
Dynamic Quota
[Figure: Unfair Resource Allocation → Fair Resource Allocation: Hours…]
Can we do better?
[Figure: Fairness and Utilization compared for Static Quota, Dynamic Quota, and ?]
Preemption
• Kill a Simulation task and reschedule later
• Reclaim resources faster!
[Figure: Unfair Resource Allocation → Fair Resource Allocation: Minutes!]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
Preemption: Intuition
[Figure sequence: tasks in the Running and Waiting queues]
Problem
• Not all tasks are equal
• We just preempted some important tasks!
Bad User Experience
Score Function
• Score Function: reflects a task’s value
  – Fairness
  – Importance
• Preemption principle:
  – Preempt a low-score task for a high-score task
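
A minimal sketch of the preemption principle follows. It assumes each task already carries a numeric score and considers a single resource (CPUs). The `Task` type and `pick_victims` helper are hypothetical names for illustration, not Cook's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    score: float  # higher means more valuable; how it is computed is left open


def pick_victims(running: List[Task], waiting: Task, free_cpus: float) -> Optional[List[Task]]:
    """Return the running tasks to preempt so `waiting` fits, or None if it cannot.

    Only tasks scoring strictly lower than `waiting` are eligible, and the
    lowest-scoring ones are taken first.
    """
    needed = waiting.cpus - free_cpus
    candidates = sorted(
        (task for task in running if task.score < waiting.score),
        key=lambda task: task.score,
    )
    victims: List[Task] = []
    for task in candidates:
        if needed <= 0:
            break
        victims.append(task)
        needed -= task.cpus
    return victims if needed <= 0 else None


running = [Task("a", 2, 0.9), Task("b", 2, 0.2), Task("c", 2, 0.5)]
victims = pick_victims(running, Task("w", 3, 0.6), free_cpus=0)
```

Here the waiting task (score 0.6) displaces tasks `b` and `c`, while the higher-scoring task `a` keeps running. A waiting task that scores below every running task gets `None` and preempts nothing.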
Preemption: Intuition
[Figure sequence: Running and Waiting tasks, each drawn as a currency symbol (€, £, ¥, ₽)]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
  – Intuition
  – Formalization
Cumulative Resource Share (CRS)
• Assume there is a total order of tasks for each user, where > means ‘more important than’.
  – The CRS of task t is the sum of the resources of all tasks of the same user that are greater than or equal to t, divided by the total cluster resource.
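Cumulative Resource Share (CRS)
• $\mathrm{CRS}(t) = \frac{1}{R} \sum_{t' \in T_u,\; t' \ge t} r(t')$, where $T_u$ is the set of tasks of the user who owns $t$, $r(t')$ is the resource used by task $t'$, and $R$ is the total cluster resource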
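
The CRS definition can be tried on a small example. The sketch below handles one user and one resource, with tasks listed from most to least important; the function name and input layout are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def cumulative_resource_share(
    tasks_by_importance: List[Tuple[str, int]], cluster_total: int
) -> Dict[str, Fraction]:
    """CRS for one user's tasks, given as (name, resource), most important first."""
    shares: Dict[str, Fraction] = {}
    running_sum = 0
    for name, resource in tasks_by_importance:
        # Everything at least as important as this task, including itself.
        running_sum += resource
        shares[name] = Fraction(running_sum, cluster_total)
    return shares


# Three equally sized tasks on a cluster of six units.
shares = cumulative_resource_share([("t1", 1), ("t2", 1), ("t3", 1)], cluster_total=6)
```

For this user the shares come out as 1/6, 2/6 and 3/6 (stored in reduced form by `Fraction`), the same fractions that label tasks in the formalization slides. A user's most important task always has the smallest CRS, and each additional task pushes the value up.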
Preemption: Formalization
[Figure sequence: Running and Waiting tasks, with each user's tasks labeled by CRS values 1/6, 2/6, 3/6]
Multiple Resources?
• Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
  – Published by UC Berkeley in 2011
Dominant Cumulative Resource Share
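
Only the name of this quantity survives on the slide. One plausible reading, borrowing the dominant-share idea from Dominant Resource Fairness, is to accumulate each resource separately and take the largest cumulative share. The sketch below implements that reading as an assumption; it assumes every task uses only resource names present in `cluster`, and the names and numbers are made up for illustration.

```python
from typing import Dict, List

Resources = Dict[str, float]


def dominant_cumulative_share(
    tasks_by_importance: List[Resources], cluster: Resources
) -> List[float]:
    """Dominant cumulative share for one user's tasks, most important first."""
    cumulative = {name: 0.0 for name in cluster}
    result: List[float] = []
    for task in tasks_by_importance:
        for name, amount in task.items():
            cumulative[name] += amount
        # The dominant share is the largest per-resource cumulative fraction.
        result.append(max(cumulative[name] / cluster[name] for name in cluster))
    return result


cluster = {"cpus": 100.0, "mem_gb": 1000.0}
tasks = [{"cpus": 10.0, "mem_gb": 50.0}, {"cpus": 5.0, "mem_gb": 200.0}]
dcrs = dominant_cumulative_share(tasks, cluster)
```

In this example CPU dominates after the first task (10% of CPUs versus 5% of memory), and memory dominates after the second (25% of memory versus 15% of CPUs).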
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
  – Intuition
  – Formalization
• Put things together: Mesos and Cook
Cook: Architecture
Are we doing better?
[Figure: Fairness and Utilization compared for Static Quota, Dynamic Quota, and Preemption?]
Outline
• Introduction: Mesos and Cook
• Problem: Utilization and Fairness
• Fairness: How do we do it
• Preemption: How do we do it
  – Intuition
  – Formalization
• Put things together: Mesos and Cook
• Benchmark
Benchmark
• Simulated
• 7-day production workload trace
Benchmark
[Figure: Simulation Set Speed Up Distribution. Series: Dynamic Quota, Preemption. Y axis: Speed Up (0 to 12). X axis: 0.1 to 1.]
Benchmark
[Figure: Effective Utilization. Series: Dynamic Quota, Preemption. Y axis: Utilization (0 to 1). X axis: Day 1 to Day 7.]
It works!
Open Source
• https://github.com/apache/mesos
• https://github.com/twosigma/cook
• @icexelloss
Questions?