Elastic Resource Scheduling with Apache Mesos Sharma Podila Aug 31st, MesosCon Europe 2016
Computing Resources are Finite.
Let’s use them Wisely.
Let’s schedule them optimally.
About Me
● Software engineer
○ Netflix Edge Engineering
○ Sun Microsystems + Oracle Corp.
○ Resource scheduling, stream processing, distributed systems
● Author of Fenzo scheduling library
Let’s address a few questions
● Why Apache Mesos?
● Why focus on scheduling?
● How to guarantee capacity for various apps?
● What’s needed from the container executor?
Source: https://www.sandvine.com/news/global_broadband_trends.asp
81 Million subscribers worldwide and growing!
Microservices architecture on EC2
Why Apache Mesos?
Needed to build these...
Needle in a haystack anomaly detection
Container deployment service for a mix of batch and service workloads
Reactive stream processing: Mantis
[Diagram labels: Zuul Cluster, API Cluster, Mantis Stream processing, Anomaly Detection]
Cloud native service
● Configurable message delivery guarantees
● Heterogeneous workloads
○ Real-time dashboarding, alerting
○ Anomaly detection, metric generation
○ Interactive exploration of streaming data
Container deployment: Titus
[Diagram labels: EC2, VPC, VMs, Titus Job Control, App Containers, Batch Containers, Cloud Platform (metrics, IPC, health), Eureka, Edda, Atlas & Insight]
A few common themes
Large variation in peak to trough resource requirements
Mantis events/sec: 8M (peak) vs. 2M (trough)
Titus concurrent jobs: 1000s (peak) vs. 10s (trough)
A few common themes
Heterogeneous mix of jobs and resources
Resource             Task request      Agent sizes
CPU                  1 - 32 CPUs       8 - 32 CPUs
Memory               2 - 200+ GB       32 - 244 GB
Network bandwidth    10 - 1024 Mbps    1024 - 10240 Mbps
Resource affinity based on task type
Task locality
A few common themes
Jobs needing high availability of tasks across ephemeral cloud resources
Job with N tasks
Host1: ec2 zone=d
Host2: ec2 zone=e
Host3: ec2 zone=f
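One way to picture the high-availability theme above is a placement rule that balances a job's tasks across zones. The following is a minimal sketch, not Fenzo's API: the function names and the `(host, zone)` tuple shape are assumptions made for illustration, and host capacity is ignored.

```python
from collections import Counter

def pick_zone_balanced_host(hosts, placed_zones):
    """Return the host whose zone holds the fewest tasks of this job so far.

    hosts: list of (host_name, zone) tuples (hypothetical input shape).
    placed_zones: Counter mapping zone -> number of the job's tasks placed there.
    """
    # min() keeps the first host on ties, so input order breaks ties.
    return min(hosts, key=lambda h: placed_zones[h[1]])

def spread_job(num_tasks, hosts):
    """Place num_tasks tasks one at a time, always into the least-used zone."""
    placed = Counter()
    assignments = []
    for task_id in range(num_tasks):
        host, zone = pick_zone_balanced_host(hosts, placed)
        placed[zone] += 1
        assignments.append((task_id, host, zone))
    return assignments

hosts = [("Host1", "d"), ("Host2", "e"), ("Host3", "f")]
assignments = spread_job(6, hosts)
zone_counts = Counter(zone for _, _, zone in assignments)
print(zone_counts)
```

With six tasks and three zones, each zone ends up with two tasks, so losing one zone takes out a third of the job rather than all of it.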
What kind of scheduler do I need?
Scheduler
Cluster wide optimizations: #servers, heterogeneous mix, security
User centric optimizations: Resource affinity, task locality
Assignments
Achieve multiple scheduling objectives
Functions of a framework
[Diagram: Framework components API, Resource Scheduling, Persistence; labels: Domain specific, Environment specific, Potentially common]
NetflixOSS Fenzo scheduling library
https://github.com/Netflix/Fenzo
● Heterogeneous mix of task and resource sizes
● Autoscaling of Mesos agent clusters
● Customizable scheduling objectives
Scheduling optimizations
Speed: First fit assignment
Accuracy: Optimal assignment
Real world tradeoffs
Fenzo Scheduling strategy
For each task
    On each host
        Validate hard constraints (plugin)
        Eval fitness and soft constraints (plugin)
    Until fitness “good enough”, and
    A minimum #hosts evaluated
Sample plugins: bin packing fitness function and soft/hard constraint evaluators for resource affinity and task locality
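The strategy above can be sketched in a few lines. This is an illustrative sketch only and does not use Fenzo's actual Java API: the `Host` class, the function names, and the `good_enough` and `min_hosts` parameters are all assumptions. It models CPU as the only resource and leaves out soft constraints.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    total_cpus: float
    used_cpus: float = 0.0

def fits(task_cpus, host):
    # Hard constraint: the host must have enough free CPUs.
    return host.total_cpus - host.used_cpus >= task_cpus

def bin_packing_fitness(task_cpus, host):
    # Fitness in [0, 1]: fraction of the host in use after placing the task.
    return (host.used_cpus + task_cpus) / host.total_cpus

def schedule(task_cpus, hosts, good_enough=0.9, min_hosts=2):
    """Return the chosen host (or None), stopping early once the best
    fitness is good enough and a minimum number of hosts were evaluated."""
    best, best_fitness, evaluated = None, -1.0, 0
    for host in hosts:
        evaluated += 1
        if not fits(task_cpus, host):
            continue
        fitness = bin_packing_fitness(task_cpus, host)
        if fitness > best_fitness:
            best, best_fitness = host, fitness
        if best_fitness >= good_enough and evaluated >= min_hosts:
            break  # trade accuracy for speed: skip the remaining hosts
    if best is not None:
        best.used_cpus += task_cpus
    return best

hosts = [Host("h1", 8, 2), Host("h2", 8, 6), Host("h3", 8, 0)]
first = schedule(2, hosts)   # h2 becomes full, fitness 1.0, loop stops early
second = schedule(2, hosts)  # h2 no longer fits; h1 is the fullest that does
print(first.name, second.name)
```

Raising `good_enough` or `min_hosts` moves the sketch toward the optimal-assignment end of the speed versus accuracy tradeoff; lowering them moves it toward first fit.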
Fenzo agent cluster autoscaling
● Scaling up is relatively easy
● Scaling down requires bin packing
○ By resource footprint, runtime, etc.
[Figure: two task placements compared across Host 1, Host 2, Host 3, Host 4]
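To see why scaling down depends on bin packing, the sketch below places the same tasks with a spreading rule and with a packing rule, then lists the hosts left idle. It is a toy model under stated assumptions: identical hosts, CPU as the only resource, and hypothetical function names that are not part of Fenzo.

```python
def place(tasks, num_hosts, capacity, pack):
    """Place CPU demands on identical hosts; return per-host used CPUs."""
    used = [0] * num_hosts
    for cpus in tasks:
        candidates = [i for i in range(num_hosts) if capacity - used[i] >= cpus]
        if not candidates:
            raise ValueError("no host can fit the task")
        if pack:
            # Bin packing: prefer the fullest host that still fits.
            chosen = max(candidates, key=lambda i: used[i])
        else:
            # Spreading: prefer the emptiest host.
            chosen = min(candidates, key=lambda i: used[i])
        used[chosen] += cpus
    return used

def idle_hosts(used):
    """Hosts with nothing running are candidates for termination."""
    return [i for i, u in enumerate(used) if u == 0]

tasks = [2, 2, 2, 2]
spread = place(tasks, num_hosts=4, capacity=8, pack=False)
packed = place(tasks, num_hosts=4, capacity=8, pack=True)
print(spread, idle_hosts(spread))
print(packed, idle_hosts(packed))
```

Spreading leaves every host partly busy, so none can be released; packing concentrates the same load on one host and frees the other three.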
Capacity Guarantees
Capacity guarantees
Guarantee [agreed upon] capacity for timely job starts
Mesos support for quotas, etc. evolving
Generally, optimize throughput for batch jobs and start latency for service jobs
Some service style jobs may be less important
Categorize by expected behavior instead: Critical versus Flex (flexible scheduling requirements)
Capacity guarantees
Quotas vs. Priorities
[Diagram, quotas: capacity divided between Critical and Flex]
[Diagram, priorities: resource allocation order across Critical apps AppC1, AppC2, AppC3, ..., AppCN and Flex apps AppF1, AppF2, AppF3, ..., AppFN]
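The contrast between quotas and priorities can be sketched with two small allocation functions. This is an assumption-laden illustration, not how Mesos or Fenzo implement either mechanism: the function names, the CPU-only model, and the ordering of Critical apps before Flex apps are all hypothetical.

```python
def allocate_in_order(capacity, requests):
    """Priorities: grant requests strictly in the given order.

    requests: ordered list of (app, cpus); returns {app: granted cpus}.
    """
    granted = {}
    remaining = capacity
    for app, cpus in requests:
        give = min(cpus, remaining)
        granted[app] = give
        remaining -= give
    return granted

def allocate_with_quotas(quotas, requests_by_tier):
    """Quotas: each tier draws only from its own fixed share of capacity."""
    granted = {}
    for tier, requests in requests_by_tier.items():
        granted.update(allocate_in_order(quotas[tier], requests))
    return granted

critical = [("AppC1", 4), ("AppC2", 4)]
flex = [("AppF1", 4), ("AppF2", 4)]

by_priority = allocate_in_order(10, critical + flex)
by_quota = allocate_with_quotas(
    {"critical": 6, "flex": 4},
    {"critical": critical, "flex": flex},
)
print(by_priority)
print(by_quota)
```

With 10 CPUs, strict ordering fully serves both Critical apps and leaves Flex with the remainder, while a 6/4 quota split protects Flex's share but leaves AppC2 short.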
Capacity guarantees: hybrid view
Head of line blocking
What if ‘Critical’ task isn’t satisfied? Or, it isn’t ready?
Dynamic scheduling
Automatic advance reservation: Task T2
Underutilization
Back filling improves utilization: Task T3
[Figure: timeline of tasks T1, T2, and T3 on HostA, with Critical and Flex tiers]
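The back filling idea can be expressed as a simple admission check. This is a minimal sketch under assumptions the slides do not state: task runtimes are known estimates, CPU is the only resource, and the function and parameter names are hypothetical.

```python
def can_backfill(now, task_runtime, task_cpus, free_cpus, reservation_start):
    """A task may back fill if it fits in the CPUs free right now and is
    expected to finish before the reserved task is due to start."""
    fits_now = task_cpus <= free_cpus
    finishes_in_time = now + task_runtime <= reservation_start
    return fits_now and finishes_in_time

# HostA has 8 CPUs. T1 holds 4 of them until t=10.
# T2 (Critical) needs all 8, so it holds a reservation starting at t=10.
free_cpus = 8 - 4
reservation_start = 10

short_t3 = can_backfill(0, 5, 4, free_cpus, reservation_start)
long_t3 = can_backfill(0, 20, 4, free_cpus, reservation_start)
big_t3 = can_backfill(0, 5, 6, free_cpus, reservation_start)
print(short_t3, long_t3, big_t3)
```

Only the short, small task is admitted: it uses CPUs that would otherwise sit idle until T2 starts, without delaying the reservation.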
Capacity guarantees: “utilization”
What if ‘Critical’ is underutilizing? Let Flex use it, but …
Preemptions
“Fairness” via composable functions
[Diagram labels: Critical, Flex]
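When Flex borrows idle Critical capacity, a preemption policy decides which Flex tasks give it back. The sketch below assumes one possible policy, preempting the most recently started tasks first; the slides do not specify a policy, and the function name and tuple shape are hypothetical.

```python
def pick_preemptions(needed_cpus, free_cpus, flex_tasks):
    """Return names of Flex tasks to preempt so that needed_cpus become free.

    flex_tasks: list of (name, cpus, started_at) tuples (hypothetical shape).
    Assumed policy: preempt the most recently started tasks first, which
    discards the least completed work. Returns None if even preempting
    every Flex task would not free enough CPUs.
    """
    shortfall = needed_cpus - free_cpus
    victims = []
    for name, cpus, _ in sorted(flex_tasks, key=lambda t: t[2], reverse=True):
        if shortfall <= 0:
            break
        victims.append(name)
        shortfall -= cpus
    if shortfall > 0:
        return None
    return victims

flex = [("F1", 2, 100), ("F2", 4, 300), ("F3", 2, 200)]
print(pick_preemptions(6, 1, flex))   # needs 5 more CPUs
print(pick_preemptions(2, 4, flex))   # already enough free capacity
print(pick_preemptions(20, 0, flex))  # cannot be satisfied
```

The ordering key is the part a different fairness function would replace, for example preempting the app that is furthest over its share.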
Container Executor
Container executor
MULTI-TENANT
Augment missing pieces:
● IP per container
● Security - Security Groups, IAM roles
● Isolation for networking b/w, disk I/O
Plumbing VPC Networking into Docker
[Diagram labels: EC2 VM with eth0 (eni0, SG=Titus Agent), eth1 (eni1, SecGrp=X), eth2 (eni2, SG=Y); IP 1, IP 2, IP 3; Task 0 (No IP Needed) on docker0; Task 1, Task 2, Task 3, each with pod root, veth<id>, and app, in SecGrp X or SecGrp Y; Linux Policy Based Routing; EC2 Metadata Proxy at 169.254.169.254 via IPTables NAT]
In Summary...
Computing Resources are Finite.
Let’s use them Wisely.
Let’s schedule them optimally.
And, let’s collaborate.
Questions?
Elastic Resource Scheduling with Apache Mesos
Sharma Podila
spodila@netflix.com
@podila
linkedin.com/in/spodila