Page 1
Morpheus: Towards Automated SLOs for Enterprise Clusters
Sangeetha Abdu Jyothi*, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov^,
Jonathan Yaniv**, Ruslan Mavlyutov^^, Íñigo Goiri, Subru Krishnan, Janardhan Kulkarni, Sriram Rao
* University of Illinois, Urbana-Champaign; ^ University of California, Berkeley
** Technion - Israel Institute of Technology; ^^ University of Fribourg
Page 2
Operator/User tensions
Operator | User
[Figure: resources vs. time, one panel for the operator and one for the user; boxes represent containers]
• Run as many jobs as possible
• Fair-sharing
• Our focus is on batch jobs in big data enterprise clusters
• Periodic jobs should run predictably, with output available by the deadline
Page 3
Roadblock: Unpredictability
[Figure: resources vs. time]
Sharing-induced: resource sharing, queueing, etc.
Inherent: stragglers, failures, skew, hardware changes
25% of user tickets due to unpredictability
[Figure: relative runtime (0.6 to 2) of TPC-H queries Q1, Q3, Q4, Q6, and Q12 on 10 TB of data, 275-node cluster, with the deadline marked]
Page 4
Current “solution”: Over-provisioning
[Figure: provisioned/average and provisioned/peak resource ratios, 50k-node COSMOS cluster]
Users over-provision more than 75% of jobs
Page 5
Utilization vs. Predictability
Page 6
Towards automated SLOs
System focuses on periodic jobs
Empirically >60% are periodic
Our results:
5-13x reduction in deadline SLO violations
Reduce cluster size by 14-28%
Page 7
Morpheus Overview
Respond to unpredictabilities
Quantify user requirements
Pack jobs efficiently
User sign-off
Monitoring
Logs
SLO
resource estimate
Automatic Inference Module
Reservation Mechanism
Dynamic Reprovisioning
Page 8
Automatic Inference Module
Deadlines
Resource Estimate
Logs
Quantify user requirements
Derive deadline SLOs
Estimate job resource demands
User sign-off
Page 9
Deadline SLOs
[Figure: dependencies among jobs A, B, X, Y, and Z, with a timeline marking the job completion time of A, the output consumption time of B, and the deadline]
Page 10
Deadline SLO validation
[Equation, subscripts lost in extraction: P(B… | A…) > 4 × P(B… | A…)]
[Figure: CDF over jobs of spare time before deadline (normalized by job duration), x-axis from 0.01 to 1000 on a log scale; inset timeline of jobs A and B with arrival and deadline marked]
~70% of jobs have high scheduling flexibility
Valid estimate
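The two quantities on the last two slides can be illustrated with a small sketch. It assumes, as the timeline suggests but the slides do not state outright, that the deadline of job A is taken as the earliest time at which a consumer such as B reads A's output, and that spare time is the gap between A's completion and that deadline. The function names and the sample times are hypothetical.

```python
from typing import Sequence


def infer_deadline(consumption_times: Sequence[float]) -> float:
    """Assumed rule: the deadline is the earliest observed time at which
    any downstream job started consuming the output."""
    return min(consumption_times)


def normalized_spare_time(completion: float, deadline: float, duration: float) -> float:
    """Spare time before the deadline, normalized by job duration
    (the x-axis quantity of the CDF). Assumes spare = deadline - completion."""
    return (deadline - completion) / duration


# Hypothetical run: A finishes at minute 120 after running 60 minutes;
# consumers read its output at minutes 540, 480, and 600.
deadline = infer_deadline([540, 480, 600])
slack = normalized_spare_time(completion=120, deadline=deadline, duration=60)
```

A value well above 1 means the job could be shifted by several times its own duration without missing the deadline, which is the scheduling flexibility the slide refers to.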
Page 11
Job Resource Demand
• Usage patterns (container skylines) of multiple instances of the same job
• Generate the best-fitting model using a linear program
• Fitting controlled by a parameter, α (higher α → fewer resources)
• Other alternatives: Jockey [EuroSys '12], PerfOrator [SoCC '16]
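To make the role of α concrete, here is a simplified sketch, not the linear program from the paper. It assumes a per-time-step cost of α times over-allocation plus (1 − α) times under-allocation, summed over the observed skylines; the function names and the sample skylines are hypothetical. Under that assumption the cost is convex and piecewise linear, so the best reservation at each step is one of the observed usage values and no LP solver is needed.

```python
from typing import List, Sequence


def step_cost(reservation: float, usages: Sequence[float], alpha: float) -> float:
    """alpha weights over-allocation, (1 - alpha) weights under-allocation."""
    over = sum(max(reservation - u, 0) for u in usages)
    under = sum(max(u - reservation, 0) for u in usages)
    return alpha * over + (1 - alpha) * under


def fit_skyline(skylines: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Pick, for every time step, the observed usage value with the lowest cost."""
    horizon = max(len(s) for s in skylines)
    fitted = []
    for t in range(horizon):
        # Runs shorter than the horizon are treated as using nothing afterwards.
        usages = [s[t] if t < len(s) else 0 for s in skylines]
        candidates = sorted(set(usages))
        fitted.append(min(candidates, key=lambda r: step_cost(r, usages, alpha)))
    return fitted


# Three hypothetical runs of the same job (containers per time step).
runs = [[10, 20, 5], [12, 18, 5], [30, 22, 5]]
loose = fit_skyline(runs, alpha=0.05)  # covers the largest run at each step
tight = fit_skyline(runs, alpha=0.9)   # follows the smallest run at each step
```

With a small α the fit hugs the upper envelope of the runs; with a large α it drops toward the lower envelope, matching "higher α → fewer resources".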
Page 12
Reservation Mechanism
LCM
LowCost
SLO
resource estimate
Pack jobs efficiently
Compact storage of jobs based on Least Common Multiple (LCM) of periods
LowCost Packing Algorithm
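The LowCost algorithm itself is described in the paper. The sketch below is only a greedy stand-in that illustrates load-aware placement: among all start slots that let the job finish by its deadline, it picks the one that yields the lowest peak load. The function name, the slot-based plan, and the peak-load cost are assumptions made for illustration.

```python
from typing import List


def place_job(load: List[int], demand: int, duration: int,
              arrival: int, deadline: int) -> int:
    """Place a job in the plan and return its start slot.

    `load[t]` is the number of containers already reserved in slot t.
    The job needs `demand` containers for `duration` consecutive slots and
    must occupy only slots in [arrival, deadline).
    """
    latest_start = deadline - duration
    if latest_start < arrival:
        raise ValueError("job cannot finish by its deadline")

    def peak_after(start: int) -> int:
        # Highest load inside the job's window if it started at `start`.
        return max(load[t] + demand for t in range(start, start + duration))

    # min() keeps the earliest start among equally good candidates.
    best = min(range(arrival, latest_start + 1), key=peak_after)
    for t in range(best, best + duration):
        load[t] += demand
    return best


plan = [5, 5, 1, 1, 4, 4]
first = place_job(plan, demand=2, duration=2, arrival=0, deadline=6)
second = place_job(plan, demand=3, duration=1, arrival=0, deadline=3)
```

Both jobs land in the valley of the existing plan rather than on top of its busy slots, which is the balancing effect a load-aware packer aims for.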
User sign-off
Page 13
LCM Representation
Job A
Job C
Entire plan
LCM
Job B
Smallest repeating unit stored – Least Common Multiple (LCM) of periods
Efficient storage
Predictable allocation for users
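As a small worked example of the LCM representation, the sketch below computes the length of the repeating plan and how many instances of each job it contains. The job names and the periods (in minutes) are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict


def lcm(a: int, b: int) -> int:
    return a // gcd(a, b) * b


def plan_length(periods: Dict[str, int]) -> int:
    """Length of the smallest repeating unit: the LCM of all job periods."""
    return reduce(lcm, periods.values())


def instances_per_plan(periods: Dict[str, int]) -> Dict[str, int]:
    """How many times each job runs within one repeating unit."""
    length = plan_length(periods)
    return {job: length // period for job, period in periods.items()}


periods = {"A": 15, "B": 60, "C": 90}  # minutes, hypothetical
length = plan_length(periods)          # 180
counts = instances_per_plan(periods)   # {"A": 12, "B": 3, "C": 2}
```

Only these 180 minutes need to be stored; the rest of the schedule is the same unit repeated.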
[Figure: portion of total (%) of periodic jobs and of job instances, by periodicity, from 1 minute to 1 week; y-axis from 0% to 45%]
Page 14
Other key techniques (in the paper)
LowCost Packing Algorithm
Heuristic for achieving a balanced allocation
Load-aware online packing
Dynamic reprovisioning
Continuous monitoring of jobs
Allocate more resources when “progress” is slow
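A minimal sketch of the "allocate more when progress is slow" rule follows. It assumes progress is a fraction between 0 and 1 and that throughput scales linearly with the number of containers; neither assumption comes from the slides, and the function name is hypothetical.

```python
import math


def reprovision(current_alloc: int, progress: float,
                elapsed: float, deadline: float) -> int:
    """Containers needed to finish by the deadline.

    Assumes the work rate observed so far continues and scales linearly
    with the allocation. This sketch never shrinks an allocation.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    remaining_time = deadline - elapsed
    if remaining_time <= 0:
        raise ValueError("deadline has already passed")
    # work done = rate * elapsed * alloc, so the allocation that finishes the
    # remaining work in the remaining time is:
    needed = (1 - progress) * elapsed * current_alloc / (progress * remaining_time)
    return max(current_alloc, math.ceil(needed))


# Halfway to a 60-minute deadline with only 25% of the work done.
slow = reprovision(current_alloc=10, progress=0.25, elapsed=30, deadline=60)
on_track = reprovision(current_alloc=10, progress=0.5, elapsed=30, deadline=60)
```

A job that is behind has its allocation tripled here, while a job on schedule keeps what it has.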
[Figure: resources vs. time, showing a new job placed between its arrival and its deadline]
Page 15
Experiments
Implementation:
Recurrent reservation mechanism, packing algorithm, and dynamic reprovisioning in Apache Hadoop/YARN
Stand-alone inference subsystem
Workload:
Enterprise-trace: Three-month trace from 50k-node COSMOS cluster
Hadoop-trace: One-month trace from 4k-node Hadoop cluster
TPC-H: Standard TPC-H benchmark
Page 16
Evaluation – Scalability test
Morpheus can handle load in production clusters
[Figure: number of reservations (0 to 800) and allocated memory in TB (0 to 80) over time (0 to 8 hours)]
2700-node cluster with 92 TB memory
Page 17
Evaluation – Resource estimation
[Figure: CDF of provisioned / used resources]
Morpheus provides more accurate resource estimates
Level of fitting controllable in the inference module of Morpheus
Higher α → tighter fitting → less over-provisioning
Page 18
Evaluation
[Figure: normalized SLO violations vs. normalized cluster resources for the baseline (user provisioned) and for Morpheus with α = 1% and α = 5%, each static and dynamic]
Page 19
Conclusion
• Predictable performance with fewer resources and higher utilization
• Three main ideas
• Automatic inference
• Recurrent reservations
• Dynamic reprovisioning
• 5-13x reduction in SLO violations
• 14-28% reduction in cluster size