Morpheus: Towards Automated SLOs for Enterprise Clusters · 2019-12-18 · Morpheus: Towards Automated SLOs for Enterprise Clusters Sangeetha Abdu Jyothi* Carlo Curino Ishai Menache

Morpheus: Towards Automated SLOs for Enterprise Clusters

Sangeetha Abdu Jyothi* Carlo Curino Ishai Menache Shravan Matthur Narayanamurthy Alexey Tumanov^

Jonathan Yaniv** Ruslan Mavlyutov^^ Íñigo Goiri Subru Krishnan Janardhan Kulkarni Sriram Rao

* University of Illinois, Urbana-Champaign ^ University of California, Berkeley

** Technion - Israel Institute of Technology ^^ University of Fribourg


Operator/User tensions

[Figure: two panels, Operator and User, each plotting container resources over time]

• Run as many jobs as possible

• Fair-sharing

• Our focus is on batch jobs in big data enterprise clusters

• Periodic jobs should run predictably, with output available by the deadline


Roadblock: Unpredictability

Sharing-induced: resource sharing, queueing, etc.

Inherent: stragglers, failures, skew, hardware changes

25% of user tickets due to unpredictability

[Figure: relative runtime (0.6 to 2) of TPC-H queries Q1, Q3, Q4, Q6, and Q12 on 10 TB of data in a 275-node cluster]

Current “solution”: Over-provisioning

[Figure: provisioned / average and provisioned / peak resource ratios in a 50k-node COSMOS cluster]

Users over-provision more than 75% of jobs

Utilization vs. Predictability


Towards automated SLOs

System focuses on periodic jobs

Empirically >60% are periodic

Our results:

5-13x reduction in deadline SLO violations

Reduce cluster size by 14-28%


Morpheus Overview

Respond to unpredictabilities

Quantify user requirements

Pack jobs efficiently

User sign-off

Monitoring

Logs

SLO, resource estimate

Automatic Inference Module

Reservation Mechanism

Dynamic Reprovisioning






Deadlines

Resource Estimate

Logs

Quantify user requirements

Derive deadline SLOs

Estimate job resource demands


User sign-off

Deadline SLOs

[Figure: timeline of job A and job B, marking the job completion time of A, the output consumption time of B, and the deadline]
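One way to read the figure is that a job's deadline can be derived from when downstream jobs consume its output. The sketch below illustrates that idea only; the log format `(producer_job, period_start, read_time)` and the function name `infer_deadline_offsets` are hypothetical, and taking the earliest observed read across all periods is an assumed, conservative choice rather than the rule used by Morpheus.

```python
from collections import defaultdict


def infer_deadline_offsets(read_events):
    """Return, per producer job, the earliest observed read offset.

    read_events: iterable of (producer_job, period_start, read_time).
    This is a hypothetical log format; all times share one unit (e.g. minutes).
    """
    earliest = defaultdict(lambda: float("inf"))
    for job, period_start, read_time in read_events:
        # Offset of the read relative to the start of that periodic instance.
        offset = read_time - period_start
        earliest[job] = min(earliest[job], offset)
    return dict(earliest)


events = [
    ("A", 0, 50),     # B read A's output 50 minutes into the first period
    ("A", 0, 70),     # a later consumer in the same period
    ("A", 100, 145),  # B read A's output 45 minutes into the second period
]
deadlines = infer_deadline_offsets(events)
print(deadlines)
```

Here the candidate deadline for A is the smallest offset at which any consumer read its output, so that a run meeting this deadline would not have delayed any consumer seen in the logs.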

Deadline SLO validation

[Equation, subscripts garbled in the source: P(B ... A ...) > 4 × P(B ... A ...)]

[Figure: CDF over jobs of spare time before deadline (normalized by job duration), x-axis from 0.01 to 1000 on a log scale]

~70% of jobs have high scheduling flexibility

Valid estimate
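The flexibility metric can be illustrated with a short calculation. This is a sketch under stated assumptions: spare time is taken here to be `deadline - arrival - duration`, and the threshold for "high" flexibility is a hypothetical parameter; the slides do not define either precisely.

```python
def normalized_slack(arrival, deadline, duration):
    """Spare time before the deadline, normalized by job duration.

    Assumption: spare time = (deadline - arrival) - duration.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    return (deadline - arrival - duration) / duration


def fraction_flexible(jobs, threshold=1.0):
    """Fraction of jobs whose normalized slack is at least `threshold`.

    jobs: list of (arrival, deadline, duration) tuples.
    threshold: hypothetical cut-off for "high" flexibility.
    """
    slacks = [normalized_slack(*job) for job in jobs]
    return sum(s >= threshold for s in slacks) / len(slacks)


jobs = [(0, 60, 10), (0, 30, 20), (10, 100, 30), (0, 12, 10)]
print([normalized_slack(*job) for job in jobs])
print(fraction_flexible(jobs))
```

A job with normalized slack of 1 could be delayed by its own duration and still meet its deadline, which is the kind of freedom a packing algorithm can exploit.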


Job Resource Demand

• Usage patterns (container skylines) of multiple instances of the same job

• Generate the best fitting model using Linear Program

• Fitting controlled by a parameter, α (higher α → fewer resources)

• Other alternatives – Jockey [EuroSys '12], PerfOrator [SoCC '16]
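The role of α can be illustrated with a simplified stand-in for the linear program. The sketch below is not the Morpheus LP: it assumes each time step is fitted independently, with cost `alpha * over-allocation + (1 - alpha) * under-allocation` summed over past instances, and it finds the minimizer by trying the observed values (this cost is piecewise linear and convex, so an observed value is always optimal). The function name `fit_skyline` and the data are hypothetical.

```python
def fit_skyline(skylines, alpha):
    """Fit one resource skyline to several observed skylines.

    skylines: list of equal-length lists, one per past job instance;
              entry t is the number of containers used at time step t.
    alpha:    weight on over-allocation, in (0, 1). A higher alpha
              penalizes over-allocation more and yields fewer resources.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    fitted = []
    for demands in zip(*skylines):  # all instances at one time step
        def cost(s, demands=demands):
            over = sum(max(s - d, 0) for d in demands)
            under = sum(max(d - s, 0) for d in demands)
            return alpha * over + (1 - alpha) * under
        # Try each observed value and keep the cheapest one.
        fitted.append(min(sorted(set(demands)), key=cost))
    return fitted


history = [[2, 4], [4, 4], [10, 4]]
print(fit_skyline(history, 0.01))  # small alpha: close to the peak
print(fit_skyline(history, 0.5))   # balanced: the median
print(fit_skyline(history, 0.99))  # large alpha: close to the minimum
```

Under this simplified cost, the fit at each step is the (1 - α) quantile of the observed demands, which matches the slide's statement that a higher α yields fewer resources.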






Pack jobs efficiently

Compact storage of jobs based on Least Common Multiple (LCM) of periods

LowCost Packing Algorithm

User sign-off

LCM Representation

[Figure: periodic allocations of Job A, Job B, and Job C combined into the entire plan, with one LCM-length unit marked]

Smallest repeating unit stored – Least Common Multiple (LCM) of periods

Efficient storage

Predictable allocation for users
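The LCM idea can be sketched in a few lines: only one repeating unit of the plan is stored, and any future time is mapped into it with a modulo. The data layout (`period`, `skyline`) and the function names are hypothetical, chosen for illustration rather than taken from the Morpheus implementation.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def build_plan(jobs):
    """Build one repeating unit of the cluster plan.

    jobs: dict mapping job name -> (period, skyline), where skyline[i]
          is the allocation i slots after each period start and
          len(skyline) <= period.
    Returns a list whose length is the LCM of all periods.
    """
    horizon = reduce(lcm, (period for period, _ in jobs.values()))
    plan = [0] * horizon
    for period, skyline in jobs.values():
        if len(skyline) > period:
            raise ValueError("skyline must fit within one period")
        for start in range(0, horizon, period):
            for i, amount in enumerate(skyline):
                plan[start + i] += amount
    return plan


def allocation_at(plan, t):
    """Planned allocation at any time t, using only the stored unit."""
    return plan[t % len(plan)]


jobs = {"A": (2, [3]), "B": (3, [1, 1])}
plan = build_plan(jobs)
print(plan)
print(allocation_at(plan, 13))
```

With periods 2 and 3 the stored unit has 6 slots, and a lookup at slot 13 reads slot 1 of the unit, so storage stays bounded regardless of how far ahead the plan is queried.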

[Figure: portion of total (%), from 0% to 45%, of periodic jobs and of instances, by periodicity from 1 minute to 1 week]

Other key techniques (in the paper)

LowCost Packing Algorithm

Heuristic for achieving a balanced allocation

Load-aware online packing
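Load-aware placement can be illustrated with a simple greedy rule. This is not the LowCost algorithm from the paper; it is a minimal sketch that assumes a job with a fixed rectangular demand and picks the start time between arrival and deadline that keeps the resulting peak load lowest. All names are hypothetical.

```python
def place_job(load, arrival, deadline, duration, demand):
    """Place a job in the least-loaded feasible window and update the plan.

    load:     list of currently planned allocation per time slot
    arrival:  first slot in which the job may run
    deadline: slot by which the job must have finished (exclusive)
    duration: number of slots the job runs
    demand:   resources the job needs in each slot (assumed constant)
    Returns the chosen start slot.
    """
    latest_start = deadline - duration
    if latest_start < arrival or deadline > len(load):
        raise ValueError("job cannot fit between arrival and deadline")

    def peak_after(start):
        # Peak load inside the window if the job started at `start`.
        return max(load[t] + demand for t in range(start, start + duration))

    # Earliest start wins among equally good candidates.
    start = min(range(arrival, latest_start + 1), key=peak_after)
    for t in range(start, start + duration):
        load[t] += demand
    return start


load = [5, 5, 1, 1, 4, 4]
start = place_job(load, arrival=0, deadline=6, duration=2, demand=2)
print(start, load)
```

The job lands in the valley of the existing plan rather than at its arrival time, which is the sense in which slack before the deadline helps balance the allocation.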

Dynamic reprovisioning

Continuous monitoring of jobs

Allocate more resources when “progress” is slow
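The monitoring loop can be sketched as a periodic check that compares observed progress with expected progress. The sketch assumes progress is a fraction in [0, 1], that expected progress grows linearly with elapsed time, and that throughput scales linearly with the allocation; these are illustrative assumptions, and `reprovision` is a hypothetical helper, not an API of Hadoop/YARN or Morpheus.

```python
from math import ceil


def reprovision(allocated, progress, elapsed, planned_duration, max_allocation):
    """Return a new allocation for a running job.

    allocated:        containers currently allocated
    progress:         fraction of work done so far, in [0, 1]
    elapsed:          time since the job started
    planned_duration: time budget in which the job should finish
    max_allocation:   upper bound on containers for this job
    """
    expected = min(elapsed / planned_duration, 1.0)
    if progress >= expected:
        return allocated  # on track: leave the allocation unchanged
    remaining_time = planned_duration - elapsed
    if remaining_time <= 0 or progress <= 0:
        return max_allocation  # no budget left or no progress signal
    # Assumed linear scaling: the work rate is proportional to containers.
    remaining_work = 1.0 - progress
    scaled = allocated * remaining_work * elapsed / (remaining_time * progress)
    return min(max_allocation, max(allocated, ceil(scaled)))


on_track = reprovision(10, 0.5, 30, 60, 100)
behind = reprovision(10, 0.25, 30, 60, 100)
capped = reprovision(10, 0.25, 30, 60, 20)
print(on_track, behind, capped)
```

In the second call the job has done a quarter of its work in half of its budget, so it needs three times its current rate to finish on time, and the allocation grows from 10 to 30 containers.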

[Figure: a new job placed between its arrival and deadline; axes: time vs. resources]

Experiments

Implementation:

Recurrent reservation mechanism, packing algorithm, and dynamic reprovisioning in Apache Hadoop/YARN

Stand-alone inference subsystem

Workload:

Enterprise-trace: Three-month trace from 50k-node COSMOS cluster

Hadoop-trace: One-month trace from 4k-node Hadoop cluster

TPC-H: Standard TPC-H benchmark


Evaluation – Scalability test

Morpheus can handle load in production clusters

[Figure: number of reservations (0 to 800) and allocated memory (0 to 80 TB) over 8 hours]

2700-node cluster with 92 TB memory

Evaluation – Resource estimation

[Figure: CDF of provisioned / used resources]

Morpheus provides more accurate resource estimates

Level of fitting controllable in the inference module of Morpheus

Higher α → Tighter fitting → Less over-provisioning

Evaluation

[Figure: normalized SLO violations vs. normalized cluster resources for the baseline (user provisioned) and for α = 1% and α = 5%, each static and dynamic]

Conclusion

• Predictable performance with fewer resources and higher utilization

• Three main ideas

• Automatic inference

• Recurrent reservations

• Dynamic reprovisioning

• 5-13x reduction in SLO violations

• 14-28% reduction in cluster size


THANK YOU!
