
Jun 25, 2020


On the diversity of cluster workloads and its impact on research results

George Amvrosiadis, Jun Woo Park, Greg Ganger, Garth Gibson, Elisabeth Baseman, Nathan DeBardeleben


Sources for cluster traces today

• Parallel Workload Archive (1993 – 2015)
  • 38 HPC cluster traces (each: 1K+ cores, months long)
  • Publications: 250+

• Google cluster trace (2011)
  • 29 days of a 12,000-node cluster
  • Publications: 450+

Google trace: exceedingly popular, but how representative is it of other clusters?

www.pdl.cmu.edu/ATLAS


Project Atlas

• Mandate: use historical data to improve cluster efficiency

• LANL: scheduler logs, sensor data, OS logs, … → TBs / day

• Recently: data from Two Sigma, Pittsburgh Supercomputing Center

Current goals:

• Investigate overfitting to existing traces in systems literature

• Produce generalizable models of cluster workloads

• Create trace repository and make data publicly available



Atlas repository: current traces

• Two Sigma business analytics clusters: 9 months (2016-2017)
  • 1300 nodes, 31500 cores, 328TB RAM

• LANL Mustang general-purpose cluster: 5 years (2011-2016), the entire cluster lifetime
  • 1600 nodes, 38400 cores, 100TB RAM

• LANL OpenTrinity capability cluster: 3 months (2017)
  • Trinity phase 1: 9400 nodes, 300000 cores, 1.15PB RAM

Repository accessible through project-atlas.org

More traces coming soon! You can contribute!


Overview

Characteristics compared across Google, Two Sigma, Mustang, and OpenTrinity:

Job characteristics
• Short jobs
• Small jobs

Workload heterogeneity
• Diurnal patterns
• High job submission rate

Resource utilization
• Resource over-commitment
• Sub-second interarrival periods
• User request variability

Failure analysis
• High failure rates
• Costly failures (wasted CPU hours)
• Longer/larger jobs fail more often


Job Sizes

• Google jobs request 3-406x fewer CPU cores than Two Sigma and LANL jobs

• LANL request sizes are more uniformly distributed

[Figure: fraction of total jobs (0.0-1.0) vs. number of cores in job (log scale, 1e-02 to 1e+06), for Mustang, OpenTrinity, TwoSigma, and Google]

Solving head-of-line blocking by dedicating resources to small jobs becomes challenging [Delgado et al.]


Job Duration

• Median Google job is 4-5x shorter than Two Sigma and LANL jobs

• But: LANL jobs end at 16 hours, Google jobs don’t

[Figure: fraction of total jobs (0.0-1.0) vs. job duration in hours (log scale, 1e-04 to 1e+03), for Mustang, OpenTrinity, TwoSigma, and Google]

Straggler mitigation through short-task replication should be applied judiciously [Ananthanarayanan et al.]


Workload Heterogeneity

• Reversed diurnal patterns
  • More/smaller Google jobs between midnight and 4AM

• Job submission rate
  • 10-1000x more scheduling requests in Two Sigma, Google
  • 1K jobs/hour ➞ 3.6 sec/job
  • 70K tasks/hour ➞ 51 msec/task

[Figure: job submissions by hour of day (12am - 11pm), two panels. One panel: TwoSigma and Google, y-axis 0-1400. Other panel: Mustang and OpenTrinity, y-axis 0-40]

Task placement algorithms achieve subsecond latency today [Quincy, Firmament], but we should aim for msec latencies
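The per-request figures on this slide follow from dividing one hour by the submission rate. A minimal sketch of that arithmetic, where the function name `per_request_budget_seconds` is a hypothetical helper and not something taken from the talk:

```python
def per_request_budget_seconds(requests_per_hour: float) -> float:
    """Average time available to handle one request at a given hourly rate."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600.0 / requests_per_hour


# Rates quoted on the slide: 1K jobs/hour and 70K tasks/hour.
jobs_budget = per_request_budget_seconds(1_000)
tasks_budget = per_request_budget_seconds(70_000)

print(f"{jobs_budget:.1f} sec/job")            # 3.6 sec/job
print(f"{tasks_budget * 1000:.0f} msec/task")  # 51 msec/task
```

These are averages over an hour; bursts within the hour would leave a scheduler even less time per request.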


Resource utilization: intensity

• Only Google overcommits resources (others at 65-90% utilization)

• 43-64% of inter-arrivals are shorter than 1 sec

• 20% of inter-arrivals are longer than 100 sec at LANL → Maintenance

[Figure: fraction of interarrivals (0.0-1.0) vs. job interarrival period in seconds (log scale, 1e-01 to 1e+05), for Mustang, OpenTrinity, TwoSigma, and Google]

Systems should be tested with subsecond job interarrivals [Firmament, Quasar]
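Statistics such as the share of sub-second inter-arrivals can be derived from job submission timestamps. The sketch below is one way to do it, assuming timestamps in seconds; the sample data and function names are hypothetical and do not come from any of the traces.

```python
from typing import List, Sequence


def interarrival_periods(submit_times: Sequence[float]) -> List[float]:
    """Gaps between consecutive job submissions, in seconds."""
    ordered = sorted(submit_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fraction_below(periods: Sequence[float], threshold: float) -> float:
    """Fraction of inter-arrival periods strictly shorter than threshold."""
    if not periods:
        raise ValueError("need at least two submissions")
    return sum(1 for p in periods if p < threshold) / len(periods)


# Hypothetical submission timestamps (seconds since trace start).
submit_times = [0.0, 0.2, 0.5, 3.0, 150.0]
periods = interarrival_periods(submit_times)

sub_second = fraction_below(periods, 1.0)
long_gaps = 1.0 - fraction_below(periods, 100.0)  # counts gaps of 100 sec or more

print(f"sub-second inter-arrivals: {sub_second:.0%}")
print(f"inter-arrivals of 100 sec or more: {long_gaps:.0%}")
```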


Unsuccessful jobs

• Unsuccessful job rates at Google are significant
  • 1.4-6.8x higher than other traces (Two Sigma, LANL)

• Highest efficiency: HPC clusters
  • 34-80% fewer CPU hours wasted* at LANL

• Time wasted decreases with job runtime

[Figure: two bar charts over Mustang, OpenTrinity, TwoSigma, and Google. One shows fraction of jobs (%), 0-100; the other shows fraction of CPU time (%), 0-100. Legend: Unsuccessful, Timeouts, Failed or Aborted]

Defining failure is crucial: software errors may be benign
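To make the two metrics on this slide concrete, the sketch below computes the fraction of unsuccessful jobs and the fraction of CPU hours they consume. The `Job` record, the outcome labels, and the sample jobs are assumptions for illustration; real traces use their own schemas and outcome codes. The set of outcomes counted as unsuccessful is a parameter, since the result depends on how failure is defined.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class Job:
    cores: int
    runtime_hours: float
    outcome: str  # hypothetical labels: "success", "failed", "aborted", "timeout"


# Assumed definition of "unsuccessful"; changing it changes both metrics.
DEFAULT_UNSUCCESSFUL = frozenset({"failed", "aborted", "timeout"})


def unsuccessful_job_fraction(
    jobs: Iterable[Job], unsuccessful: FrozenSet[str] = DEFAULT_UNSUCCESSFUL
) -> float:
    """Fraction of jobs whose outcome counts as unsuccessful."""
    jobs = list(jobs)
    if not jobs:
        raise ValueError("no jobs given")
    return sum(1 for j in jobs if j.outcome in unsuccessful) / len(jobs)


def wasted_cpu_fraction(
    jobs: Iterable[Job], unsuccessful: FrozenSet[str] = DEFAULT_UNSUCCESSFUL
) -> float:
    """Fraction of total CPU hours consumed by unsuccessful jobs."""
    jobs = list(jobs)
    total = sum(j.cores * j.runtime_hours for j in jobs)
    if total == 0:
        raise ValueError("jobs consumed no CPU time")
    wasted = sum(j.cores * j.runtime_hours for j in jobs if j.outcome in unsuccessful)
    return wasted / total


# Hypothetical jobs: 8, 16, 4, and 4 CPU hours respectively.
jobs = [
    Job(cores=4, runtime_hours=2.0, outcome="success"),
    Job(cores=16, runtime_hours=1.0, outcome="failed"),
    Job(cores=1, runtime_hours=4.0, outcome="timeout"),
    Job(cores=8, runtime_hours=0.5, outcome="success"),
]

print(unsuccessful_job_fraction(jobs))  # 0.5
print(wasted_cpu_fraction(jobs))        # 0.625
```

Passing `frozenset({"failed", "aborted"})` as the second argument would treat timeouts as benign, which lowers both fractions for the same jobs.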

A case for dataset pluralism

Estimating job runtimes

• Runtime estimates: improve cluster efficiency
  • Adjust to heterogeneous hardware → lower response times
  • Job packing → increased utilization

• How do we come up with runtime estimates?
  • User-provided (Moab, Slurm @ LANL) → mostly inaccurate
  • Leverage job repeats (Rayon in Hadoop) → effectiveness depends on workload

• JVuPredict/3Sigma: generate estimates automatically [EuroSys 2018]
  • Step 1: Use past runtimes of jobs with similar feature(s)
  • Step 2: Select predictor with highest accuracy
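The two steps above can be illustrated with a small history-based estimator. This is a simplified sketch and not the JVuPredict or 3Sigma implementation: it assumes one predictor per feature, uses the median of past runtimes as the estimate, and ranks predictors by mean relative error. The class name, feature names, and sample jobs are all hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Sequence


class HistoryPredictor:
    """Estimate runtimes from past jobs that share a feature value."""

    def __init__(self, features: Sequence[str]) -> None:
        self.features = list(features)
        # feature -> feature value -> past runtimes
        self.history = {f: defaultdict(list) for f in self.features}
        # feature -> relative errors of that feature's past estimates
        self.errors: Dict[str, List[float]] = {f: [] for f in self.features}

    def _estimates(self, job: dict) -> Dict[str, float]:
        # Step 1: one estimate per feature, from jobs with the same value.
        estimates = {}
        for feature in self.features:
            value = job.get(feature)
            if value is None:
                continue
            past = self.history[feature].get(value)
            if past:
                estimates[feature] = median(past)
        return estimates

    def predict(self, job: dict) -> Optional[float]:
        # Step 2: use the predictor with the lowest mean error so far.
        estimates = self._estimates(job)
        if not estimates:
            return None

        def mean_error(feature: str) -> float:
            errs = self.errors[feature]
            return sum(errs) / len(errs) if errs else float("inf")

        return estimates[min(estimates, key=mean_error)]

    def observe(self, job: dict, runtime: float) -> None:
        """Record a finished job: score each predictor, then update history."""
        for feature, estimate in self._estimates(job).items():
            self.errors[feature].append(abs(estimate - runtime) / runtime)
        for feature in self.features:
            value = job.get(feature)
            if value is not None:
                self.history[feature][value].append(runtime)


predictor = HistoryPredictor(["user", "cores"])
predictor.observe({"user": "alice", "cores": 16}, 100.0)
predictor.observe({"user": "alice", "cores": 32}, 110.0)
predictor.observe({"user": "bob", "cores": 16}, 10.0)

# The "user" predictor has been more accurate than "cores", so it is chosen.
print(predictor.predict({"user": "alice", "cores": 16}))  # 105.0
```

In this toy history, core count is a poor signal because two users run very different jobs at the same size, so the per-user predictor wins.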


JVuPredict: Accuracy across traces

• Reliance on: user ID, number of cores, job name (if present)

• Logical job names matter!

• Need busy (100K+ jobs) or long (3+ months) traces for training

[Figure: percent of jobs (%), 0-40, vs. estimate error (%) in ± 5% buckets from −∞ to +∞, for Mustang, OpenTrinity, TwoSigma, and Google. Annotations: "Under-estimations: bad!", "Over-estimations: eh…"]
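A histogram like the one on this slide could be built as sketched below. Several details are assumptions, because the slide does not state them: the error is taken as (estimate − actual) / actual, buckets are 10 points wide and centred on multiples of 10 (each covering ± 5%), and errors beyond ± 95% fall into the open-ended tail buckets. Under this definition a negative error is an under-estimation.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def estimate_error_pct(estimate: float, actual: float) -> float:
    """Signed estimate error in percent (assumed definition)."""
    if actual <= 0:
        raise ValueError("actual runtime must be positive")
    return 100.0 * (estimate - actual) / actual


def bucket_centre(error_pct: float, width: float = 10.0, limit: float = 95.0) -> float:
    """Centre of the bucket an error falls into; tails map to -inf / +inf."""
    if error_pct < -limit:
        return -math.inf
    if error_pct > limit:
        return math.inf
    # Round half up so bucket boundaries are assigned consistently.
    return math.floor(error_pct / width + 0.5) * width


def error_histogram(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Count (estimate, actual) pairs per error bucket."""
    return Counter(bucket_centre(estimate_error_pct(e, a)) for e, a in pairs)


# Hypothetical (estimate, actual) runtime pairs, in seconds.
pairs = [(104.0, 100.0), (50.0, 100.0), (300.0, 100.0), (98.0, 100.0)]
histogram = error_histogram(pairs)

for centre in sorted(histogram):
    print(f"{centre:+.0f}%: {histogram[centre]} job(s)")
```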


Summary

Characteristics compared across Google, Two Sigma, Mustang, and OpenTrinity:

• Short jobs
• Small jobs
• Diurnal patterns
• High job submission rate
• Resource over-commitment
• Sub-second interarrival periods
• User request variability
• High failure rates
• Costly failures (wasted CPU hours)
• Longer/larger jobs fail more often

Private clusters are more similar to HPC clusters, except for failure rates and job submission rate