
Hawk: Hybrid Datacenter Scheduling

Pamela Delgado, Florin Dinu,

Anne-Marie Kermarrec, Willy Zwaenepoel

USENIX ATC 2015

Introduction: datacenter scheduling

[Diagram: jobs 1..N, each made up of tasks, are submitted to a scheduler that places the tasks on the cluster]

Introduction: centralized scheduling

[Diagram: a single centralized scheduler places the tasks of jobs 1..N on the cluster]

Good: placement
Not so good: scheduling latency

Introduction: distributed scheduling

[Diagram: distributed schedulers 1..N, one per job, each place their job's tasks on the cluster]

Good: scheduling latency
Not so good: placement

Outline


1) Introduction

2) HAWK hybrid scheduling

• Rationale

• Design

3) Evaluation

• Simulation

• Real cluster implementation

4) Conclusion

Hybrid scheduling

[Diagram: a centralized scheduler and distributed schedulers 1..N share the same cluster; jobs 1..M are split between them]

Hawk: hybrid scheduling

Long jobs: centralized
Short jobs: distributed

[Diagram: long jobs 1..M are submitted to the centralized scheduler; short jobs 1..N are submitted to distributed schedulers 1..N]

Long/short: estimated execution time vs. a cut-off
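The long/short split above can be expressed as a simple dispatch rule. A minimal sketch in Python, assuming each job carries an estimated task runtime and the cut-off is a configurable constant (all names and values here are hypothetical, not Hawk's actual API):

    import random
    from dataclasses import dataclass

    CUTOFF_SECONDS = 100.0  # hypothetical long/short cut-off

    @dataclass
    class Job:
        job_id: str
        estimated_task_runtime: float  # seconds, e.g. estimated from history

    def dispatch(job, centralized_scheduler, distributed_schedulers):
        """Route long jobs to the centralized scheduler and short jobs to a
        randomly chosen distributed scheduler (the hybrid split)."""
        if job.estimated_task_runtime >= CUTOFF_SECONDS:
            centralized_scheduler.submit(job)   # long job: careful placement
        else:
            random.choice(distributed_schedulers).submit(job)  # short job: low latency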

Rationale for Hawk

[Diagram: typical production workloads have many short jobs that use little of the resources and few long jobs that use most of the resources]

Rationale for Hawk (continued)

Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al., 2012

[Chart: percentage of long jobs vs. percentage of task-seconds consumed by long jobs, across production workloads (both axes 0-100%)]

Long jobs: a minority of the jobs, but they take up most of the resources.
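Here "task-seconds" is the total task runtime a job consumes (the sum of its tasks' durations). A quick sketch, with made-up numbers, of how the two percentages behind this observation are computed:

    # Hypothetical workload: (is_long, task_seconds) per job -- not data from the paper.
    jobs = [(False, 30), (False, 45), (False, 20), (True, 5000), (False, 60)]

    long_task_seconds = [ts for is_long, ts in jobs if is_long]
    pct_long_jobs = 100.0 * len(long_task_seconds) / len(jobs)
    pct_long_task_seconds = 100.0 * sum(long_task_seconds) / sum(ts for _, ts in jobs)

    print(pct_long_jobs)          # 20.0 -> long jobs are a minority of jobs
    print(pct_long_task_seconds)  # ~97  -> but they account for most task-seconds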

Hawk: hybrid scheduling

[Diagram: long jobs go to the centralized scheduler; short jobs go to distributed schedulers 1..N]

Long jobs: few jobs, so the centralized scheduler keeps scheduling latency reasonable; they use the bulk of the resources, so they get good placement.
Short jobs: latency sensitive, so they get fast distributed scheduling; they use few resources, so they can trade away not-so-good placement.

BEST OF BOTH WORLDS
Good: scheduling latency for short jobs
Good: placement for long jobs

Hawk: distributed scheduling

• Sparrow
• Work-stealing

Sparrow

[Diagram: a distributed scheduler sends task reservations to randomly chosen nodes (power of two choices)]
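A minimal sketch of the power-of-two-choices idea behind Sparrow's probing: sample a couple of nodes at random and enqueue the task reservation at the least-loaded one. This illustrates the general technique only; Sparrow's actual implementation adds late binding, which is omitted here, and the Node class is a hypothetical stand-in.

    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.queue = []  # queued task reservations

    def place_reservation(task, nodes, probe_count=2):
        """Probe a few random nodes and enqueue the reservation at the one
        with the shortest queue (power of two choices)."""
        probed = random.sample(nodes, probe_count)
        target = min(probed, key=lambda n: len(n.queue))
        target.queue.append(task)
        return target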


Sparrow and high load

[Diagram: under high load, random placement has a low likelihood of finding a free node for a task]

High load + job heterogeneity: head-of-line blocking
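A rough back-of-the-envelope for why random probing struggles here: if a fraction rho of the nodes is busy and each task probes d nodes at random, then all d probes hit busy nodes with probability about rho to the power d (assuming independent probes, which is only an approximation):

    def prob_no_free_node(rho, probes=2):
        """Probability that every probe lands on a busy node, assuming each
        probe independently hits a busy node with probability rho."""
        return rho ** probes

    print(prob_no_free_node(0.5))  # 0.25 at 50% utilization
    print(prob_no_free_node(0.9))  # 0.81 at 90% utilization: short tasks usually end up queued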

Hawk work-stealing

[Diagram: a node that becomes free steals queued short-task reservations from another node]

1. Free node: contact a random node and ask for probes (reservations).
2. Random node: send back the short-task reservations waiting in its queue.

Under high load there is a high probability of contacting a node with a backlog.
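A minimal sketch of the stealing step, assuming each node exposes its queue of reservations and short-task reservations can be told apart. The Reservation and Node types are hypothetical stand-ins; in Hawk the exchange happens between node daemons over the network.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class Reservation:
        task_id: str
        is_short: bool

    @dataclass
    class Node:
        name: str
        queue: list = field(default_factory=list)

    def steal_short_reservations(free_node, all_nodes):
        """A node that has just become free contacts one randomly chosen node
        and takes over the short-task reservations waiting in its queue."""
        victim = random.choice([n for n in all_nodes if n is not free_node])
        stolen = [r for r in victim.queue if r.is_short]
        victim.queue = [r for r in victim.queue if not r.is_short]
        free_node.queue.extend(stolen)  # the idle node will service these next
        return stolen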

Hawk cluster partitioning

[Diagram: a small partition of the cluster is reserved; the centralized and distributed schedulers do not coordinate]

No coordination, so the challenge is that there may be no free nodes left for mice (short jobs)!

Short jobs can be scheduled anywhere.
Long jobs go only to the non-reserved nodes.
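A minimal sketch of the placement constraint, with the size of the reserved partition as a hypothetical parameter:

    def eligible_nodes(job_is_short, nodes, reserved_fraction=0.1):
        """Short jobs may be placed anywhere; long jobs only on the
        non-reserved nodes, so the reserved partition always keeps some room
        for short tasks."""
        n_reserved = int(len(nodes) * reserved_fraction)
        return list(nodes) if job_is_short else list(nodes[n_reserved:])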

Hawk design summary

• Hybrid scheduler: long jobs centralized, short jobs distributed
• Work-stealing
• Cluster partitioning

Evaluation: 1. Simulation

• Sparrow simulator

• Google trace

• Vary number of nodes to vary cluster utilization

• Measure: Job running time

• Report 50th and 90th percentiles for short and long jobs

• Normalized to Sparrow
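Concretely, each reported point is a percentile of Hawk's job running times divided by the same percentile under Sparrow. A small sketch of that normalization, using placeholder running times rather than data from the paper:

    import numpy as np

    def normalized_percentile(hawk_runtimes, sparrow_runtimes, pct):
        """Ratio of Hawk's to Sparrow's job running time at a given percentile;
        values below 1.0 mean Hawk is faster at that percentile."""
        return np.percentile(hawk_runtimes, pct) / np.percentile(sparrow_runtimes, pct)

    hawk = [12, 15, 14, 40, 90]      # placeholder running times (seconds)
    sparrow = [20, 25, 22, 60, 95]
    print(normalized_percentile(hawk, sparrow, 50))
    print(normalized_percentile(hawk, sparrow, 90))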

Simulated results: short jobs

[Chart: Hawk/Sparrow job running time, 50th and 90th percentiles, vs. number of nodes in the cluster (10,000 to 50,000); lower is better]

Better across the board.

Simulated results: long jobs

[Chart: Hawk/Sparrow job running time, 50th and 90th percentiles, vs. number of nodes in the cluster (10,000 to 50,000); lower is better]

Better except under high load.
At very high utilization the penalty comes from partitioning.

Decomposing Hawk

1. Hawk minus centralized
2. Hawk minus stealing
3. Hawk minus partitioning
(normalized to Hawk)

Decomposing Hawk: no centralized

[Chart: Hawk-no-centralized relative to Hawk, 50th and 90th percentiles for short and long jobs]

Decomposing Hawk: no stealing

[Chart: Hawk-no-stealing relative to Hawk, 50th and 90th percentiles for short and long jobs; one bar is off the scale at 19.6]

Decomposing Hawk: no partitioning

[Chart: Hawk-no-partition relative to Hawk, 50th and 90th percentiles for short and long jobs; one bar is off the scale at 11.9]

Decomposing Hawk summary

[Chart: all three variants side by side, 50th and 90th percentiles for short and long jobs; off-scale values 19.6 (no stealing) and 11.9 (no partition)]

Absence of any component reduces Hawk's performance!

Sensitivity analysis

1. Incorrect estimates of runtime
2. Cut-off between long and short
3. Details of stealing

Bottom line: the benefits of Hawk remain despite variation.
See paper for details.

Evaluation: 2. Implementation

[Diagram: the Hawk scheduler coordinates with Hawk daemons running on the cluster nodes]

Experiment

• 100-node cluster

• Subset of Google trace

• Vary inter-arrival time to vary cluster utilization

• Measure: Job running time

• Report 50th and 90th percentile for short and long jobs

• Normalized to Sparrow


Short jobs

[Chart: Hawk/Sparrow job running time, real vs. simulated, 50th and 90th percentiles, as the inter-arrival time is varied from 1x to 2.25x; lower is better]

Long jobs

[Chart: Hawk/Sparrow job running time, real vs. simulated, 50th and 90th percentiles, as the inter-arrival time is varied from 1x to 2.25x; lower is better]

Implementation

1. Hawk works well in a real cluster.
2. Good correspondence between implementation and simulation.

Related work

Centralized: Hadoop Fair Scheduler (EuroSys'10), Quincy (SOSP'09)
Two-level: YARN (SoCC'13), Mesos (NSDI'11)
Distributed schedulers: Omega (EuroSys'13), Sparrow (SOSP'13)
Hybrid schedulers: Mercury

Conclusion

• Hawk: hybrid scheduler

long: centralized, short: distributed

work-stealing

cluster partitioning

• Hawk provides good results for short and long jobs

• Even under high cluster utilization

