
Introduction Predicting Job Symbiosis SMT Interference-Aware Scheduler Experimental Evaluation Conclusions

Symbiotic Job Scheduling on the IBM POWER8

J. Feliu (1), S. Eyerman (2), J. Sahuquillo (1), S. Petit (1)

(1) Department of Computing Engineering (DISCA), Universitat Politècnica de València
[email protected], {jsahuqui,spetit}@disca.upv.es

(2) Intel Belgium
[email protected]

HPCA’16 @ Barcelona, Spain, March 16th, 2016

Note (2): This work was done while Stijn Eyerman was at Ghent University.


Outline

1 Introduction

2 Predicting Job Symbiosis

3 SMT Interference-Aware Scheduler

4 Experimental Evaluation

5 Conclusions


Introduction

Scheduling is important for manycore / manythread systems
- There is a combinatorial number of ways to schedule applications, each with different performance

Scheduling for CMPs of SMT cores is challenging
- Different levels of resource sharing
- SMT performance is very sensitive to co-runners

Selecting the optimal schedule is an NP-hard problem

Predicting the performance of a schedule is not trivial because of the high degree of resource sharing in SMT cores


Introduction

Previous work on symbiotic scheduling
- Uses sampling to explore the space of possible schedules (Snavely et al., ASPLOS’00)
- Relies on novel hardware (Eyerman et al., ASPLOS’10)
- Performs an offline analysis with µbenchmarks to predict the interference between applications (Zhang et al., MICRO’14)

Our symbiotic job scheduler
- Online model-based scheduling
- Without sampling schedules
- On a recent commercial processor


Introduction: Main contributions

Interference model
- Predicts the interference among threads on an SMT core
- Based on CPI stacks
- Considers contention in all the shared resources

Online scheduler
- Quickly explores the schedule space to select the optimal one
- Quickly adapts to phase behavior

Implemented and evaluated on the IBM POWER8
- Average system throughput increases by 10.3% over a random scheduler and by 4.7% over Linux


Outline

1 Introduction

2 Predicting Job Symbiosis
- Interference model
- Model construction and slowdown estimation
- Obtaining ST CPI stacks in SMT mode

3 SMT Interference-Aware Scheduler

4 Experimental Evaluation

5 Conclusions


Predicting Job Symbiosis

Symbiotic scheduler: based on a model that estimates job symbiosis
- Predicts the slowdown of the applications in a schedule
- It is fast, allowing us to select the optimal schedule


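[Figure: applications A, B, C, and D can be paired on SMT core 0 and SMT core 1 in three ways (A B | C D, A C | B D, A D | B C). The model predicts the throughput of each candidate schedule; the example shows predicted throughputs of 2.7, 2.2, and 1.9.]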
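To make the schedule space concrete, the sketch below enumerates every way to pair four applications on two 2-way SMT cores and picks the pairing with the highest predicted throughput. The per-pair throughput table is a hypothetical placeholder, not output of the model presented in the talk; in a real scheduler those values would come from the interference model.

```python
def pairings(apps):
    """Yield every way to split `apps` into unordered pairs (one pair per 2-way SMT core)."""
    apps = list(apps)
    if not apps:
        yield []
        return
    first, rest = apps[0], apps[1:]
    for partner in rest:
        remaining = [a for a in rest if a != partner]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


# Hypothetical predicted throughput of each pair when sharing one core
# (illustrative numbers only, not taken from the slides).
PAIR_THROUGHPUT = {
    frozenset("AB"): 1.5, frozenset("CD"): 1.25,
    frozenset("AC"): 1.0, frozenset("BD"): 1.0,
    frozenset("AD"): 0.75, frozenset("BC"): 1.0,
}


def system_throughput(schedule):
    """Predicted system throughput: sum of the per-core predictions."""
    return sum(PAIR_THROUGHPUT[frozenset(pair)] for pair in schedule)


schedules = list(pairings("ABCD"))          # the three candidate schedules
best = max(schedules, key=system_throughput)
print(len(schedules), best)
```

With four applications there are only three candidate schedules, but the count grows combinatorially with the number of applications and cores, which is why a fast predictor matters.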


Interference model

The proposed model leverages CPI stacks to predict job symbiosis

Model: estimates the slowdown
- Interprets the normalized CPI components as probabilities
- Calculates the probability of interference


CPI stacks
Divide the execution cycles into various components:
- Base: cycles where instructions are completed
- Resource: no instruction completed due to a resource stall
- Miss: no instruction completed due to a miss event
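As a minimal sketch of the normalization step, the snippet below divides each component of a measured single-thread CPI stack by a reference CPI. When the reference is the stack's own CPI, the components sum to 1 and can be read as probabilities. The three-component layout follows the slide; the `CPIStack` type and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPIStack:
    """Three-component CPI stack (hypothetical container for illustration)."""
    base: float
    resource: float
    miss: float

    @property
    def cpi(self) -> float:
        """Total CPI is the sum of the stack components."""
        return self.base + self.resource + self.miss

    def normalized(self, reference_cpi=None) -> "CPIStack":
        """Divide every component by `reference_cpi` (default: this stack's own CPI)."""
        ref = self.cpi if reference_cpi is None else reference_cpi
        if ref <= 0:
            raise ValueError("reference CPI must be positive")
        return CPIStack(self.base / ref, self.resource / ref, self.miss / ref)


# Made-up single-thread measurement: CPI = 2.0
app1 = CPIStack(base=1.0, resource=0.5, miss=0.5)
norm = app1.normalized()
print(norm)
```

Passing the single-thread CPI as `reference_cpi` when normalizing an SMT stack gives a stack whose total height is the slowdown, which is the normalization the model-construction slides describe.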


[Figure: the measured single-thread CPI stacks of App 1 and App 2 (Base, Resource, and Miss components; y-axis: CPI) are converted to normalized single-threaded CPI stacks (B, R, M). The model turns these into predicted normalized SMT CPI stacks (B', R', M'), whose total height is the predicted slowdown of each application.]


Model equation

Each component is modeled with the equation:

$$C'_j = \alpha_C + \beta_C C_j + \gamma_C \sum_{k \neq j} C_k + \delta_C C_j \sum_{k \neq j} C_k \qquad (1)$$


Components
- C_j represents thread j's own component in ST mode
- C_k represents the ST component of the other threads in the schedule
- C'_j identifies the SMT component of thread j


Parameters
- α_C reflects a constant increase in SMT over ST
- β_C reflects the fraction, or relative increase, of the original ST component that appears in SMT execution
- γ_C models the impact of the sum of the ST components of the other co-scheduled threads
- δ_C models extra interactions that may occur between threads
- The meaningful parameters are determined using regression
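The sketch below evaluates Eq. (1) for each component and sums the predicted SMT components of a thread. Because the stacks are normalized to the single-thread CPI, that sum is read here as the predicted slowdown. The coefficient values and the two sample stacks are made-up assumptions for illustration, not the fitted POWER8 parameters.

```python
def smt_component(own, others, alpha, beta, gamma, delta):
    """Eq. (1): predicted SMT component from the thread's own ST component
    and the ST components of its co-runners."""
    others_sum = sum(others)
    return alpha + beta * own + gamma * others_sum + delta * own * others_sum


def predicted_slowdown(j, st_stacks, coeffs):
    """Sum the predicted SMT components of thread j.

    st_stacks: one dict per thread, component name -> ST value normalized to ST CPI.
    coeffs:    component name -> (alpha, beta, gamma, delta).
    """
    total = 0.0
    for name, (alpha, beta, gamma, delta) in coeffs.items():
        own = st_stacks[j][name]
        others = [s[name] for k, s in enumerate(st_stacks) if k != j]
        total += smt_component(own, others, alpha, beta, gamma, delta)
    return total


# Hypothetical coefficients, one (alpha, beta, gamma, delta) tuple per component.
COEFFS = {
    "base": (0.0, 1.0, 0.5, 0.0),
    "resource": (0.0, 1.0, 0.0, 1.0),
    "miss": (0.0, 1.0, 0.0, 0.0),
}
stacks = [
    {"base": 0.5, "resource": 0.25, "miss": 0.25},   # thread 0
    {"base": 0.25, "resource": 0.5, "miss": 0.25},   # thread 1
]
slowdowns = [predicted_slowdown(j, stacks, COEFFS) for j in range(len(stacks))]
print(slowdowns)
```

Each evaluation costs only a handful of multiplications per component, which is consistent with the slides' claim that the model is fast enough to compare many candidate schedules.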


Model construction and slowdown estimation

Model parameters determined by linear regression
- One-time offline training
- Based on experimental data
- Not tied to applications, no need to retrain, no overfitting


[Figure: model construction for App 1 and App 2. The measured ST CPI stacks (applications running alone) and the measured SMT CPI stacks (applications running together) are both normalized to the ST CPI. The ST stacks then show normalized CPI and the SMT stacks show slowdown. Linear regression over the normalized stacks yields the parameters α, β, γ, δ.]
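The regression step can be sketched as an ordinary least-squares fit of Eq. (1) for one component. The code below uses only the standard library and solves the normal equations directly; the training samples are synthetic, generated from made-up coefficients so the fit can be checked. Neither the solver nor the data reflects the authors' actual training setup.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: training data not diverse enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_component(samples):
    """Least-squares fit of (alpha, beta, gamma, delta) for one component.

    samples: (own_st, sum_of_others_st, measured_smt) triples, all normalized to ST CPI.
    """
    rows = [[1.0, own, others, own * others] for own, others, _ in samples]
    ys = [smt for _, _, smt in samples]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return tuple(solve(xtx, xty))


# Synthetic training data generated from made-up "true" coefficients.
TRUE = (0.1, 0.9, 0.3, 0.2)
samples = [
    (own, others, TRUE[0] + TRUE[1] * own + TRUE[2] * others + TRUE[3] * own * others)
    for own in (0.1, 0.3, 0.5, 0.7)
    for others in (0.2, 0.4, 0.6)
]
fitted = fit_component(samples)
print(fitted)
```

Eq. (1) is linear in its four parameters even though it contains the product term, so a single linear fit per component is enough.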


Obtaining ST CPI stacks in SMT mode

Obtaining the ST CPI stacks is not a trivial issue
- Offline profiling of CPI stacks (impractical)
- Sampling CPI stacks at runtime (overhead)
- Specific hardware to collect ST CPI stacks online (unavailable)

Measure the SMT CPI stacks and invert the model to obtain ST CPI stacks
- Not trivial: ST CPI not available in SMT execution
- Solved with an approximate approach


[Figure: for App 1 and App 2, the model maps ST CPI stacks normalized to ST CPI (y-axis: normalized CPI) to SMT CPI stacks normalized to ST CPI (y-axis: slowdown). The inverted model maps in the opposite direction, from the SMT stacks back to the ST stacks.]
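The slides do not spell out the approximate inversion, so the following is only one possible numerical sketch, not the authors' method. It inverts Eq. (1) for a single component with a fixed-point iteration and deliberately sidesteps the normalization problem by assuming the SMT components are already expressed relative to the ST CPI. The coefficients and sample values are hypothetical, and convergence is only expected when the γ and δ terms are small relative to β.

```python
def forward(st, alpha, beta, gamma, delta):
    """Eq. (1) applied to one component of every co-scheduled thread."""
    total = sum(st)
    return [alpha + beta * c + gamma * (total - c) + delta * c * (total - c) for c in st]


def invert(smt, alpha, beta, gamma, delta, tol=1e-12, max_iter=500):
    """Recover ST components from SMT components by fixed-point iteration.

    Each step solves Eq. (1) for C_j while holding the co-runners' current estimates fixed.
    """
    st = list(smt)  # initial guess: no interference
    for _ in range(max_iter):
        total = sum(st)
        new = [
            (y - alpha - gamma * (total - c)) / (beta + delta * (total - c))
            for y, c in zip(smt, st)
        ]
        if max(abs(a - b) for a, b in zip(new, st)) < tol:
            return new
        st = new
    raise RuntimeError("fixed-point iteration did not converge")


# Round trip with made-up coefficients and ST values.
PARAMS = dict(alpha=0.0, beta=1.0, gamma=0.5, delta=0.2)
true_st = [0.5, 0.25]
observed_smt = forward(true_st, **PARAMS)
recovered_st = invert(observed_smt, **PARAMS)
print(recovered_st)
```

The round trip recovers the ST values used to generate the SMT values, which is the property an inverted model needs. A practical scheduler would additionally have to handle the unknown ST CPI noted on the slide.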


Outline

1 Introduction

2 Predicting Job Symbiosis

3 SMT Interference-Aware Scheduler
- Reduction of the cycle stack components
- Correction factor
- Selection of the optimal schedule

4 Experimental Evaluation

5 Conclusions


Reduction of the cycle stack components

45 events form the full CPI stack of the IBM POWER8
- 6 thread-level counters are implemented (4 programmable)
- Structural conflicts: some events cannot be measured together
- 19 time slices are required to build the full CPI stack

Unacceptable for scheduling
- Obtaining an up-to-date CPI stack is not possible

Fortunately, the CPI stack model is built hierarchically
- Top level with 5 components
- The model accuracy is reduced, but it has lower complexity and uses up-to-date CPI stacks


Correction factor

The model is relatively accurate in general, but less accurate for particular schedules
- Interference in the inter-core shared resources is not directly modeled (e.g., LLC interference)

Correction factor

$$C_f = \frac{\text{Measured slowdown}}{\text{Model slowdown}}$$

- Updated using an exponential moving average, to smooth out sudden changes

Requires knowledge of the isolated performance
- The applications are run very sparsely in ST mode, incurring 0.2% overhead
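A minimal sketch of the update rule described above: each new sample of C_f (measured slowdown divided by model slowdown) is blended into the stored value with an exponential moving average. The smoothing weight of 0.5, the function names, and the choice to scale the model slowdown by C_f to obtain the estimate are illustrative assumptions; the slides do not give these details.

```python
def update_correction_factor(previous_cf, measured_slowdown, model_slowdown, smoothing=0.5):
    """Blend the newest C_f sample into the stored value (exponential moving average).

    smoothing is the weight of the new sample; 0.5 is an arbitrary illustrative choice.
    """
    if model_slowdown <= 0:
        raise ValueError("model slowdown must be positive")
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must lie in [0, 1]")
    sample = measured_slowdown / model_slowdown
    return smoothing * sample + (1.0 - smoothing) * previous_cf


def estimated_slowdown(model_slowdown, cf):
    """Scale the model's prediction by the stored correction factor (assumed usage)."""
    return model_slowdown * cf


cf = 1.0  # no correction known yet
cf = update_correction_factor(cf, measured_slowdown=2.0, model_slowdown=1.5)
print(cf, estimated_slowdown(1.5, cf))
```

A smaller smoothing weight reacts more slowly to a single outlier measurement, while a larger one tracks phase changes faster.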



Correction factor

Schedule

SMTcore 1SMTcore 0

A B C D

MODEL

1

A B C DA - 1.0 1.0 1.0

B 1.0 - 1.0 1.0

C 1.0 1.0 - 1.0

D 1.0 1.0 1.0 -

1.2 1.4 1.3 1.5

1.2 1.5 1.2 2.0

Correctionfactors

Estimatedslowdown

Measuredslowdown

1 1 1

1.2 1.4 1.3 1.5Modelslowdown

Application

Co-runner

"# = %&'()*&+(,-.+-./%-+&, (,-.+-./

J. Feliu, S. Eyerman, J. Sahuquillo, S. Petit HPCA’16 @ Barcelona, Spain March 16th, 2016 14 / 24

Page 42: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Introduction Predicting Job Symbiosis SMT Interference-Aware Scheduler Experimental Evaluation Conclusions

Correction factor

"# = %&'()*&+(,-.+-./%-+&, (,-.+-./

Schedule

SMTcore 1SMTcore 0

A B C D

MODEL

1

A B C DA - 1.0 1.0 1.0

B 1.1 - 1.0 1.0

C 1.0 1.0 - 0.9

D 1.0 1.0 1.0 -

1.2 1.4 1.3 1.5

Correctionfactors

Estimatedslowdown

Measuredslowdown

1 1 1

1.2 1.4 1.3 1.5Modelslowdown

Application

Co-runner

1.2 1.5 1.2 2.0

J. Feliu, S. Eyerman, J. Sahuquillo, S. Petit HPCA’16 @ Barcelona, Spain March 16th, 2016 14 / 24

Page 43: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Introduction Predicting Job Symbiosis SMT Interference-Aware Scheduler Experimental Evaluation Conclusions

Correction factor

Schedule

SMTcore 1SMTcore 0

A B C D

MODEL

1

A B C DA - 1.0 1.0 1.0

B 1.1 - 1.0 1.0

C 1.0 1.0 - 0.9

D 1.0 1.0 1.0 -

1.2 1.5 1.6 1.5

1.2 1.5 1.6 1.5

Correctionfactors

Estimatedslowdown

Measuredslowdown

1

1.2 1.4 1.3 1.5Modelslowdown

Application

Co-runner

0.9 0.8

"# = %&'()*&+(,-.+-./%-+&, (,-.+-./

J. Feliu, S. Eyerman, J. Sahuquillo, S. Petit HPCA’16 @ Barcelona, Spain March 16th, 2016 14 / 24

Page 44: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Introduction Predicting Job Symbiosis SMT Interference-Aware Scheduler Experimental Evaluation Conclusions

Correction factor

Schedule

SMTcore 1SMTcore 0

A B C D

MODEL

1

A B C DA - 1.0 1.0 1.0

B 1.1 - 1.0 1.0

C 1.0 1.0 - 0.9

D 1.0 1.0 1.0 -

1.2 1.5 1.2 1.5

1.2 1.5 1.2 1.5

Correctionfactors

Estimatedslowdown

Measuredslowdown

1

1.2 1.4 1.3 1.5Modelslowdown

Application

Co-runner

1.1 0.9

"# = %&'()*&+(,-.+-./%-+&, (,-.+-./

J. Feliu, S. Eyerman, J. Sahuquillo, S. Petit HPCA’16 @ Barcelona, Spain March 16th, 2016 14 / 24
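The way the correction factors enter the estimate in this example can be sketched in a few lines. The keying of the factors by (application, co-runner) and the function name are assumptions made for illustration; the numbers are the ones shown on the slide.

```python
# Correction factors keyed by (application, co-runner); pairs that are missing
# default to 1.0, i.e. the model is trusted unchanged.
correction = {("B", "A"): 1.1, ("C", "D"): 0.9}

# Model slowdowns for the schedule {A, B} on one SMT core and {C, D} on the other.
model_slowdown = {"A": 1.2, "B": 1.4, "C": 1.3, "D": 1.5}
schedule = [("A", "B"), ("C", "D")]


def estimated_slowdowns(schedule, model_slowdown, correction):
    """Scale each application's model slowdown by its pair's correction factor."""
    estimates = {}
    for first, second in schedule:
        for app, co_runner in ((first, second), (second, first)):
            cf = correction.get((app, co_runner), 1.0)
            estimates[app] = model_slowdown[app] * cf
    return estimates


for app, value in estimated_slowdowns(schedule, model_slowdown, correction).items():
    print(app, round(value, 2))
```

Since the factor is defined as measured slowdown over model slowdown, multiplying the model slowdown by it moves the estimate toward what was last measured for that pair.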

Page 47: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Selection of the optimal schedule

Too many different schedules: $\frac{n!}{c!\,\left(\left(\frac{n}{c}\right)!\right)^{c}}$ ways to place n applications onto c cores

More than 2M schedules for 16 applications on 8 cores!

Modeled as a minimum-weight perfect matching problem, which can be solved in polynomial time using the blossom algorithm
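A short sketch can make both points on this slide concrete: the schedule count from the formula, and the matching formulation for the case of two applications per core. The exhaustive search below is a stand-in chosen for brevity, not the blossom algorithm, and the pair costs are hypothetical values invented for illustration.

```python
from math import factorial


def count_schedules(n, c):
    """Number of ways to split n applications into c unordered groups of n/c."""
    if n % c != 0:
        raise ValueError("n must be a multiple of c")
    per_core = n // c
    return factorial(n) // (factorial(c) * factorial(per_core) ** c)


def best_pairing(apps, cost):
    """Exhaustively find the pairing of apps with the lowest total cost.

    cost(a, b) is the predicted cost of co-running a and b on one core.
    This brute force stands in for the blossom algorithm and only suits
    tiny inputs, because the number of pairings grows as shown above.
    """
    if not apps:
        return 0.0, []
    first, rest = apps[0], apps[1:]
    best_total, best_pairs = float("inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, cost)
        total = cost(first, partner) + sub_total
        if total < best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


print(count_schedules(16, 8))  # 2027025

# Hypothetical pair costs, for illustration only.
pair_cost = {frozenset("AB"): 2.7, frozenset("CD"): 2.6,
             frozenset("AC"): 2.5, frozenset("BD"): 2.4,
             frozenset("AD"): 2.2, frozenset("BC"): 2.3}
total, pairs = best_pairing(list("ABCD"), lambda a, b: pair_cost[frozenset((a, b))])
print(pairs, total)
```

For realistic workload sizes a polynomial-time matching routine would replace `best_pairing`; the interface (a symmetric pair cost and a list of applications) would stay the same.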

Page 49: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Scheduling steps

Collect the SMT CPI stacks

Update the correction factors

Apply the inverted model

Apply the forward model

Find the best schedule

Run the selected schedule

[Figure, built up over several slides: the steps applied to four applications A, B, C and D. Panels: measured SMT CPI stacks (normalized SMT CPI of 1.3, 1.2, 1.1 and 1.4); estimated ST CPI stacks, obtained with the inverted model; predicted SMT CPI stacks, obtained with the forward model, to which the correction factors are applied. Vertical axes: slowdown and normalized CPI. The three candidate schedules (A B | C D, A C | B D and A D | B C) are annotated with the values 2.7, 2.5 and 1.9.]

Correction factors shown in the example:

     A    B    C    D
A    -    1.0  1.0  1.0
B    0.9  -    1.0  1.0
C    1.0  1.0  -    0.8
D    1.0  1.0  1.0  -

Page 56: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Outline

1 Introduction

2 Predicting Job Symbiosis

3 SMT Interference-Aware Scheduler

4 Experimental Evaluation
Experimental setup
Scheduler performance

5 Conclusions

Page 57: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Experimental setup

10-core IBM POWER8

SPEC CPU2006 benchmarks (reference input set)

105 multiprogram workloads

From 8-application combinations on 4 cores to 20-application combinations on 10 cores

Metrics:

System throughput (STP), by means of the weighted speedup metric

System fairness: $\text{Unfairness} = \frac{\max_i \text{Slowdown}_i}{\min_j \text{Slowdown}_j}, \quad \forall\, i, j \in \{1, \dots, N\}$

Four schedulers are compared:

Random

Linux, default Completely Fair Scheduler (CFS)

L1-bandwidth aware scheduler, which balances the L1 bandwidth utilization among cores. Feliu et al., PACT'13

Symbiotic scheduler
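The two metrics can be computed from per-application CPI values as in the sketch below. It assumes the usual definition of weighted speedup (the sum of each application's isolated-to-shared performance ratio) and takes slowdown as shared CPI over isolated CPI; the function names and the sample numbers are hypothetical.

```python
def system_throughput(st_cpi, smt_cpi):
    """Weighted speedup: sum over applications of CPI alone / CPI when co-running."""
    return sum(alone / shared for alone, shared in zip(st_cpi, smt_cpi))


def unfairness(st_cpi, smt_cpi):
    """Ratio of the largest to the smallest per-application slowdown."""
    slowdowns = [shared / alone for alone, shared in zip(st_cpi, smt_cpi)]
    return max(slowdowns) / min(slowdowns)


# Hypothetical CPI values for four applications.
st_cpi = [1.0, 2.0, 0.5, 1.0]
smt_cpi = [1.25, 4.0, 0.5, 2.0]
print(system_throughput(st_cpi, smt_cpi))
print(unfairness(st_cpi, smt_cpi))
```

An unfairness of 1.0 means every application is slowed down by the same factor; larger values mean some applications pay much more than others for sharing a core.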

Page 59: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

System throughput increase

[Figure: bar chart of the system throughput increase (vertical axis, 0% to 12%) for 4, 5, 6, 7, 8, 9 and 10 cores and the average. Series: Symbiotic scheduler, Symbiotic scheduler overhead, Linux scheduler, L1-bandwidth aware scheduler.]

Page 60: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

System unfairness

[Figure: bar chart of the unfairness (vertical axis, 1.0 to 3.1) for 4, 5, 6, 7, 8, 9 and 10 cores and the average. Series: Random scheduler, Linux scheduler, L1-bandwidth aware scheduler, Symbiotic scheduler.]

Unfairness is a lower-is-better metric

Page 61: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Symbiosis patterns

[Figure: two frequency matrices, one for workload 5_3 and one for workload 5_4, each with App1 to App10 on both axes.]

Frequency matrices of the job co-schedules

The darker the color, the more frequently the couple is scheduled together on an SMT core

In workload 5_4, two couples are scheduled together very frequently (> 65%) ⇒ High symbiosis

In workload 5_3, no couple is that predominant ⇒ High phase behavior
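A frequency matrix of this kind could be built from a log of the schedules chosen in each quantum, as sketched below. The data layout (a list of schedules, each a list of per-core tuples) and the sample history are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_schedule_frequency(history):
    """Fraction of scheduling quanta in which each pair of applications shared a core.

    history is a list of schedules; each schedule is a list of cores, and each
    core is a tuple of the applications placed on it during that quantum.
    """
    counts = Counter()
    for schedule in history:
        for core in schedule:
            for pair in combinations(sorted(core), 2):
                counts[pair] += 1
    return {pair: n / len(history) for pair, n in counts.items()}


# Hypothetical history of four quanta for applications App1..App4.
history = [
    [("App1", "App2"), ("App3", "App4")],
    [("App1", "App2"), ("App3", "App4")],
    [("App1", "App3"), ("App2", "App4")],
    [("App1", "App2"), ("App3", "App4")],
]
for pair, share in sorted(co_schedule_frequency(history).items()):
    print(pair, share)
```

A pair whose share stays high across the run corresponds to a dark cell in the matrices; shares spread evenly over many pairs correspond to the phase-driven pattern.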

Page 63: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput


Outline

1 Introduction

2 Predicting Job Symbiosis

3 SMT Interference-Aware Scheduler

4 Experimental Evaluation

5 Conclusions


Page 64: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Conclusions

Scheduling has a considerable impact on the performance of SMT multicores

Novel symbiotic job scheduler for SMT multicores

Quick estimation of the performance of schedules to select the optimal one

By using CPI stacks, it can quickly adapt to phase behavior

No need for additional hardware or for sampling schedules

Improves the system throughput over the random and Linux schedulers by 10.3% and 4.7% on average, respectively

Page 65: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Symbiotic Job Scheduling on the IBM POWER8

J. Feliu¹, S. Eyerman², J. Sahuquillo¹, S. Petit¹

¹ Department of Computing Engineering (DISCA), Universitat Politècnica de València

[email protected], {jsahuqui,spetit}@disca.upv.es

² Intel Belgium, [email protected]

March 16th, 2016

² This work was done while Stijn Eyerman was at Ghent University

Page 66: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Model accuracy

[Figure: two histograms of the model error. Horizontal axis: error intervals. Vertical axis: frequency (0% to 16%).]

Distribution of error from ST CPI stacks alone to SMT CPI stacks together. Average absolute error: 12.3%

Distribution of error from SMT CPI stacks together to ST CPI stacks alone. Average absolute error: 13.4%
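For reference, an average absolute error of this kind can be computed as sketched below. The slides do not say how each error is normalized; the sketch assumes it is taken relative to the measured value, and the sample numbers are hypothetical.

```python
def average_absolute_error(predicted, measured):
    """Mean of |predicted - measured| / measured over all samples.

    Normalizing by the measured value is an assumption, not taken from the slides.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two equally long, non-empty sequences")
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


# Hypothetical predicted and measured CPI values.
predicted = [1.1, 1.8, 0.9, 2.4]
measured = [1.0, 2.0, 1.0, 2.0]
print(f"{average_absolute_error(predicted, measured):.1%}")
```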

Page 67: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Obtaining ST CPI stacks in SMT mode

Approximate approach

SMT components normalized to SMT CPI ≈ ST components normalized to ST CPI

Both add to one

If the relative increase of the components is the same, then both stacks are equal

However, it is not accurate enough

Estimate the slowdown by applying the model to the estimated normalized ST CPI

Renormalize the measured SMT CPI stacks using the estimated slowdown

Apply the inverse model to obtain new estimates for the ST CPI stacks

[Figure, built up over several slides: CPI stacks of App 1 and App 2 at each step. Panels: (a) measured SMT CPI stacks; (b) normalized SMT CPI stacks; (c) predicted normalized SMT CPI stacks, obtained from (b) with the forward model and annotated with the estimated slowdown; (d) adjusted normalized SMT CPI stacks; (e) predicted normalized ST CPI stacks, obtained from (d) with the inverse model.]
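The sequence of steps on this slide can be sketched as a small function that takes the forward and inverse models as parameters. This is one reading of the procedure, not the authors' code: the renormalization step is interpreted as scaling the normalized SMT stack so that it adds up to the estimated slowdown, and the two toy models are hypothetical stand-ins that ignore the co-runner entirely.

```python
def normalize(stack):
    """Scale a CPI stack so that its components add up to one."""
    total = sum(stack)
    return [component / total for component in stack]


def estimate_st_stack(measured_smt_stack, forward_model, inverse_model):
    """Estimate a normalized ST CPI stack from a measured SMT CPI stack."""
    # First approximation: the normalized SMT stack stands in for the ST stack.
    approx_st = normalize(measured_smt_stack)
    # The forward model turns it into a predicted SMT stack; because the input
    # adds up to one, the sum of the prediction is the estimated slowdown.
    estimated_slowdown = sum(forward_model(approx_st))
    # Renormalize the measured SMT stack so that it adds up to that slowdown.
    adjusted_smt = [component * estimated_slowdown for component in approx_st]
    # The inverse model maps the adjusted SMT stack back to an ST stack.
    return inverse_model(adjusted_smt), estimated_slowdown


# Toy stand-in models (hypothetical): each component grows by a fixed factor.
FACTORS = [1.1, 1.5, 1.3]


def toy_forward(st_stack):
    return [c * f for c, f in zip(st_stack, FACTORS)]


def toy_inverse(smt_stack):
    return [c / f for c, f in zip(smt_stack, FACTORS)]


st_stack, slowdown = estimate_st_stack([0.6, 0.9, 0.5], toy_forward, toy_inverse)
print(round(slowdown, 3), [round(c, 3) for c in st_stack])
```

Passing the models in as callables keeps the procedure separate from whatever interference model is actually used; only the normalize, estimate, renormalize, invert order is taken from the slide.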

Page 71: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

System throughput performance

[Figure: per-workload speedup (vertical axis, 0% to 18%) for the workloads 4_1 to 4_15, 5_1 to 5_15, 6_1 to 6_15, 7_1 to 7_15, 8_1 to 8_15, 9_1 to 9_15 and 10_1 to 10_15, grouped by number of cores (4 to 10 cores). Series: Linux scheduler, L1-bandwidth aware scheduler, Symbiotic scheduler.]

Page 72: Symbiotic Job Scheduling on the IBM POWER8personales.upv.es/jofepre/docs/HPCA_2016_slides.pdf · 2016-03-15 · Implemented and evaluated on the IBM POWER8 Averagesystem throughput

Core asymmetry

[Figure: speedup (vertical axis, 0% to 15%) for 6, 7, 8, 9 and 10 cores. Series: Symbiotic scheduler, Symbiotic scheduler aware of core asymmetry.]