Action IC0804 www.cost804.org

Energy- and Thermal-aware scheduling for datacenters, Georges Da Costa, Ljubljana WG meeting, 8th July, 2016, Action IC0804

Page 1

Energy- and Thermal-aware scheduling for datacenters

Georges Da Costa

Ljubljana WG meeting, 8th July, 2016

Action IC0804, www.cost804.org

Page 2

Datacenters: a major ecological impact

A recent datacenter: 40,000 servers, 500,000 services (virtual machines)
Google, Facebook: > 1 million servers
Power consumption is also large scale:

2000: 70 TWh
2007: 330 TWh, 2% of world CO2 production
2011: 6th country in terms of power consumption
2020: 1000 TWh (projected)

Increasing:

2014: 90% of datacenter owners plan an update before the end of 2015

Page 3

Sustainable datacenters

Multi-layer approach:
Hardware: change servers and cooling system

If entropy is constant, theoretical consumption is 0

Applications: rewrite applications using an innovative paradigm∗ or an improved library
Middleware: manages the datacenters

Middleware: minimum cost, maximum impact

OpenStack: 30% market share in 2014
Open-source solutions: 43% (+72% in 2 years)

∗ Georges et al., Exascale machines require new programming paradigms and runtimes, SFI journal, 2015

Page 4

Power and Energy are Unique

Temporal effects

Inertia linked to temperature
Switching servers on/off

Under- or over-reservation

Cycles can be relevant

Non-linear effects

Electrical power equations

Feedback loops

Cooling system
Violaine et al., Thermal-aware cloud middleware to reduce cooling needs, WETICE workshop, 2014

Page 5

Simplest (?) tool: Experiments

Simple experiment: Fast Fourier Transform (NPB)

100 runs using the same hardware (Grid'5000)

Large differences:

Time: 12 s difference, 7% (std. dev. 3.2 s)
Energy: 9.3 kJ difference, 5.5% (std. dev. 3 kJ)

For the same time (167 s), a difference of 4 kJ

Time ≠ Energy

[Figure: Fourier transform, 100 runs: execution time (s) vs. energy (kJ)]
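The two statistics on this slide (spread between repeated runs, and energy gap between runs of equal duration) can be computed from per-run measurements. This is a minimal sketch, assuming each run is recorded as a (time in s, energy in kJ) pair; the function names and the 0.5 s tolerance for "same time" are hypothetical choices.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation, max-min spread, spread in % of the mean)."""
    m = mean(values)
    spread = max(values) - min(values)
    return m, stdev(values), spread, 100.0 * spread / m


def energy_gap_at_equal_time(runs, tolerance_s=0.5):
    """Largest energy difference between two runs whose durations differ by at most tolerance_s.

    runs: list of (time_s, energy_kj) pairs, one per execution.
    """
    gap = 0.0
    for i, (t1, e1) in enumerate(runs):
        for t2, e2 in runs[i + 1:]:
            if abs(t1 - t2) <= tolerance_s:
                gap = max(gap, abs(e1 - e2))
    return gap
```

A large energy gap between runs of (almost) the same duration is what "Time ≠ Energy" points at: the duration of a run alone does not determine its energy.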

Page 6

Simulation

Large number of simulators: SimGrid, DCWorms, CloudSim, ...

Needed specifications:

Models of cloud (migration, over-allocation of resources, federation†)
DVFS
Power consumption
Temperature

An evolving field

DVFS and fine-grained cloud simulation in CloudSim
Thermal models in DCWorms∗

DVFS and energy in SimGrid

∗ Wojtek et al., Energy and thermal models for simulation of workload and resource management in computing systems, SMPT journal, 2015
† Thiam et al., Cooperative Scheduling Anti-load balancing Algorithm for Cloud, CCTS workshop, 2013

Page 7

Example: adding DVFS in CloudSim

Originally a Grid simulator

Great stability over time
100% resource usage

DVFS requires moving events internally (task completion times change with the frequency)

Fine-grained temporal management (1/10 s)

Tom et al., Energy-aware simulation with DVFS, SMPT journal, 2013
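The point about moving events can be illustrated with a small sketch: in a discrete-event simulator a task's completion event is derived from its remaining work and the current frequency, so every frequency change must move that event. The RunningTask class below is hypothetical (it is not CloudSim's API) and assumes that execution time is simply cycles divided by frequency.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    remaining_cycles: float  # work still to execute, in CPU cycles
    freq_hz: float           # current core frequency
    last_update_s: float     # simulation time of the last bookkeeping step

    def finish_time(self) -> float:
        """Completion time if the frequency stays unchanged."""
        return self.last_update_s + self.remaining_cycles / self.freq_hz

    def change_frequency(self, now_s: float, new_freq_hz: float) -> float:
        """Charge the work done at the old frequency, switch, and return the moved completion time."""
        done = (now_s - self.last_update_s) * self.freq_hz
        self.remaining_cycles = max(0.0, self.remaining_cycles - done)
        self.freq_hz = new_freq_hz
        self.last_update_s = now_s
        return self.finish_time()


task = RunningTask(remaining_cycles=26e9, freq_hz=2.6e9, last_update_s=0.0)
print(task.finish_time())                 # 10.0 s at 2.6 GHz
print(task.change_frequency(5.0, 1.0e9))  # half done at 5 s, 13e9 cycles left at 1 GHz -> 18.0
```

In a simulator, the returned value would replace the task's pending completion event; with a 0.1 s control granularity this can happen many times during a single task.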

Page 8

Plan

1 Allocation

2 Using DAG for frequency scaling

3 Profile-based hardware reconfiguration

4 HPC-aware DVFS

Page 9

Allocation using Genetic Algorithms

Chromosome = Allocation

First random population

For each iteration:

Mutation and recombination
Sort using the fitness function
Keep the best and iterate

Fitness depends on the metric functions

Performance, Energy, Resilience, Dynamism

Tom et al., Quality of Service Modeling for Green Scheduling in Clouds, SUSCOM journal, 2014
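The loop above can be sketched in Python. Assumptions: a chromosome is a list whose element i is the server hosting service i; the fitness is a hypothetical energy proxy (each powered-on server costs a fixed idle power plus a per-service share, with a large penalty for exceeding a per-server capacity); one-point crossover, uniform mutation and truncation selection are illustrative operators, not those of the cited paper, which combines several metrics (performance, energy, resilience, dynamism).

```python
import random
from collections import Counter


def fitness(allocation, capacity, idle_w=100.0, per_service_w=10.0, penalty=1e6):
    """Hypothetical energy proxy (lower is better): idle power of every powered-on
    server, a per-service share, and a large penalty for each service above capacity."""
    load = Counter(allocation)
    overload = sum(max(0, n - capacity) for n in load.values())
    return len(load) * idle_w + len(allocation) * per_service_w + penalty * overload


def crossover(a, b, rng):
    """One-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(allocation, n_servers, rate, rng):
    """Move each service to a random server with probability `rate`."""
    return [rng.randrange(n_servers) if rng.random() < rate else s for s in allocation]


def genetic_allocation(n_services, n_servers, capacity, pop_size=40,
                       generations=200, mutation_rate=0.05, seed=0):
    """Chromosome = allocation: element i is the server hosting service i (needs n_services >= 2)."""
    rng = random.Random(seed)
    # First random population
    population = [[rng.randrange(n_servers) for _ in range(n_services)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(crossover(*rng.sample(population, 2), rng),
                           n_servers, mutation_rate, rng)
                    for _ in range(pop_size)]
        # Sort parents and children with the fitness function, keep the best, iterate
        population = sorted(population + children,
                            key=lambda a: fitness(a, capacity))[:pop_size]
    return population[0]


best = genetic_allocation(n_services=12, n_servers=6, capacity=4)
print(best, fitness(best, capacity=4))
```

Swapping `fitness` for a function of several metrics is all it takes to obtain the single-metric and combined variants compared on the next slide.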

Page 10

Results for the Genetic Algorithm

Each variant is best in its own domain (e.g., Energy)

GA_All: good overall

400 services on 110 servers (40 s)

Taking a metric into account matters!

Page 11

Fuzzy Greedy

Advantage of GA: the fitness function

Similar method for greedy algorithms:

Set of greedy algorithms
Keep the best
What is the best?

Multi-objective: Fuzzy∗

With thermal models of datacenters (D-Matrix)†

[Figure: optimum on relaxed E; family of greedy algorithms; new optimum]

∗ Hong Yang et al., Multi-Objective Scheduling for Heterogeneous Server Systems with Machine Placement, CCGRID conference, 2014

† Hong Yang et al., Energy-efficient and thermal-aware resource management for heterogeneous datacenters, SUSCOM journal
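One way to decide "what is the best" among the allocations returned by several greedy algorithms, when there are several objectives, is max-min fuzzy aggregation: each objective value is mapped to a membership degree in [0, 1] (1 for the best candidate, 0 for the worst), a candidate's overall degree is its weakest membership, and the candidate with the highest overall degree is kept. This sketch assumes linear membership functions and objectives to minimize; it illustrates the idea and is not necessarily the formulation of the cited papers. The heuristic names and scores in the usage lines are hypothetical.

```python
def fuzzy_select(candidates):
    """Pick the best candidate by max-min fuzzy aggregation.

    candidates: dict mapping a name to a tuple of objective values (lower is better).
    """
    names = list(candidates)
    n_obj = len(candidates[names[0]])
    lows = [min(candidates[n][k] for n in names) for k in range(n_obj)]
    highs = [max(candidates[n][k] for n in names) for k in range(n_obj)]

    def membership(value, lo, hi):
        # Linear membership: 1.0 for the best value among candidates, 0.0 for the worst
        return 1.0 if hi == lo else (hi - value) / (hi - lo)

    def degree(name):
        # Fuzzy AND: a candidate is only as good as its weakest objective
        return min(membership(v, lo, hi) for v, lo, hi in zip(candidates[name], lows, highs))

    return max(names, key=degree)


# Hypothetical (energy, peak temperature) scores of allocations from three greedy heuristics
results = {"first_fit": (10.0, 50.0), "best_fit": (20.0, 20.0), "thermal_greedy": (12.0, 30.0)}
print(fuzzy_select(results))  # thermal_greedy
```

The two heuristics that each win on a single objective lose here because each is the worst on the other one; the compromise allocation is kept.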

Page 12

Plan

2 Using DAG for frequency scaling

Page 13

Using DAG for frequency scaling

Use external contextual information
Example: DAG of tasks

Page 15

Coordination of frequency of servers

Generalization toward the critical path
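A minimal sketch of how a task DAG can drive frequency choices: tasks on the critical path keep the maximum frequency, while tasks with slack can be slowed down without lengthening the makespan. Assumptions: run time scales as 1/frequency, any frequency in [f_min, f_max] is available, and the greedy "absorb the current slack, then recompute" rule is an illustrative choice (recomputing matters because tasks on the same non-critical branch share their slack). The function names are hypothetical, not taken from the cited work.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def _schedule(dur, deps, order, succs):
    """Earliest start, latest start and makespan for the current task durations."""
    est = {}
    for t in order:
        est[t] = max((est[p] + dur[p] for p in deps.get(t, ())), default=0.0)
    makespan = max(est[t] + dur[t] for t in order)
    lst = {}
    for t in reversed(order):
        lst[t] = min((lst[s] for s in succs[t]), default=makespan) - dur[t]
    return est, lst, makespan


def slack_frequencies(durations, deps, f_max, f_min):
    """durations: task -> run time at f_max (s, > 0); deps: task -> set of predecessors.

    Every task must appear in deps, as a key or as a predecessor. Tasks are visited in
    topological order; each one is slowed down to absorb its current slack, so tasks on
    the critical path stay at f_max and the makespan is preserved.
    """
    order = list(TopologicalSorter(deps).static_order())
    succs = {t: [] for t in order}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    dur = dict(durations)  # current (possibly stretched) durations
    freqs = {}
    for t in order:
        est, lst, _ = _schedule(dur, deps, order, succs)
        slack = max(0.0, lst[t] - est[t])
        f = max(f_min, f_max * durations[t] / (durations[t] + slack))
        freqs[t] = f
        dur[t] = durations[t] * f_max / f
    return freqs, _schedule(dur, deps, order, succs)[2]


durations = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 1.0}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
freqs, makespan = slack_frequencies(durations, deps, f_max=2.6e9, f_min=1.0e9)
print(freqs["c"], makespan)  # c can run at 1.3 GHz; the makespan stays 4.0 s
```

Here a, b and d form the critical path and stay at 2.6 GHz, while c has 1 s of slack and is stretched from 1 s to 2 s.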

Page 17

Node-level action

Next step:

Switching servers on/off
Taking temperature into account

Page 18

Plan

3 Profile-based hardware reconfiguration

Page 19

Profile-based hardware reconfiguration

Coarse-grained reaction at the level of a node:

Change processor frequency
Change the hard-drive mode
Reconfigure network card

Detection of current phase∗

React according to the current phase

Low impact on the global infrastructure

∗ Landry et al., Application-Agnostic Framework for Improving the Energy Efficiency of Multiple HPC Subsystems, PDP Conference, 2015

Page 20

Resource consumption for phase detection

[Figure: counter access rate vs. time (s) while running Idle, MG, BT, EP, IS and CG; series: branch misses, cache references, cache misses]

Page 21

Phase detection

[Figure: detected phase id (0 to 10) vs. time (s) over the same sequence: Idle, MG, BT, EP, IS, CG]
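A minimal sketch of online phase detection from performance counters, assuming each sample is a vector of normalized counter rates (for example branch misses, cache references and cache misses per interval) and using a simple "leader" clustering with a hypothetical distance threshold; the cited framework may rely on a different classifier. A sample close to a known phase reuses its id, and a sample far from every known phase opens a new one, which is how a phase-id curve like the one on this slide can be obtained.

```python
import math


def detect_phases(samples, threshold):
    """Assign a phase id to each sample with online 'leader' clustering.

    samples: sequence of equal-length tuples of normalized counter rates.
    A sample joins the nearest known phase (Euclidean distance to its running mean)
    if it is within `threshold`; otherwise it starts a new phase.
    """
    sums, counts, ids = [], [], []
    for x in samples:
        best, best_d = None, math.inf
        for i, (s, n) in enumerate(zip(sums, counts)):
            d = math.dist(tuple(x), tuple(v / n for v in s))
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            sums.append(list(x))
            counts.append(1)
            ids.append(len(sums) - 1)
        else:
            sums[best] = [a + b for a, b in zip(sums[best], x)]
            counts[best] += 1
            ids.append(best)
    return ids


samples = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0), (1.01, 1.0), (0.0, 0.01)]
print(detect_phases(samples, threshold=0.1))  # [0, 0, 1, 1, 0]
```

Returning to an already-seen behaviour (the last sample) reuses phase 0 instead of creating a new id, so a reconfiguration decided for that phase can be reapplied.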

Page 22

Decision rules

Phase label: available reconfiguration rules

compute-intensive: switch off memory banks; put hard-drive in sleep mode; processor at maximum frequency; put network interface cards in sleep mode.

memory-intensive: slow down processor frequency; put hard-drive in sleep mode or reduce its speed; switch on all memory banks.

mixed: switch on all memory banks; increase processor frequency; put hard-drive in sleep mode; put network interface cards in sleep mode.

communication-intensive: switch off memory banks; slow down processor frequency; switch on hard-drives.

IO-intensive: switch off memory banks; slow down processor frequency; put hard-drives in performance mode.
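Because the rules depend only on the detected phase label, they can be kept as a lookup table separate from detection. This sketch uses hypothetical knob names and values; for the memory-intensive row it picks the sleep option for the hard drive, although the table also allows reducing its speed. Returning only the settings that differ from the current state avoids re-issuing reconfigurations each time the same phase is detected again.

```python
# Hypothetical encoding of the decision-rule table: phase label -> target setting per subsystem
RULES = {
    "compute-intensive": {"memory_banks": "off", "disk": "sleep", "cpu": "max", "nic": "sleep"},
    # disk: the table allows either sleep mode or a reduced speed; sleep is chosen here
    "memory-intensive": {"cpu": "slow", "disk": "sleep", "memory_banks": "all_on"},
    "mixed": {"memory_banks": "all_on", "cpu": "increase", "disk": "sleep", "nic": "sleep"},
    "communication-intensive": {"memory_banks": "off", "cpu": "slow", "disk": "on"},
    "IO-intensive": {"memory_banks": "off", "cpu": "slow", "disk": "performance"},
}


def changes(current, phase):
    """Settings that must change to apply the rules of `phase`; unknown phases change nothing."""
    target = RULES.get(phase, {})
    return {knob: value for knob, value in target.items() if current.get(knob) != value}


now = {"cpu": "max", "disk": "on", "memory_banks": "all_on", "nic": "on"}
print(changes(now, "compute-intensive"))  # {'memory_banks': 'off', 'disk': 'sleep', 'nic': 'sleep'}
```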

Page 23

Energy and performance (28 servers)

[Figure: energy consumption and extra execution time (%) for CG, MG, POP, X1, GeneHunter, WRF and MDS; vertical axis from -20% to +10%]

Landry et al., Exploiting performance counters to predict and improve energy performance of HPC systems, FGCS journal

Page 24

External phase detection

Obtaining system values is intrusive

Reducing the number of monitored values reduces the overhead

Monitoring external values (power, network)

Use statistical tools

Evaluate the behavior over time

Georges et al., Characterizing applications from power consumption: A case study for HPC benchmarks, ICT-GLOW Symposium, 2011
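A minimal sketch of the statistical side, assuming the externally monitored value is a series of per-interval samples (for example bytes sent per second, as in the CG and SP traces of the next slides) and using a simple window-comparison test: a phase boundary is reported when a window's mean moves away from the previous window's mean by more than k standard deviations. The window size, k and the min_std floor (which keeps perfectly flat windows from turning every tiny change into a boundary) are hypothetical tuning parameters, not values from the cited study.

```python
from statistics import mean, pstdev


def change_points(samples, window, k=3.0, min_std=1.0):
    """Indices where a new phase seems to start in a series of external measurements.

    Consecutive non-overlapping windows are compared: a boundary is reported when the
    mean of a window differs from the mean of the previous one by more than k times
    the previous window's standard deviation (floored at min_std).
    """
    points = []
    for start in range(window, len(samples) - window + 1, window):
        prev = samples[start - window:start]
        cur = samples[start:start + window]
        spread = max(pstdev(prev), min_std)
        if abs(mean(cur) - mean(prev)) > k * spread:
            points.append(start)
    return points


trace = [1e6] * 5 + [8e7] * 10  # hypothetical bytes/s samples: quiet phase, then communication
print(change_points(trace, window=5))  # [5]
```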

Page 25

[Figure: number of bytes sent per second vs. time (s), benchmark CG (NPB)]

Page 26

[Figure: number of bytes sent per second vs. time (s), benchmark SP (NPB)]

Page 28

Plan

4 HPC-aware DVFS

Page 29

HPC-aware DVFS

Relative values between the performance and ondemand DVFS governors

Benchmark             FT   SP   BT   EP   LU   IS   CG
Time increase (%)      0   -3   -1    1   -2    2    0
Energy increase (%)    0   -3   -1   -1   -2   -1   -1

HPC applications are never idle... Surprise!

MPI libraries do some polling (busy waiting)

Classical HPC benchmarks from the NPB (NAS Parallel Benchmarks)

Page 30

DVFS using only processor load

[Figure: per-benchmark comparison (FT, SP, BT, EP, LU, IS, CG; vertical axis from 80 to 150) of the performance, powersave, ondemand and conservative governors with load-based variants: smart, smart2_{0.01, 0.05, 0.1, 0.2, 0.5, 1}, smart3, meta_sched, meta_sched2_{0.01, 0.05, 0.1, 0.2, 0.5, 1}, meta_sched3]

Page 31

Yet DVFS has potential

Relative values between the performance and powersave governors

Benchmark             FT   SP   BT   EP   LU   IS   CG
Time increase (%)     36   69  110  159   96   35   83
Energy increase (%)  -18    2   21   50   16  -19    7

Time increases, but energy consumption is reduced by up to 19%!

Page 32

HPC hypotheses

State of applications at any time:
Computing
Communications
Disk I/O
Idle

Page 34

Decision

Energy at max frequency: $(\alpha+\beta)P_1$

Energy at min frequency: $(\lambda\alpha+\beta)P_2$

It is worth staying at max frequency if it consumes less energy:

$(\alpha+\beta)P_1 < (\lambda\alpha+\beta)P_2$
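The inequality can be checked directly. Assumptions in this sketch: α is the computation time and β the communication time at the maximum frequency (a later slide describes α/β as the ratio between time to compute and time to communicate), λ is the factor by which the computation part slows down at the minimum frequency, and P1 and P2 are the node powers at maximum and minimum frequency. The example values take P1 = 280 W and P2 = 152 W from the experimental platform and λ = 2.6, assuming computation time scales with the 2.6 GHz / 1 GHz frequency ratio.

```python
def energy_at_max(alpha, beta, p1):
    """Energy (J) at max frequency for alpha s of computation and beta s of communication."""
    return (alpha + beta) * p1


def energy_at_min(alpha, beta, lam, p2):
    """Energy (J) at min frequency: only the computation part is stretched by lam."""
    return (lam * alpha + beta) * p2


def stay_at_max(alpha, beta, lam, p1, p2):
    """True if the maximum frequency consumes less energy than the minimum one."""
    return energy_at_max(alpha, beta, p1) < energy_at_min(alpha, beta, lam, p2)


print(stay_at_max(alpha=1.0, beta=0.0, lam=2.6, p1=280.0, p2=152.0))  # True: 280 J < 395.2 J
print(stay_at_max(alpha=1.0, beta=1.0, lam=2.6, p1=280.0, p2=152.0))  # False: 560 J > 547.2 J
```

With these values, a purely compute-bound phase favours the maximum frequency, while as soon as communication takes as long as computation the minimum frequency already wins.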

Page 35

Obtaining α and β

Difficult to measure them directly

We aim at runtime measurement, not code instrumentation

Easy to measure bandwidth (where $B_m$ is the maximum bandwidth):

$B_w = B_m \frac{\beta}{\alpha+\beta}$

Actually α and β are not important individually; α/β is, i.e. the ratio between time to compute and time to communicate
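Inverting the bandwidth relation shows why measuring the bandwidth is enough: the ratio α/β follows directly from the measured and maximum bandwidths. For example, a node observed at a quarter of the maximum bandwidth ($B_w = B_m/4$) spends three times as long computing as communicating.

```latex
B_w = B_m \frac{\beta}{\alpha + \beta}
\quad\Longrightarrow\quad
\frac{B_m}{B_w} = \frac{\alpha + \beta}{\beta} = \frac{\alpha}{\beta} + 1
\quad\Longrightarrow\quad
\frac{\alpha}{\beta} = \frac{B_m}{B_w} - 1
```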

Page 36

The great mix

Mix and serve:

$B_w < \frac{B_m}{\lambda-1}\left(\lambda - \frac{P_1}{P_2}\right) = B_1$

B1: bandwidth threshold at max frequency to change frequency

The other way around:

$B_2 = \frac{B_m}{\lambda-1}\left(\lambda\frac{P_2}{P_1} - 1\right)$

B2: bandwidth threshold at min frequency to change frequency
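The two thresholds follow from the decision inequality. Writing r = α/β and assuming λP2 > P1 (true on the platform used in the experiments: 2.6 × 152 W ≈ 395 W > 280 W), staying at the maximum frequency pays off exactly when r is large enough. At the maximum frequency the measured bandwidth is Bm β/(α+β); for the minimum-frequency threshold this derivation assumes the computation part stretches to λα, so the measured bandwidth becomes Bm β/(λα+β).

```latex
\begin{align*}
(\alpha+\beta)P_1 < (\lambda\alpha+\beta)P_2
  &\iff (r+1)P_1 < (\lambda r+1)P_2
  \iff r > \frac{P_1 - P_2}{\lambda P_2 - P_1} \\
\text{max frequency: } B_w = \frac{B_m}{r+1}
  &< \frac{B_m(\lambda P_2 - P_1)}{(\lambda-1)P_2}
   = \frac{B_m}{\lambda-1}\left(\lambda - \frac{P_1}{P_2}\right) = B_1 \\
\text{min frequency: } B_w = \frac{B_m}{\lambda r+1}
  &< \frac{B_m(\lambda P_2 - P_1)}{(\lambda-1)P_1}
   = \frac{B_m}{\lambda-1}\left(\lambda\frac{P_2}{P_1} - 1\right) = B_2
\end{align*}
```

Since B2 = B1 · P2/P1 < B1, the threshold used at the slowest frequency is lower: slowing down the computation also lowers the byte rate produced by the same application.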

Page 37

With a hysteresis for inertia

Algorithm NetSched

Every 0.1 s, do:
  If Current_Frequency = Slowest frequency and IBR ≤ 0.9·B2
    Change frequency to Fastest
  If Current_Frequency = Fastest frequency and IBR ≥ 1.1·B1
    Change frequency to Slowest

IBR: Incoming Byte Rate
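A sketch of the decision step of this control loop, assuming the pairing of thresholds implied by their definitions (B1 at the fastest frequency, B2 at the slowest) and by the values 1.1B1 and 0.9B2 given for the experimental environment. The class name and the sample byte rates are hypothetical; reading the byte rate (for instance from the receive counters in Linux's /proc/net/dev) and applying the frequency (for instance through the cpufreq userspace governor) are left out.

```python
from dataclasses import dataclass


@dataclass
class NetSched:
    b1: float                # bytes/s threshold used at the fastest frequency
    b2: float                # bytes/s threshold used at the slowest frequency
    at_fastest: bool = True  # current frequency level

    def step(self, ibr: float) -> str:
        """One decision of the 0.1 s loop, given the incoming byte rate (bytes/s)."""
        if self.at_fastest and ibr >= 1.1 * self.b1:
            self.at_fastest = False  # communication dominates: slow down
        elif not self.at_fastest and ibr <= 0.9 * self.b2:
            self.at_fastest = True   # computation dominates: speed up
        return "fastest" if self.at_fastest else "slowest"


gov = NetSched(b1=5.9e7, b2=3.2e7)  # illustrative thresholds
print([gov.step(r) for r in (1e7, 7e7, 5e7, 2e7)])
# ['fastest', 'slowest', 'slowest', 'fastest']
```

Because 0.9·B2 is well below 1.1·B1, byte rates between the two (such as 5e7 above) leave the frequency unchanged; this gap is the hysteresis that prevents oscillation.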

Page 38

Experimental environment

Servers (thanks to Grid'5000)

Processors: 2 × Dual-Core AMD Opteron 2218
Memory: 8 GB
NIC: Gigabit Ethernet
Frequencies: 2.6 GHz and 1 GHz
Electrical power: P1 = 280 W and P2 = 152 W

Benchmark

7 NAS Parallel Benchmarks (NPB)

Governors

Performance / Powersave / Ondemand
NetSched

1.1·B1 ≃ 7·10^7 and 0.9·B2 ≃ 3·10^7 (bytes/s)
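The threshold formulas can be evaluated with the platform values. Assumptions in this sketch: Bm = 125e6 bytes/s (nominal Gigabit Ethernet line rate, ignoring protocol overhead) and λ = 2.6, i.e. computation time inversely proportional to the 2.6 GHz / 1 GHz frequency ratio; neither value is stated explicitly in the slides.

```python
def thresholds(bm, lam, p1, p2):
    """B1 (threshold at max frequency) and B2 (threshold at min frequency), in the unit of bm."""
    b1 = bm / (lam - 1) * (lam - p1 / p2)
    b2 = bm / (lam - 1) * (lam * p2 / p1 - 1)
    return b1, b2


# Assumptions: nominal Gigabit Ethernet rate for Bm, frequency ratio for lam
b1, b2 = thresholds(bm=125e6, lam=2.6, p1=280.0, p2=152.0)
print(f"1.1*B1 = {1.1 * b1:.2e} B/s, 0.9*B2 = {0.9 * b2:.2e} B/s")
# 1.1*B1 = 6.51e+07 B/s, 0.9*B2 = 2.89e+07 B/s
```

These are in the same range as the ≃7·10^7 and ≃3·10^7 quoted above; the values used in the experiments may rest on a measured rather than nominal Bm.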

Page 39

Makespan and Energy-to-solution

[Figure: makespan (in % of performance) per benchmark (IS, FT, SP, CG, LU, BT, EP) for the performance, powersave, net_sched and ondemand governors]

[Figure: energy (in % of performance) per benchmark (IS, FT, SP, CG, LU, BT, EP) for the performance, powersave, net_sched and ondemand governors]

∗ Georges Da Costa et al., DVFS governor for HPC: Higher, Faster, Greener, Euromicro PDP conference, 2015

Page 40

Conclusion

Allocation: Genetic Algorithm∗, Vector packing or Fuzzy

Up to 30% power consumption reduction

Using DAG for frequency scaling§

Up to 13% power consumption reduction

Profile-based hardware reconfiguration†

Up to 13% power consumption reduction, 3% makespan increase

HPC-aware DVFS‡

Up to 25% power consumption reduction, 1% makespan decrease!

∗ Tom et al., Quality of Service Modeling for Green Scheduling in Clouds, SUSCOM journal, 2014
§ Tom et al., Energy-aware simulation with DVFS, SMPT journal, 2013
† Landry et al., Exploiting performance counters to predict and improve energy performance of HPC systems, SUSCOM journal, 2014
‡ Georges et al., DVFS governor for HPC: Higher, Faster, Greener, PDP conference, 2015
