Energy Challenges in Manycore Processors
Prof. Dr. Jörg Keller
Parallelism and VLSI Group
Faculty of Mathematics and Computer Science
Overview
Motivation
Power and Energy Basics
Optimization Targets
Case Study: Energy-efficient Task Scheduling
Outlook
Motivation I
Energy consumption by computers accounts for more than 1 percent of total electric energy consumption in Germany
Energy is an important cost factor in computing centers:
- direct energy cost
- cost for cooling (most energy is turned into heat)
Energy limits the operating time of mobile devices: laptops, tablets, smartphones, …
Motivation II
Power consumption influences design and cost of embedded devices (fans, cooling,…)
Processors consume majority of energy in computers
Energy density might limit future processor development
Green IT has become a buzzword
So let's have a look into this topic!
Power and Energy Basics I
Microprocessor = complex mixture of combinational circuits, registers, and memories
Parts typically built from CMOS transistors
Transistors consume energy when they switch (dynamic power)
Additionally, static power consumption e.g. from leakage current
Power and Energy Basics II
Dynamic power consumption per cycle depends linearly on:
- number of transistors that switch
- frequency (length of cycle)
Energy for a transistor switch depends quadratically on the voltage level
Static power from leakage depends linearly on voltage
For a given processor: $P_d \sim f \cdot V^2$, $P_s \sim V$
Power and Energy Basics III
Voltage and frequency are not independent!
Minimum voltage depends linearly on frequency [as long as the voltage is sufficiently higher than the threshold]
The minimum voltage level for a given frequency is preferable: frequency defines performance, voltage does not
$P = c_d \cdot f^3 + c_s \cdot f$
Consequence: static power (the low-order term) is often ignored
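The power relations from the basics slides can be sketched in a few lines of Python. This is only an illustration in normalized units: the function names and the constants `c_d`, `c_s` and `k` (the assumed slope of the linear voltage/frequency relation) are hypothetical and not taken from any measured processor.

```python
def min_voltage(f, k=1.0):
    """Minimum supply voltage for frequency f, assumed linear in f (V = k * f)."""
    return k * f


def dynamic_power(f, v, c_d=1.0):
    """Dynamic power, proportional to f * V^2."""
    return c_d * f * v ** 2


def static_power(v, c_s=1.0):
    """Static (leakage) power, assumed proportional to V."""
    return c_s * v


def total_power(f, c_d=1.0, c_s=1.0, k=1.0):
    """Total power when the core always runs at the minimum voltage for f."""
    v = min_voltage(f, k)
    return dynamic_power(f, v, c_d) + static_power(v, c_s)


if __name__ == "__main__":
    for f in (0.5, 1.0, 2.0):
        p = total_power(f)
        share = dynamic_power(f, min_voltage(f)) / p
        print(f"f={f}: P={p:.3f}, dynamic share={share:.2f}")
```

With `k = 1` this reduces to the cubic-plus-linear form of the slide. The dynamic share grows quickly with frequency, which is why the linear static term is often dropped at high frequencies.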
Power and Energy Basics IV
Neglected influence factor: temperature
Example: Exynos 4412 quad-core ARM chip
Application: stress benchmark under Linux
Thanks to S. Holmbacka, Åbo Akademi University
Power and Energy Basics V
Energy: $E = \int P \, dt = P \cdot t$ if power is constant
If the (run)time is fixed and power is constant over time, then power and energy optimization go hand in hand
If not, this becomes a complex optimization problem
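When power varies over time, energy has to be obtained by integrating a power trace. A minimal sketch, assuming the trace is given as sampled (time, power) pairs and using the trapezoidal rule; the function name `energy_from_trace` is hypothetical:

```python
def energy_from_trace(times, powers):
    """Approximate E = integral of P dt with the trapezoidal rule.

    times  -- increasing sample times (e.g. in seconds)
    powers -- power samples at those times (e.g. in watts)
    """
    if len(times) != len(powers):
        raise ValueError("times and powers must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (powers[i] + powers[i - 1]) * dt
    return total


if __name__ == "__main__":
    # Constant power: energy is simply power * time.
    print(energy_from_trace([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]))
    # Power rising linearly from 1 to 3 over 2 time units.
    print(energy_from_trace([0.0, 2.0], [1.0, 3.0]))
```

For constant power the result equals power times time, matching the special case on the slide.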
Optimization Targets I
Wide range of optimization targets
Single application: minimize runtime for given energy budget
Single application: minimize energy for given max runtime
System-wide: maximize battery lifetime by reducing power without hurting performance goals
Both: minimize power for given response times
Optimization Targets II
Deadline vs flowtime optimization
Static vs dynamic optimization
Application often knows about future behaviour
System mostly does not
Optimization Targets III
What can be done:
frequency scaling
core switch-off
both?
Optimization Targets IV
Simple insights:
Given a task with c instructions and deadline T: best run it at frequency $f = c/T$
Simulate f by the surrounding discrete frequencies
Check for overhead!
- switching frequency alone: often 20 cycles
- switching frequency + voltage: often as long as 0.2 ms
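How an ideal frequency is simulated by its two surrounding discrete frequencies can be sketched as follows. The sketch assumes one instruction per cycle, a purely cubic dynamic power model in normalized units, a single frequency switch during the task, and that no work is done while switching; all names are hypothetical.

```python
def split_between_frequencies(c, T, f_lo, f_hi, t_switch=0.0):
    """Return (t_lo, t_hi): time to spend at f_lo and f_hi so that
    c cycles finish exactly at deadline T.

    t_switch is the time lost by the single frequency switch
    (assumption: the core does no work during the switch).
    """
    usable = T - t_switch
    if usable <= 0 or not (f_lo * usable <= c <= f_hi * usable):
        raise ValueError("deadline cannot be met exactly with these frequencies")
    if f_lo == f_hi:
        return usable, 0.0
    t_hi = (c - f_lo * usable) / (f_hi - f_lo)
    return usable - t_hi, t_hi


def dynamic_energy(t_lo, f_lo, t_hi, f_hi):
    """Dynamic energy under the assumed cubic power model P = f^3."""
    return t_lo * f_lo ** 3 + t_hi * f_hi ** 3


if __name__ == "__main__":
    c, T = 1.5, 1.0            # ideal frequency would be c / T = 1.5
    f_lo, f_hi = 1.0, 2.0      # neighbouring discrete frequencies
    t_lo, t_hi = split_between_frequencies(c, T, f_lo, f_hi)
    print("time at f_lo / f_hi:", t_lo, t_hi)
    print("ideal energy:   ", T * (c / T) ** 3)
    print("discrete energy:", dynamic_energy(t_lo, f_lo, t_hi, f_hi))
```

Because the power function is convex, the discrete mix costs more energy than the ideal frequency would, and a nonzero `t_switch` pushes more of the work to the higher frequency.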
Optimization Targets V
Optimization Targets VI
More simple insights:
Given a task with c instructions, deadline T and perfect parallelism: best run it on p cores (as many as possible) at frequency $f = c/(T p)$
Static power consumption (and minimum frequency) limits the useful number of cores:
$E(p) = T \cdot p \cdot \left(\frac{c}{T p}\right)^3 + T \cdot p \cdot P_s$
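The effect of static power on the useful core count can be explored numerically. The sketch below evaluates the energy formula for integer core counts and compares the result with the continuous optimum $p^* = (c/T) \cdot (2/P_s)^{1/3}$, which follows from setting $dE/dp = 0$. Units are normalized, the dynamic power constant is taken as 1 as in the formula, and all function names are hypothetical.

```python
def energy(p, c, T, p_s):
    """E(p) = T * p * (c / (T * p))^3 + T * p * P_s."""
    f = c / (T * p)
    return T * p * f ** 3 + T * p * p_s


def best_core_count(c, T, p_s, p_max):
    """Integer core count in 1..p_max with the lowest energy."""
    return min(range(1, p_max + 1), key=lambda p: energy(p, c, T, p_s))


def continuous_optimum(c, T, p_s):
    """Real-valued minimizer of E(p), from dE/dp = 0."""
    return (c / T) * (2.0 / p_s) ** (1.0 / 3.0)


if __name__ == "__main__":
    c, T, p_s = 8.0, 1.0, 2.0
    print("continuous optimum:", continuous_optimum(c, T, p_s))
    print("best integer p:    ", best_core_count(c, T, p_s, p_max=64))
    for p in (4, 8, 16):
        print(p, energy(p, c, T, p_s))
```

Without static power (`p_s = 0`) the energy keeps falling as p grows, so the best choice is always `p_max`; with static power there is a finite optimum.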
Optimization Targets VII
For non-divisible loads: balance the load as evenly as possible to minimize the $\ell_a$-norm
Here, too: load balancing is more difficult with discrete frequencies
Optimization Targets VIII
Optimization Targets IX
Not so simple insights:
If static power dominates, i.e. > 50% at typical frequencies, then run fast and shut down cores
Fast = often not the highest frequency, but next to...
But consider the time scale (Exynos 5, depending on workload):
- core shutdown: 0.2 – 6 ms
- core wake-up: 2 – 90 ms
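Whether running fast and shutting down pays off can be checked with a small comparison. This is a sketch under several assumptions not stated on the slides: normalized units, dynamic power $f^3$, a constant static power `p_s` while the core is on, zero power while it is off, static power drawn during shutdown and wake-up, and the core being awake again at time T. All names are hypothetical.

```python
def energy_stretch(c, T, p_s):
    """Run at f = c / T for the whole interval; the core stays on."""
    f = c / T
    return T * (f ** 3 + p_s)


def energy_race_to_idle(c, T, f_fast, p_s, t_down, t_up):
    """Run at f_fast, shut the core down, wake it up before T.

    Returns None if run time plus transition times do not fit into T.
    """
    t_run = c / f_fast
    if t_run + t_down + t_up > T:
        return None
    # Assumption: only static power is drawn during the transitions.
    return t_run * (f_fast ** 3 + p_s) + (t_down + t_up) * p_s


if __name__ == "__main__":
    c, T, p_s = 1.0, 1.0, 16.0   # static power dominates at f = 1
    print("stretch:     ", energy_stretch(c, T, p_s))
    print("race to idle:", energy_race_to_idle(c, T, 2.0, p_s, 0.02, 0.08))
    # With long transitions the fast strategy no longer fits.
    print("too slow:    ", energy_race_to_idle(c, T, 2.0, p_s, 0.3, 0.3))
```

In this example the fast strategy wins clearly, but the advantage shrinks as the transition times approach the length of the interval, which is the time-scale caveat above.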
Case Study I
Given: set of n independent tasks with loads
machine with p cores, power consumption $\sim f^a$ where $a > 1$
Deadline M
Minimize energy
Case Study II
Tasks are distributed over cores
Continuous frequency scaling (Pruhs et al.):
- each core should run at a single frequency
- all cores should finish at makespan time
- no core idle time between tasks
Core with load $L_i$ runs at $f_i$ such that $L_i / f_i = M$, i.e. $f_i = L_i / M$
Energy consumption: $M \cdot f_i^a = M^{1-a} \cdot L_i^a$
Distribute the load such that the $\ell_a$-norm is minimized: known problem with known heuristic
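The per-core energy formula can be combined with a load-balancing heuristic as follows. The slides do not name the heuristic; this sketch swaps in the greedy longest-processing-time-first (LPT) rule as one common choice, and the function names and the default `a = 3` are assumptions.

```python
import heapq


def lpt_assign(task_loads, p):
    """Greedy LPT: place each task, largest first, on the least-loaded core.

    Returns the list of total loads per core.
    """
    heap = [(0.0, i) for i in range(p)]   # (current load, core index)
    loads = [0.0] * p
    for load in sorted(task_loads, reverse=True):
        core_load, i = heapq.heappop(heap)
        loads[i] = core_load + load
        heapq.heappush(heap, (loads[i], i))
    return loads


def total_energy(core_loads, M, a=3.0):
    """Sum over cores of M^(1-a) * L_i^a, each core running at f_i = L_i / M."""
    return sum(M ** (1.0 - a) * L ** a for L in core_loads)


if __name__ == "__main__":
    tasks = [5, 4, 3, 3, 2, 1]
    M = 9.0
    balanced = lpt_assign(tasks, p=2)
    print("LPT loads:", balanced, "energy:", total_energy(balanced, M))
    # An unbalanced distribution of the same total load costs more energy.
    print("unbalanced energy:", total_energy([12.0, 6.0], M))
```

Since M is fixed, minimizing the total energy is the same as minimizing the sum of $L_i^a$, which is why the balanced assignment beats the unbalanced one for the same total load.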
Case Study III
Discrete frequency scaling (no overhead):
- solve the problem for continuous frequency scaling
- "simulate" frequencies by the surrounding discrete frequencies
Adding frequency switching overhead changes the situation
Counterexample for divisible load (also possible for non-divisible load, see upcoming paper)
Case Study IV
No consideration of frequency scaling overhead
Case Study V
Suitable heuristic that considers frequency scaling overhead: lower energy
Case Study VI
Why look at independent tasks at all?
Many applications can be expressed as streaming task graphs or Kahn process networks
All tasks active, intermediate results forwarded (as packets) from task to task
If communications are buffered: one scheduling round = task scheduling
Energy savings per round pay off!
Case Study VII
What happens if tasks can be parallel as well? Must choose a degree of parallelism for each task
Theoretically optimal allocation: Sanders+Speck 2012
Simple: Eitschberger et al. 2013
- parallelize all tasks, or at least the large ones, for p cores
- balances load, saves energy despite efficiency loss
More advanced, still practical: Kessler et al. 2013
- crown scheduling
Outlook
Constant low voltage:
- frequency scaling loses importance for dynamic power
- static power gets a larger fraction
- sleep states gain importance for scheduling and algorithmics
Similar tendencies in other components:
- idle power for servers might get low enough to avoid shutdown
- possible reversal of strategy in computing centers
Outlook
Microprocessors and operating systems should provide lower overhead for frequency/voltage scaling and core shut down
Better accessibility of features from applications: helpful for single-application embedded systems
Better understanding of heterogeneity and power
Outlook
Further miniaturization might increase energy density:
- only a fraction of the device can be active, "dark silicon"
- the question of special- vs. general-purpose cores might get new fuel
Computer architecture, parallel systems and parallel algorithmics still contain lots of research possibilities
Go ahead for your next paper, PhD degree, next grant,…
Thanks for your kind attention!