Energy and Power Lecture notes S. Yalamanchili and S. Mukhopadhyay

Dec 24, 2015


Energy and Power

Lecture notes S. Yalamanchili and S. Mukhopadhyay


Some Useful Reading

• http://en.wikipedia.org/wiki/CPU_power_dissipation

• http://en.wikipedia.org/wiki/CMOS#Power:_switching_and_leakage

• http://www.xbitlabs.com/articles/cpu/display/core-i5-2500t-2390t-i3-2100t-pentium-g620t.html

• http://www.cpu-world.com/info/charts.html


Historical Scaling


Technology Scaling

• A 30% reduction in linear dimensions doubles transistor density

• Power per transistor: Vdd scaling → lower power

• Transistor delay = Cgate Vdd/ISAT: Cgate and Vdd scaling → lower delay

[Figure: MOSFET views labeling gate, source, drain, body, oxide thickness tox, and channel length L]

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
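As a rough check of the density bullet on this slide, the sketch below computes the density gain from a 30% shrink in linear dimensions. It assumes transistor area scales with the square of the linear dimension; the function name `density_gain` is illustrative and not from the lecture.

```python
def density_gain(linear_scale: float) -> float:
    """Return the transistor density multiplier for a linear scale factor.

    Assumes area per transistor shrinks with the square of the linear
    dimension, so density grows as 1 / linear_scale**2.
    """
    return 1.0 / (linear_scale ** 2)


# A 30% reduction in each dimension means a linear scale factor of 0.7.
gain = density_gain(0.7)
print(f"density gain: {gain:.2f}x")
```

Since 0.7 × 0.7 = 0.49, the same area holds roughly twice as many transistors.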


Fundamental Trends

Rows with one entry per generation:

High Volume Manufacturing    2004  2006  2008  2010  2012  2014  2016  2018
Technology Node (nm)           90    65    45    32    22    16    11     8
Integration Capacity (BT)       2     4     8    16    32    64   128   256
RC Delay                        1     1     1     1     1     1     1     1

Rows whose entries span several generations, left to right:

Delay = CV/I scaling: 0.7, ~0.7, >0.7 (delay scaling will slow down)
Energy/Logic Op scaling: >0.35, >0.5, >0.5 (energy scaling will slow down)
Bulk Planar CMOS: High Probability, Low Probability
Alternate, 3G etc: Low Probability, High Probability
Variability: Medium, High, Very High
ILD (K): ~3, <3, reduce slowly towards 2-2.5
Metal Layers: 6-7, 7-8, 8-9 (0.5 to 1 layer per generation)

Source: Shekhar Borkar, Intel Corp.


ITRS Roadmap for Logic Devices

From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008


Where Does the Power Go in CMOS?

• Dynamic Power Consumption
  o Charging and discharging capacitance

• Short Circuit Power
  o Short circuit path between supply rails during switching
  o Nominally 10%-20% of dynamic power and can be ignored for a first order analysis

• Leakage
  o Leaky transistors


Dynamic Power

$P_{DYNAMIC} = C_L \times V_{DD}^2 \times f$

[Figure: input voltage to a CMOS inverter versus time (0 to VDD over period T), with current iDD charging the output capacitor CL and the output capacitor then discharging]

• Dynamic power is used in charging and discharging the capacitances in the CMOS circuit.
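A minimal sketch of the dynamic power relation on this slide, showing the quadratic dependence on supply voltage. The capacitance, voltage, and frequency values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(c_load: float, v_dd: float, freq: float) -> float:
    """Dynamic power in watts: P = C_L * V_DD^2 * f (SI units in, watts out)."""
    return c_load * v_dd ** 2 * freq


# Hypothetical values: 1 nF switched capacitance, 2 GHz clock.
p_nominal = dynamic_power(c_load=1e-9, v_dd=1.0, freq=2e9)
p_scaled = dynamic_power(c_load=1e-9, v_dd=0.8, freq=2e9)

print(f"nominal: {p_nominal:.2f} W, at 0.8 V: {p_scaled:.2f} W")
print(f"ratio: {p_scaled / p_nominal:.2f}")
```

Dropping the supply from 1.0 V to 0.8 V at the same frequency cuts dynamic power to 0.8² = 0.64 of its nominal value.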


Static Power

• Technology scaling has caused transistors to become smaller and smaller. As a result, static power has become a substantial portion of the total power.

[Figure: CMOS inverter with input = 0 and output = VDD, showing gate leakage, junction leakage, and sub-threshold leakage]

$P_{STATIC} = V_{DD} \times I_{STATIC}$


Energy-Delay Interaction

[Figure: energy, delay, and energy-delay product (EDP) plotted against VDD]

• Delay decreases as supply voltage increases, but energy/power increases
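The trend can be sketched numerically. The model below is an assumption for illustration, not the lecture's: delay follows C·Vdd/ISAT as on the Technology Scaling slide, ISAT is taken to grow as (Vdd − Vth)^alpha, and the values of `v_th` and `alpha` are hypothetical. Energy per switching event is taken as C·Vdd², with C normalized to 1.

```python
def gate_delay(v_dd: float, v_th: float = 0.3, alpha: float = 1.3) -> float:
    """Relative delay ~ V_dd / I_sat, with I_sat ~ (V_dd - V_th)**alpha (assumed)."""
    if v_dd <= v_th:
        raise ValueError("v_dd must exceed v_th")
    return v_dd / (v_dd - v_th) ** alpha


def switching_energy(v_dd: float) -> float:
    """Relative energy per switching event ~ V_dd**2 (capacitance normalized to 1)."""
    return v_dd ** 2


supplies = [0.6, 0.8, 1.0, 1.2]
delays = [gate_delay(v) for v in supplies]
energies = [switching_energy(v) for v in supplies]
edps = [e * d for e, d in zip(energies, delays)]  # energy-delay product

for v, d, e, edp in zip(supplies, delays, energies, edps):
    print(f"Vdd={v:.1f} V  delay={d:.2f}  energy={e:.2f}  EDP={edp:.2f}")
```

Under these assumptions delay falls and energy rises monotonically as Vdd increases, which is the tension the EDP curve summarizes.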


Static Energy-Delay Interaction

[Figure: leakage and delay plotted against Vth]

• Static energy increases exponentially as the threshold voltage decreases

• Delay increases with threshold voltage

[Figure: MOSFET cross-section labeling gate, source, drain, oxide thickness tox, and channel length L]


Power Vs. Energy

• Power is the rate of expenditure of energy: one joule/sec = one watt

• Both profiles use the same amount of energy at different rates or power

[Figure: two plots of power (watts) versus time, one with power levels P0, P1, and P2 and one with a single level P0. Same energy = area under the curve]
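A small sketch of the "area under the curve" idea, using piecewise-constant power profiles. The two profiles and their numbers are hypothetical; they are chosen so that both consume the same energy over the same duration while their peak power differs.

```python
from typing import List, Tuple

# Each segment is (power in watts, duration in seconds).
Profile = List[Tuple[float, float]]


def energy_joules(profile: Profile) -> float:
    """Energy is the area under the power-versus-time curve."""
    return sum(power * duration for power, duration in profile)


def peak_power(profile: Profile) -> float:
    """Highest power level reached anywhere in the profile."""
    return max(power for power, _ in profile)


steady = [(10.0, 6.0)]                            # constant 10 W for 6 s
bursty = [(20.0, 1.0), (10.0, 3.0), (5.0, 2.0)]   # stepped profile, also 6 s

print(energy_joules(steady), peak_power(steady))
print(energy_joules(bursty), peak_power(bursty))
```

Both profiles use 60 J, but the stepped one peaks at twice the power, which matters for thermal limits rather than battery life.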


Optimizing Power vs. Energy

Thermal envelopes → minimize peak power

Maximize battery life → minimize energy


The Problem

• Historically performance scaling was accompanied by power scaling

• This is no longer true → power densities are increasing


The End of Dennard Scaling

[Figure: MOSFET cross-section labeling gate, source, drain, oxide thickness tox, and channel length L]

• Voltage is no longer scaling at the same rate

• Slower scaling in power per transistor → increasing power densities

From R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.


Chip Power Densities

From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008


What is the Problem?

• Based on scaling using Pentium-class cores

• While Moore’s Law continues, scaling phenomena have changed

• Power densities are increasing with each generation

Source: Mukhopadhyay and Yalamanchili (2009)


The Power Wall

• Power per transistor scales with frequency but also scales with Vdd
  o Lower Vdd can be compensated for with increased pipelining to keep throughput constant
  o Power per transistor is not the same as power per area → power density is the problem!
  o Multiple units can be run at lower frequencies to keep throughput constant, while saving power

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$


The Advent of Dark Silicon?

[Figure: 64-core asymmetric chip multiprocessor layout and failure probability distribution, with in-order and out-of-order cores]

• Cannot afford to turn on all devices at once

• How do we manage the power and thermals?


What are my Options?

1. Better technology
  o Manufacturing
  o New devices (non-CMOS?)

2. Be more efficient – activity management
  o Clock gating
  o Power gating
  o Power management

3. Improved architecture
  o Simpler pipelines

4. Parallelism


Activity Management

Clock Gating

• Turn off clock to a block of logic

• Eliminate unnecessary transitions/activity

• Clock distribution power

Power Gating

• Turn off power to a block of logic, e.g., core

• No leakage

[Figure: clock gating (clk, cond, input, combinational logic) and power gating (Vdd, power gate transistor, Core 0, Core 1)]


Power Management

• Software controlled power management
  o Optimize power and/or energy
  o Orchestrated by the operating system or application libraries
  o Industry standard interfaces for power management: Advanced Configuration and Power Interface (ACPI), https://www.acpica.org/, http://www.acpi.info/

• Hardware power management
  o Optimized power/energy
  o Failsafe operation, e.g., protect against thermal emergencies


Processor Power States

• Performance States – P-states
  o Operate at different voltage/frequencies (recall the delay-voltage relationship)
  o Lower voltage → lower leakage
  o Lower frequency → lower power (not the same as energy!)
  o Lower frequency → longer execution time

• Idle States - C-states
  o Sleep states
  o Differ in how much state is saved

• SW or HW managed transitions between states!
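The distinction between power and energy across P-states can be sketched as follows. This is an illustration under stated assumptions: only dynamic power (C·Vdd²·f) is modeled, leakage is ignored, the workload is a fixed cycle count, and the three states and the effective capacitance `c_eff` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PState:
    name: str
    v_dd: float  # supply voltage in volts
    freq: float  # clock frequency in hertz


def run(state: PState, cycles: float, c_eff: float = 1e-9) -> Tuple[float, float, float]:
    """Return (power in W, time in s, energy in J) for a fixed-cycle workload.

    Dynamic power only; leakage is deliberately left out of this sketch.
    """
    power = c_eff * state.v_dd ** 2 * state.freq
    time = cycles / state.freq
    return power, time, power * time


states = [
    PState("P0", 1.0, 2e9),  # nominal
    PState("P1", 1.0, 1e9),  # frequency halved, voltage unchanged
    PState("P2", 0.8, 1e9),  # frequency halved and voltage lowered
]
results = {s.name: run(s, cycles=2e9) for s in states}
for name, (p, t, e) in results.items():
    print(f"{name}: power={p:.2f} W  time={t:.2f} s  energy={e:.2f} J")
```

Halving frequency alone halves power but leaves dynamic energy unchanged because the run takes twice as long; the energy saving in this model comes from lowering the voltage as well.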


Multiple Voltage Frequency Domains

Intel Sandy Bridge Processor (from E. Rotem et al., Hot Chips 2011)

• Cores and ring in one DVFS domain

• Graphics unit in another DVFS domain

• Cores and portion of cache can be gated off


Power States

From: http://www.intel.com/content/www/us/en/processors/core/2nd-gen-core-family-mobile-vol-1-datasheet.html


Power Gating

Intel Sandy Bridge Processor

• Turn off components that are not being used
  o Lose all state information

• Costs of powering down

• Costs of powering up

• Smart shutdown
  o Models to guide decisions
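One simple model that could guide a shutdown decision is a break-even idle time: gating pays off only if the leakage energy saved while off exceeds the energy spent powering down and back up. The sketch below is an assumption for illustration, not the policy used in Sandy Bridge, and all numbers are hypothetical.

```python
def break_even_time(p_leak: float, e_down: float, e_up: float) -> float:
    """Idle time in seconds beyond which power gating saves energy.

    p_leak: leakage power saved while gated (watts)
    e_down, e_up: energy cost of powering down and powering up (joules)
    """
    if p_leak <= 0:
        raise ValueError("p_leak must be positive")
    return (e_down + e_up) / p_leak


def should_gate(expected_idle: float, p_leak: float, e_down: float, e_up: float) -> bool:
    """Gate only when the predicted idle period exceeds the break-even time."""
    return expected_idle > break_even_time(p_leak, e_down, e_up)


# Hypothetical core: 0.5 W leakage, 1 mJ to power down, 2 mJ to power up.
t_be = break_even_time(p_leak=0.5, e_down=1e-3, e_up=2e-3)
print(f"break-even idle time: {t_be * 1e3:.1f} ms")
print(should_gate(10e-3, 0.5, 1e-3, 2e-3))  # long idle period
print(should_gate(2e-3, 0.5, 1e-3, 2e-3))   # short idle period
```

The model ignores wake-up latency and the cost of saving and restoring lost state, both of which a real policy would also need to weigh.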


Simplify Core Design

[Figures: AMD Bulldozer core; ARM A7 core (arm.com)]

• Support for out-of-order execution, schedulers, branch prediction, etc. consumes more energy per instruction

• Can fit many more simpler cores on a die


Parallelism and Power

[Figures: IBM Power5 (source: IBM); AMD Trinity (source: forwardthinking.pcmag.com)]

• How much of the chip area is devoted to compute?

• Run many cores slower. Why does this reduce power?


Parallelism

• Concurrency + lower frequency → greater energy efficiency

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$

[Figure: a single core with cache alongside four core-and-cache pairs]

Example

• 4X #cores
• 0.75X voltage
• 0.5X frequency
• 1X power
• 2X in performance
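The example figures can be checked with a short calculation. The sketch assumes dynamic power dominates (so power per core scales as Vdd²·f) and that performance scales linearly with core count and frequency; both are simplifications, and the function names are illustrative.

```python
def relative_power(cores: float, v_scale: float, f_scale: float) -> float:
    """Total dynamic power relative to one baseline core: N * V^2 * f."""
    return cores * v_scale ** 2 * f_scale


def relative_performance(cores: float, f_scale: float) -> float:
    """Throughput relative to one baseline core, assuming perfect parallel scaling."""
    return cores * f_scale


power = relative_power(cores=4, v_scale=0.75, f_scale=0.5)
perf = relative_performance(cores=4, f_scale=0.5)
print(f"power: {power:.3f}x, performance: {perf:.1f}x")
```

Four cores at 0.75X voltage and 0.5X frequency draw 4 × 0.5625 × 0.5 = 1.125X the baseline power, which the slide rounds to 1X, while delivering 2X the throughput.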


Microarchitectural Level Models

• How can we study power consumption without building circuits? → Models

• Models are available at multiple levels of abstraction.
  o We are interested in microarchitectural models


Processor Microarchitecture

[Figure: processor microarchitecture block diagram. Blocks: instruction cache, instruction TLB, branch prediction, fetch queue, instruction decoder, instruction queue, register files, ALU, MUL, FPU, LD, ST, L1 data cache, data TLB, L2 data cache, NoC router, on-chip network. Stage labels: Fetch, Decode, Execute/Writeback, Memory, Network]


Energy/Power Calculation

• How do we calculate energy or power dissipation for a given microarchitecture?

• Energy/Power varies between:
  o Different ISAs; ARM vs Intel x86
  o Different microarchitectures; in-order vs out-of-order
  o Different applications; memory vs compute-bound
  o Different technologies; 90nm vs 22nm technology
  o Different operating conditions; frequency, temperature


Architecture Activity (1)

[Figure: processor microarchitecture block diagram]

Activity 1: Instruction Fetch

icache.read++; fbuffer.write++;

• Collect activity counts of each architecture component (through simulation or measurement).

• List of components differs between microarchitectures.

• Activity counts at each component differ between applications.


Architecture Activity (2)

[Figure: processor microarchitecture block diagram]

Activity 2: Instruction Decode

fbuffer.read++; idecoder.logic++;

• Read/write accesses to caches, buffers, etc.

• Logical accesses to logic blocks such as decoder, ALUs, etc.

• Tradeoff of differentiating more access types (accuracy) vs simulation speed (complexity).


Power and Architecture Activity

• For example, at the nth clock cycle, the collected counters for the data cache are:

  o read = 20, write = 12

  o per-read energy = 0.5 nJ; per-write energy = 0.6 nJ

  o Read energy = read × per-read energy = 10 nJ

  o Write energy = write × per-write energy = 7.2 nJ

  o Total activity energy = read energy + write energy = 17.2 nJ

  o If n = 50 clock cycles and clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW

*Note: n/clock_freq = duration of n clock periods in seconds; power = time average of energy
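The data cache example can be reproduced in a few lines. The sketch uses the numbers from the slide; the function names are illustrative and not taken from any particular power modeling tool.

```python
def activity_energy(reads: int, writes: int, e_read: float, e_write: float) -> float:
    """Total activity energy in joules from access counts and per-access energies."""
    return reads * e_read + writes * e_write


def activity_power(energy: float, cycles: int, clock_freq: float) -> float:
    """Average power in watts: energy divided by elapsed time (cycles / clock_freq)."""
    return energy * clock_freq / cycles


# Counters and per-access energies from the example above.
energy = activity_energy(reads=20, writes=12, e_read=0.5e-9, e_write=0.6e-9)
power = activity_power(energy, cycles=50, clock_freq=2e9)

print(f"energy = {energy * 1e9:.1f} nJ")
print(f"power  = {power * 1e3:.0f} mW")
```

Fifty cycles at 2 GHz last 25 ns, and 17.2 nJ over 25 ns averages to 688 mW.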


Things to consider (1)

1. How do we calculate per-read/write energies?

• Per-access energies can be estimated from circuit-level designs and analyses.

• There are various open-source tools for this.

[Figure: Architecture Specification + Technology Parameters → Circuit-level Estimation Tool → Estimation Results: Area, Energy, Timing, etc.]


Things to consider (2)

2. Is per-access energy always the same?

• Per-access energy in fact depends on:
  o how many bits are switching
  o how they are switching (0→1 or 1→0)

• It is reasonable to assume constant per-access energy over long-term observation (e.g., n = 1M clock cycles); the number of switching bits is averaged (e.g., 50% of bits are switching).

• Most architecture simulators do not capture bit-level details due to simulation complexity.


Things to consider (3)

3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?

• Energy (or power) dissipation consists largely of dynamic and static components.

• Dynamic (or switching) energy refers to energy dissipation due to switching activities.

• Static (or leakage) energy is dissipation to keep the electronic system turned on.

• In this case, the register file has no dynamic energy dissipation but consumes static energy.
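A sketch of the idle register file case, splitting energy into dynamic and static parts. The static power value and observation window are hypothetical, and the function name is illustrative.

```python
from typing import Tuple


def component_energy(
    reads: int,
    writes: int,
    e_read: float,
    e_write: float,
    p_static: float,
    cycles: int,
    clock_freq: float,
) -> Tuple[float, float]:
    """Return (dynamic energy, static energy) in joules for one component.

    Dynamic energy comes from accesses; static energy accrues for as long
    as the component is powered, whether or not it is accessed.
    """
    dynamic = reads * e_read + writes * e_write
    static = p_static * cycles / clock_freq
    return dynamic, static


# Hypothetical idle register file: no accesses, 5 mW leakage,
# observed for 1M cycles at 2 GHz (0.5 ms).
dynamic, static = component_energy(
    reads=0, writes=0, e_read=0.5e-9, e_write=0.6e-9,
    p_static=5e-3, cycles=1_000_000, clock_freq=2e9,
)
print(f"dynamic = {dynamic:.1e} J, static = {static:.1e} J")
```

With no accesses the dynamic term is zero, while the static term grows with elapsed time regardless of activity.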


Thermal Issues

• Heat can cause damage to the chip
  o Need failsafe operation

• Thermal fields change the physical characteristics
  o Leakage current and therefore power increases
  o Delay increases
  o Device degradation becomes worse

• Cooling solution determines the permitted power dissipation


Thermal Design Power (TDP)

• This is the maximum power at which the part is designed to operate
  o Dictates the design of the cooling system (max temperature Tjmax)
  o Typically fixed by worst case workload

• Parts are typically operating below the TDP

• Opportunities for turbo mode?

AMD Trinity APU

http://ecs.vancouver.wsu.edu/thermofluids-research


Trinity TDP

Source: http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2


Exploiting the Physics

• Most of the time the part is operating well below its thermal limit
  o Leaving performance on the table

• Can temporarily boost frequency (and therefore power dissipation) for short periods of time, e.g., seconds

• Temperature changes slowly


Boosting

• Exploit package physics
  o Temperature changes on the order of milliseconds

• Use the thermal headroom

[Figure: Intel Sandy Bridge power versus time. Labels: Max Power, TDP Power, low power (build up thermal credits), turbo boost region, 10s of seconds]
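How long a boost can last can be sketched with a first-order thermal RC model, in which temperature approaches its steady-state value exponentially with time constant R·C. This lumped model and every parameter value below are assumptions for illustration; they are not Sandy Bridge figures or Intel's boost algorithm.

```python
import math


def boost_duration(
    p_boost: float,   # power while boosting (watts)
    t_start: float,   # temperature when boost begins (deg C)
    t_max: float,     # maximum allowed temperature (deg C)
    t_amb: float,     # ambient temperature (deg C)
    r_th: float,      # thermal resistance (K/W)
    c_th: float,      # thermal capacitance (J/K)
) -> float:
    """Seconds until temperature reaches t_max under constant boost power.

    Model: C * dT/dt = P - (T - T_amb) / R, so
    T(t) = T_ss + (T_start - T_ss) * exp(-t / (R * C)), with T_ss = T_amb + P * R.
    """
    t_steady = t_amb + p_boost * r_th
    if t_steady <= t_max:
        return math.inf  # this power level is sustainable indefinitely
    if t_start >= t_max:
        return 0.0       # no thermal headroom left
    tau = r_th * c_th
    return -tau * math.log((t_steady - t_max) / (t_steady - t_start))


# Hypothetical package: R = 0.5 K/W, C = 40 J/K, ambient 40 C, limit 70 C.
# Idling at 20 W settles at 50 C; boosting at 80 W would settle at 80 C.
duration = boost_duration(p_boost=80.0, t_start=50.0, t_max=70.0,
                          t_amb=40.0, r_th=0.5, c_th=40.0)
print(f"boost can last about {duration:.1f} s")
```

In this model a cooler starting temperature (more accumulated thermal credit) or a larger thermal capacitance lengthens the boost, while a power level at or below the sustainable one can be held indefinitely.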


Conclusions

• Power/energy is the leading driver of modern architecture design

• Power and energy management is key to scalability

• Need integrated power/energy, performance, thermal management in fielded systems

• What about energy/power efficient algorithms?


Study Guide

• Explain the difference between energy dissipation and power dissipation

• Distinguish between static power dissipation and dynamic power dissipation

• Be able to apply the simplified McPAT power model to a simple datapath and instruction sequence

• Explain dynamic voltage frequency scaling
  o What are power states?
  o Why is this an advantage?
  o What is the impact of DVFS on i) energy, ii) execution time, and iii) power?


Study Guide (cont.)

• How is thermal design power (TDP) calculated?

• When using boost algorithms, what determines the duration of the high frequency operation?

• How does a power virus work?

• Describe how throttling works

• Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments