Scaling, Power and the Future of CMOS
Mark Horowitz, Elad Alon, Dinesh Patil, Stanford
Samuel Naffziger, Rajesh Kumar, Intel
Kerry Bernstein, IBM
Slide 2
A Long Time Ago
In a building far away
A man made a prediction
On surprisingly little data
That has defined an industry
Slide 3
Moore’s Law
Slide 4
CMOS Computer Performance
[Figure: CMOS computer performance, 1985-2007 (log scale, 1-10,000): Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; MIPS; HP PA; PowerPC; AMD K6, K7, x86-64]
Slide 5
Moore’s Original Issues
• Design cost
• Power dissipation
• What to do with all the functionality possible
ftp://download.intel.com/research/silicon/moorespaper.pdf
Slide 6
Outline
• How designers will deal with poor power scaling
• Origins of the power problem
• An optimization perspective
• Low power circuits and architectures
• Cost of variability
• Future scenarios
• What device characteristics matter (to me)
Slide 7
The 80’s Power Problem
• Until the mid-1980s, technology was mixed
• nMOS, bipolar, some CMOS
• Supply voltage was not scaling / power was rising
• nMOS, bipolar gates dissipate static power
From Roger Schmidt, IBM Corp
Slide 8
Solution: Move to CMOS
• And then scale Vdd
[Figure: feature size (µm) and Vdd vs. time, Jan 1985 to Jan 2003 (log scale)]
Slide 9
Scaling MOS Devices
• In this ideal scaling, V scales to αV, L scales to αL
• So C scales to αC, i scales to αi (i/µm is stable)
• Delay = CV/I scales as α
• Energy = CV² scales as α³
JSSC Oct 74, pg 256
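The scaling arithmetic above can be checked in a few lines; a minimal sketch, assuming the classic per-generation factor a = 0.7 (the specific value is an assumption for illustration, not from the slide):

```python
# Ideal (Dennard) constant-field scaling: V -> aV, L -> aL,
# hence C -> aC and i -> ai. a = 0.7 is an assumed per-generation factor.
a = 0.7

C, V, I = 1.0, 1.0, 1.0           # normalized pre-scaling values
delay = C * V / I                 # gate delay ~ CV/I
energy = C * V**2                 # switching energy ~ CV^2

Cs, Vs, Is = a * C, a * V, a * I  # scaled device
delay_s = Cs * Vs / Is            # scales as a
energy_s = Cs * Vs**2             # scales as a^3

print(delay_s / delay)            # ~a   (delay scales by a)
print(energy_s / energy)          # ~a^3 (energy scales by a cubed)
```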
Slide 10
Processor Power
• Continued to grow, even when Vdd was scaled
[Figure: processor power (W), 1985-2003 (log scale, 1-100)]
Slide 11
Why Power Increased
• Growing die size, fast frequency scaling
[Figure: clock frequency (MHz), 1985-2005 (log scale, 10-10,000)]
Slide 12
Good News
• Die growth & super frequency scaling have stopped
[Figure: cycle time in FO4 delays, 1985-2005 (log scale)]
Slide 13
Processor Power
• They were high power too
[Figure: processor power (W), 1985-2003 (log scale, 1-100)]
Slide 14
Bad News
• Voltage scaling has stopped as well
• kT/q does not scale
• Vth scaling has power consequences
• If Vdd does not scale
• Energy scales slowly
Ed Nowak, IBM
Slide 15
Energy – Performance Space
• Every design is a point on a 2-D plane
[Figure: each design plotted as a point in the energy vs. performance plane]
Slide 18
Trade-offs for an Adder
[Figure: energy-delay trade-off points for adder circuits (log scale): static carry chain, static carry select, static/domino Kogge-Stone (KS), static/domino Brent-Kung (BK), static/domino Ladner-Fischer (LF), static 84421]
Slide 19
Key Observation:
• Define the Energy/Delay sensitivity of a parameter
• For example Vdd:

Sens(Vdd) = (∂E/∂Vdd) / (−∂D/∂Vdd), evaluated at Vdd = Vdd*

• At the optimal point, all sensitivities should be the same
• Must equal the slope of the Pareto optimal curve
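The sensitivity definition can be evaluated numerically under a simple alpha-power delay model; everything here (the model form, α = 1.3, Vth = 0.3 V, normalized constants) is an illustrative assumption:

```python
# Numeric sketch of the energy/delay sensitivity of Vdd, using an
# alpha-power delay model with normalized, assumed constants.
ALPHA = 1.3   # velocity-saturation exponent (assumed)
VTH   = 0.3   # threshold voltage in volts (assumed)

def energy(vdd):
    return vdd**2                      # E ~ C * Vdd^2, C normalized to 1

def delay(vdd):
    return vdd / (vdd - VTH)**ALPHA    # alpha-power law delay

def sensitivity(vdd, h=1e-6):
    # Sens(Vdd) = (dE/dVdd) / (-dD/dVdd): energy saved per unit delay added
    dE = (energy(vdd + h) - energy(vdd - h)) / (2 * h)
    dD = (delay(vdd + h) - delay(vdd - h)) / (2 * h)
    return dE / -dD

# Sensitivity is large at high Vdd (lots of energy saved for little delay)
# and falls toward zero as Vdd approaches Vth.
print(sensitivity(1.2), sensitivity(0.6))
```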
Slide 20
What This Means
• Vdd and Vth are not directly set by scaling
• Instead set by slope of Pareto optimal curve
• Leakage rose to lower total system power!
[Figure: energy-delay optimal curve (log scale) with optimized design points: (Vdd=0.68, VtN=0.31), (Vdd=1.0, VtN=0.30), (Vdd=1.2, VtN=0.26), (Vdd=1.3, VtN=0.22)]
Slide 21
Low Power Design Techniques
Three main classes of methods to reduce energy:
• Cheating
• Reducing the performance of the design
• Reducing waste
• Stop using energy for stuff that does not produce results
• Stop waiting for stuff that you don’t need (parallelism)
• Problem reformulation
• Reduce work (less energy and less delay)
Slide 22
Cheating
• Many low-power papers talk only about energy
• Don’t consider performance
• Reducing performance can always reduce energy
• But there are many ways to reduce performance
• Good technique must lower the optimal curve
• “Sensitivity” of technique
• Must be better than current curve
• This depends on location on the curve
Slide 23
Reducing Energy Waste
• Clock gating
• If a section is idle, remove clock
• Removes clock power
• Prevents any internal node from transitioning
• Create system power states
• Turn on subsystems only when they are needed
• Can have different “off” states
• Power vs. wakeup time
• Disk (do you stop it from spinning?)
Slide 24
Embedded Power Gating
• Can reduce leakage
• 250x reported
• Since transistors still leak when power is off
• But costs:
• Performance
• Drop in Vdd, Gnd
[Figure: rows of standard cells with embedded power switches and power switch control signals]
Royannez, et al., 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005
Slide 25
Range of Applicability
• Power supply gating
• Done to remove leakage power
• But slows down the circuit
• Adds series resistance to the supply
[Figure: energy/op vs. performance; power gating makes the circuit worse when energy sensitivity is high]
Slide 26
Parallelism
• If the application has data parallelism
• Parallelism is a way to improve performance
• With low additional energy cost
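A sketch of why parallelism saves energy, under the same assumed alpha-power model used above (two units, overheads ignored, all constants assumed):

```python
# Trading parallelism for supply voltage: two units at a lower Vdd can
# match the throughput of one fast unit at much lower energy per op.
# Normalized alpha-power model; all constants are assumptions.
ALPHA, VTH = 1.3, 0.3

def freq(vdd):
    return (vdd - VTH)**ALPHA / vdd    # ~1/delay

def energy_per_op(vdd):
    return vdd**2                      # CV^2 with C normalized to 1

v_nom = 1.2
target = freq(v_nom)                   # throughput of one fast unit

# Bisect for the Vdd where TWO units match the same total throughput
# (each unit only needs half the frequency).
lo, hi = VTH + 1e-3, v_nom
for _ in range(60):
    mid = (lo + hi) / 2
    if 2 * freq(mid) < target:
        lo = mid
    else:
        hi = mid
v_par = (lo + hi) / 2

print(v_par)                                         # reduced supply
print(energy_per_op(v_par) / energy_per_op(v_nom))   # energy/op ratio < 1
```

Energy per operation depends only on the supply each unit runs at, so (ignoring duplication overhead) the ratio printed is the energy win.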
Slide 27
Existing Processors
[Figure: Watts/(Spec·L·Vdd²) vs. Spec2000·L for existing processors (log-log), with a point for 10 processors working in parallel]
Slide 28
Parallel Server Chip
• POWER5 from IBM
Slide 29
Problem Reformulation
• Best way to save energy is to do less work
• Energy directly reduced by the reduction in work
• But required time for the function decreases as well
• Convert this into extra power gains
• Shifts the optimal curve down and to the right
[Figure: energy/op vs. user performance]
Slide 30
Cost of Variation
• Variability changes position of the optimal curves
• Need to margin Vth, Vdd to ensure circuit always works
[Figure: energy vs. performance optimal curves for ∆Vth = 0 mV and ∆Vth = 120 mV (log-log)]
Slide 31
Partial Compensation
• Adjust Vdd after you get the part back
• Compensates very well for small deviations in Vth
[Figure: energy vs. performance for ∆Vth = 0 mV and ∆Vth = 120 mV with Vdd adjusted per part (log-log)]
Slide 32
Reducing Voltage Margins
• At test time determine Vdd for that part
• Have private DC-DC converter already
[Figure: ~20% reduction in voltage margin]
Slide 33
Variable Application Demands
• Try to provide a couple of operating points
• Application can control speed and energy
• The hard question is which (Vdd, F) pairs are valid
• Usually determined during test
• Dynamic voltage scaling
• Intel SpeedStep in laptop processors
• 2 performance/power points
• Transmeta LongRun technology
• Many operating points: test data + formula
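A sketch of how a table of (Vdd, F) operating points might be generated from a fitted frequency model plus a safety margin; the model form, fitted constants, and 5% derating are assumptions standing in for real test-time characterization data:

```python
# Derive valid (Vdd, Fmax) pairs for dynamic voltage scaling from a
# fitted alpha-power frequency model plus a derating margin (assumed).
ALPHA, VTH, K = 1.3, 0.3, 3.0e9   # fitted constants (assumed)
MARGIN = 0.95                      # derate Fmax by 5% for variation

def fmax(vdd):
    return MARGIN * K * (vdd - VTH)**ALPHA / vdd   # alpha-power fit, Hz

# Discrete operating points the power manager may request
table = [(v, fmax(v)) for v in (0.8, 0.9, 1.0, 1.1, 1.2)]
for vdd, f in table:
    print(f"Vdd={vdd:.1f} V  ->  Fmax={f/1e9:.2f} GHz")
```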
Slide 34
Constant Power Scaling
• Foxton controller on next-gen. Itanium II
• Raises Vdd/boosts F when most units idle
• Lowers Vdd for parallel code to stay in budget
Slide 35
Self Checking Hardware
• Razor (Austin/Blaauw, U of Mich)
• Use the actual hardware to check for errors
• Latch the input data twice
• Once on the clock edge, and then a little later
• If the data is not the same, you are going too fast
[Figure: Razor flip-flop: main flip-flop samples Din on clk; a shadow latch samples on a delayed clock (clk_del); a comparator flags an error (Error_L) when the two disagree]
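The double-sampling idea can be sketched behaviorally; this is a toy model with made-up timing numbers, not the actual Razor implementation:

```python
# Behavioral sketch of Razor-style double sampling: the main flip-flop
# samples on the clock edge, a shadow latch samples a little later; a
# mismatch between the two flags a timing error.
def razor_sample(data_arrival, clk_edge, shadow_delay, data_value, old_value):
    """Return (latched_value, error). If the data arrives after the clock
    edge but before the shadow sample, the main FF holds the stale value
    and the shadow latch catches the new one."""
    main = data_value if data_arrival <= clk_edge else old_value
    shadow = data_value if data_arrival <= clk_edge + shadow_delay else old_value
    return main, main != shadow

# Data meets setup: both samples agree, no error
print(razor_sample(0.9, 1.0, 0.2, 1, 0))   # (1, False)
# Data arrives late, inside the shadow window: error flagged
print(razor_sample(1.1, 1.0, 0.2, 1, 0))   # (0, True)
```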
Slide 36
With Error Recovery
• Can run the chip so it makes some errors
• Chip gets right answer 99.9% of the time
• 0.1% of the time, the chip must rerun operation
[Figure: voltage at 0.1% error rate vs. voltage at first failure for measured chips, with linear fit y = 0.78685x + 0.22117]
Slide 37
Adjusting Vth
• In theory want to adjust Vth too
• Very hard to do with modern transistors
[Figure: leakage (µA) vs. Vdd (0.9-1.35 V) for body bias Vbb = 0 to 0.8 V]
Slide 38
Future Systems
• Some simple math
• Assume scaling continues
• Dies don’t shrink in size
• Average power/gate must decrease by 2x / generation
• Since gates are shrinking in size
• Get 1.4x from capacitive reduction
• Where is the other factor of 1.4x ?
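The slide's arithmetic as a sketch: constant die area and power budget, 2x gates per generation, roughly sqrt(2) of the needed reduction coming from smaller capacitance:

```python
# Constant die area and power budget with 2x more gates per generation
# means average power per gate must halve. Smaller gates give ~1.4x of
# that from reduced capacitance; this computes the remaining factor.
import math

gate_growth   = 2.0               # gates per die, per generation
cap_reduction = math.sqrt(2)      # ~1.4x energy drop from smaller C

remaining = gate_growth / cap_reduction
print(remaining)                  # ~1.414: the "other factor of 1.4x"
```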
Slide 39
Exploit Parallelism / Scale Vdd
• If you have parallelism
• Add more function units
• Fill up new die (2x)
• Lower energy/op
• ∆E/∆P will decrease
• Vdd, sizes, etc will reduce
• Build simpler architectures
• Works well when ∆E/∆P is large
• Per unit performance decrease is small
[Figure: energy/op vs. performance]
Slide 40
Exploit Specialization
• Optimize execution units for different applications
• Reformulate the hardware to reduce needed work
• Can improve energy efficiency for a class of applications
• Stream / Vector processing is a current example
• Exploit locality, reuse
• High compute density
[Figure: Imagine stream processor: µ-controller, clusters, SRF, memory system, HISC, NI]
Bill Dally et al., Stanford
Slide 41
Exploit Integration
• If both those techniques don’t work
• Still can increase integration by at least 1.4x
• Moving units onto one chip
• Reduces the number of I/Os on system
• I/O can take significant power today
• Allows even larger integration
Slide 42
TI - OMAP2420
• Specialization
• And power domains• Most units are off
• OMAP 2420 • 5 Power Domains
• #1: MCU Core
• #2: DSP Core
• #3: Graphic Accelerator
• #4: Core + Periph.
• #5: Always On logicRoyannez, et al, 90nm Low Leakage SoC Design Techniques
for Wireless Applications, ISSCC 2005
Slide 43
Low-Power PowerPC
Nowka et al., Low-power PowerPC, ISSCC
Slide 44
What All This Means
• As long as $/function and capacitance continue to scale
• Moving to the new technology will be profitable
• And will allow designs to be better systems
• In the worst case, active die area will decrease
• Scale gates by the decrease in gate capacitance
• In most cases, we will do much better
• But how to optimize devices in this new domain?
Slide 45
Radical Idea:
• Scaling channel length may no longer be critical
• I still want small (i.e. dense) devices
• But I also want lower variations & external control of Vth
• Longer Leff may actually improve energy efficiency
• Less variability → lower energy penalty
• Especially as we move to lower performance (parallelism)
[Figure: energy vs. performance optimal curves for Lnom (Vth spread 110-120 mV) and Lnom + 10% (Vth spread 55-60 mV)]
Slide 46
Conclusions
• Unfortunately power is an old problem
• Magic bullets have mostly been spent
• Power will be addressed by application-level optimization, parallelism/specialized functional units, and more adaptive control
• Need to rethink scaling
• Still makes things cheaper
• But what do we want from scaled transistors?
Slide 47
Technology Scaling
Seems simple:
• Every 1.5-2 years
• Number of transistors doubles
• Transistors get faster
• Gates become lower power (CMOS)
• Life just gets better and better
Slide 48
Reality is a Little Different
• While scaling has been smooth
• Almost nothing else has been
• Device and circuit technology has changed
• DTL, ECL, TTL, pMOS, nMOS, CMOS
• Power periodically becomes a critical issue
• It is critical again
Slide 49
nMOS, TTL, ECL Were King
• 1978 – Started in VLSI
• First design was bipolar/ECL
• 3µm nMOS was hot
[Chip photos: Intel 8086, Intel 286, DEC µVAX, BIT Sparc, HP Focus]
Slide 50
MOS Scaling Was Understood
• MOS devices operate on electric fields
• If E fields are the same
• Relation between E and J is the same
• So if all voltages and lengths scale
• iV curve retains the same shape, scaled in V
Bob Dennard worked out all the math in 1974 (JSSC, Oct. 1974, p. 256)
Slide 51
Dilemma
• Processors today are power limited
• As are many other chips
• Technology scaling will not save us
• With Vdd fixed, energy scaling will be modest
• How does one build more powerful processors?
• Or other types of chips
When constrained, optimize!
Slide 52
Optimizing the Right Thing
• Given systems are power limited
• Highest performance system is not interesting
• Will dissipate too much power
• Lowest energy solution is also not interesting
• Will not have enough performance
• Want constrained optimization
• Highest performance for 20 Watts
• Lowest power for 100 SPEC
Slide 53
Leakage Trends
[Figure: active power density and subthreshold (leakage) power density vs. gate length (µm)]
Slide 54
Design Parameters To Adjust
• Circuit (sizing, supply, threshold)
• Circuit topology (adder: CLA, CSA, …)
• Logic style (domino, pass-gate, …)
• Micro-architecture (pipelining, cache design, branch architecture, etc.)
[Figure: energy/op vs. performance]
Slide 55
Energy Efficient Designs
• Are on the Pareto optimal curve
• On this curve design parameters are constrained
[Figure: energy/op vs. performance; the Pareto optimal curve separates the infeasible region from designs wasting energy]
Slide 56
Leakage Energy
• Matching marginal costs for Vdd and Vth
[Figure: optimal leakage-to-total-power ratio vs. activity factor, and optimal Vdd and Vth (V) vs. leakage ratio (log scales)]
Slide 57
Measured Leakage Data
[Figure: measured leakage ratio, 0-0.5]
Slide 58
IBM Cell Processor
Slide 59
Vth Variation
• Since leakage is exponential in Vth
• The average leakage is not the leakage at the expected Vth
[Figure: cumulative probability vs. relative leakage (left); relative leakage contribution vs. Vth (right)]
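A Monte Carlo sketch of this effect; the ~100 mV/decade subthreshold slope and the 30 mV sigma on Vth are assumed values for illustration:

```python
# Because leakage is exponential in Vth, the mean leakage over a spread
# of devices exceeds the leakage of a device at the mean Vth.
import math
import random

random.seed(1)
VTH_MEAN, VTH_SIGMA = 0.30, 0.03   # volts (assumed spread)
S = 0.1 / math.log(10)             # ~100 mV/decade subthreshold slope

def leak(vth):
    return math.exp(-vth / S)      # normalized subthreshold current

samples = [random.gauss(VTH_MEAN, VTH_SIGMA) for _ in range(100_000)]
mean_leak = sum(leak(v) for v in samples) / len(samples)

print(mean_leak / leak(VTH_MEAN))  # > 1: variation inflates total leakage
```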
Slide 60
How Else to Save Energy?
• Running faster than needed wastes energy
• Forces you to run higher on the performance curve
• Why do you run faster than needed?
• Need margins to account for variability
• From application, environment, or technology
Variations cause waste
Slide 61
Dynamic Voltage Scaling
Burd et al ISSCC 2000
Slide 62
Dynamic Voltage Scaling
• Dynamic voltage scaling
• Adjusts Vdd to the “right” value for desired performance
• Big problem is how to find the “right” Vdd
• Need to know the relationship between Vdd and F
• Need to have a circuit that matches the critical path
• How do you do this with variations?