Designing energy-efficient microprocessor: How to fight process variations, Ruzica Jevtic, Berkeley Wireless Research Center

Designingenergyefficient’ microprocessor:Howtofight ... Memory ... [MHz] 8086 80286 386DX 486DX 486DX4 Pentium Pentium Pro Pentium II Pentium MMX Pentium III ... Delay buffers are

Apr 02, 2018

Page 1

Designing energy-efficient microprocessor: How to fight process variations

Ruzica Jevtic

Berkeley Wireless Research Center

Page 2

Moore’s law

[Figure: Transistors per die (log scale, 10^0 to 10^10) vs. year, 1960 to 2010, for microprocessors and memory, together with Moore’s 1965 data. Source: Intel; graph from S. Chou, ISSCC 2005.]

[Figure: Nominal feature size and gate length (nm, log scale) vs. year, 1970 to 2020: nodes from 250 nm through 180, 130, 90, 65, 45, and 32 nm down to 22 nm, scaling by 0.7X every 2 years, with gate lengths reaching ~30 nm. Source: Intel, IEDM presentations.]

Scaling consequences:
• Process variations
• Power has become critical!

(c) Ruzica Jevtic 2012

Page 3

Scaling consequences

• Fabrication:
  - Lithography with liquids (immersion)
  - Use of diffraction…
• Consequences:
  - Random dopant fluctuations
  - Hot-carrier trapping…
• Additional steps:
  - Optical proximity correction
• Design rules:
  - No corners
  - Unidirectional lines…

J. Hartmann, ISSCC 2007

Page 4

Lithography

[Figure: Nominal feature size scaling (nm, log scale) vs. year, 1970 to 2020, compared with the lithography wavelength (365 nm, 248 nm, 193 nm, and EUV 13 nm); nodes from 250 nm down to 22 nm.]

• Lithography issues: a 193 nm wavelength is used for lines as small as 30 nm!

[Figure: 193 nm light passing through a mask, and the resulting light-intensity profile.]

Resolution (critical dimension):

$$CD = k_1 \frac{\lambda}{NA}$$

• Wavelength λ: presently 193 nm (ArF excimer laser); (distant?) future: EUV
• Numerical aperture NA = n sin θ: the maximum n is 1 in air; presently 0.92 to 1.35 (immersion)
• Process factor k1: presently 0.35 to 0.4; theoretical limit 0.25

$$CD_{min} = 0.25 \cdot \frac{193\,\text{nm}}{0.92} \approx 50\,\text{nm}$$

• 45 nm technology is beyond the resolution limit
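The resolution estimate CD = k1·λ/NA can be reproduced with a few lines of Python. This is a minimal sketch: the function name `critical_dimension_nm` is hypothetical, and the inputs are the k1, wavelength, and NA values quoted on this slide.

```python
def critical_dimension_nm(k1: float, wavelength_nm: float, na: float) -> float:
    """Smallest printable feature, CD = k1 * lambda / NA (Rayleigh-type criterion)."""
    if na <= 0:
        raise ValueError("numerical aperture must be positive")
    return k1 * wavelength_nm / na


# Theoretical k1 limit with dry 193 nm lithography (NA = 0.92)
dry = critical_dimension_nm(0.25, 193.0, 0.92)
# Same k1 with the highest immersion NA quoted on the slide (NA = 1.35)
immersion = critical_dimension_nm(0.25, 193.0, 1.35)
print(f"dry: {dry:.1f} nm, immersion: {immersion:.1f} nm")
```

With NA = 0.92 the limit is about 52 nm (rounded to ~50 nm on the slide), so a 45 nm feature cannot be printed directly; immersion optics with NA up to 1.35 bring the limit down to about 36 nm.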

Page 5

Process variations

• Process corners: typical, fast, and slow
• All corners coexist on the same wafer
• Designing for the worst case is too pessimistic!
• Chip features must be known after fabrication → observation circuits

B. Nikolic, TCAS-I, 2011
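To see why a worst-case design is pessimistic, the sketch below draws hypothetical per-die critical-path delays from a normal distribution and clocks every die for a 3-sigma slow corner. The 5% spread, the Gaussian model, and the 3-sigma corner are assumptions chosen for illustration; real variation has both systematic and random components.

```python
import random
import statistics


def simulate_die_delays(n_dies: int, sigma: float, seed: int = 1) -> list[float]:
    """Normalized critical-path delay of each die (1.0 = typical corner)."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, sigma) for _ in range(n_dies)]


SIGMA = 0.05                      # assumed 5% delay spread across the wafer
delays = simulate_die_delays(10_000, SIGMA)
worst_case = 1.0 + 3 * SIGMA      # clock period chosen for a 3-sigma slow corner

# Fraction of dies that meet timing at the worst-case clock period
covered = sum(d <= worst_case for d in delays) / len(delays)
# Timing margin left unused on a typical (median) die
median_margin = worst_case / statistics.median(delays) - 1.0
print(f"dies meeting timing: {covered:.2%}")
print(f"margin wasted on the median die: {median_margin:.1%}")
```

Nearly every die meets timing, but the typical die carries roughly 15% of unused margin; observation circuits aim to recover that margin per chip.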

Page 7

Scaling issues: Power

• Operating voltage has been scaled to avoid device breakdown
• It has not been scaled down as aggressively, in order to keep boosting performance
• Power has become a critical design constraint

[Figure: Frequency trends in Intel's microprocessors: clock frequency (MHz, log scale, 0.1 to 10,000) vs. year, 1970 to 2005, from the 4004, 8008, 8080, 8088, 8086, and 80286 through the 386DX, 486DX, 486DX4, Pentium, Pentium Pro, Pentium II, Pentium MMX, Pentium III, Pentium 4, Itanium, Itanium II, and Core 2. Annotation: has been doubling every 2 years, but is now slowing down.]

[Figure: Power trends in Intel's microprocessors: power (W, log scale, 0.1 to 1000) vs. year, 1970 to 2005, for the same processor families. Annotations: has been more than doubling every 2 years; has to stay ~constant.]

Page 8

Power issues

• Power became an issue before variations did:
  - battery life for mobile devices
  - power limits performance, and performance is what sells the product!
• Power classification:
  - Dynamic power (60%): $P = sw \cdot f \cdot V_{dd}^2 \cdot C$, where sw is the switching activity
  - Static power (30%): $\propto \exp(V_{dd})$
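The quadratic dependence on the supply in the dynamic-power relation P = sw·f·Vdd²·C is what makes voltage scaling so effective. A minimal sketch, assuming a hypothetical block with 10% switching activity, a 1 GHz clock, and 1 nF of switched capacitance:

```python
def dynamic_power_w(sw: float, f_hz: float, vdd_v: float, c_f: float) -> float:
    """Dynamic (switching) power P = sw * f * Vdd^2 * C, in watts."""
    return sw * f_hz * vdd_v ** 2 * c_f


# Hypothetical block: 10% switching activity, 1 GHz, 1 nF switched capacitance
nominal = dynamic_power_w(0.1, 1e9, 1.0, 1e-9)
half_vdd = dynamic_power_w(0.1, 1e9, 0.5, 1e-9)
print(f"{nominal:.3f} W at 1.0 V, {half_vdd:.3f} W at 0.5 V")
```

Halving Vdd cuts dynamic power by 4x here. Holding f fixed is a simplification: lowering Vdd also slows the logic, which is the trade-off the following slides address.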

Page 9

Power-performance trade-offs

• The best way to reduce power (both leakage and active) is to reduce the supply voltage
• How can throughput be maintained under a reduced supply?
  - Introducing more parallelism/pipelining
  - Dynamic voltage scaling with variable throughput
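The parallelism option can be checked with the dynamic-power relation P = sw·f·Vdd²·C: N copies of a unit, each clocked at f/N, keep the same throughput while each copy has N times more time per operation and can therefore run at a lower supply. In this sketch the 0.6 V supply at half speed and the 15% capacitance overhead for multiplexing and wiring are hypothetical numbers, not measured values.

```python
def parallel_power_ratio(n_units: int, c_ratio: float, v_ref: float, v_low: float) -> float:
    """Power of n parallel units relative to one unit at full speed.

    c_ratio: total switched capacitance of the parallel design divided by
             the capacitance C of the single reference unit.
    Each unit runs at f / n_units, so aggregate throughput is unchanged.
    """
    f_ratio = 1.0 / n_units
    return c_ratio * f_ratio * (v_low / v_ref) ** 2


# Hypothetical: half-speed units are assumed to close timing at 0.6 V instead of 1.0 V
ideal = parallel_power_ratio(2, 2.0, 1.0, 0.6)          # two copies, no overhead
with_overhead = parallel_power_ratio(2, 2.3, 1.0, 0.6)  # +15% capacitance for muxes/wiring
print(f"{ideal:.2f} {with_overhead:.2f}")
```

Even with the extra capacitance, the parallel version uses well under half the power for the same throughput, at the cost of roughly twice the area.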

[Figure: energy vs. performance.]

Page 10

Overview

•  Error detection and correction circuits

•  Dynamic voltage and frequency scaling

•  Control voltage through DC-DC converters

•  Conclusions

Page 11

Impact of Dynamic Variations

[Figure: two master-slave flip-flops (MSFF) clocked by CLK under a VCC droop; timing diagrams of D with nominal VCC and with VCC droop, showing the setup time and the timing guardband.]

• Guardbands are required to ensure correct operation in the presence of dynamic variations

K. Bowman, CMOS Emerging Tech. Workshop, 2009

Page 12

Timing-Error Detection: Error-Detection Sequential (EDS)

[Figure: EDS (RAZOR): a master-slave flip-flop (MSFF) with a parallel latch that generates an ERROR signal; under a VCC droop, data on D arrives late relative to CLK and the error is detected.]

[1] P. Franco, et al., VLSI Test Symp., 1994.
[2] M. Nicolaidis, VLSI Test Symp., 1999.
[3] D. Ernst, et al., MICRO, 2003.
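The detection principle can be illustrated with a purely behavioral model: the main flip-flop samples D at the clock edge, a shadow latch samples it a fixed delay later, and a mismatch flags a late arrival. This is a sketch under simplifying assumptions (no setup/hold times, no metastability, a single transition per cycle); the names `Capture` and `razor_capture` are hypothetical and do not describe the transistor-level circuit.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    main: int     # value in the main flip-flop (sampled at the clock edge)
    shadow: int   # value in the shadow latch (sampled shadow_delay_ns later)
    error: bool   # main != shadow -> late arrival detected


def razor_capture(old: int, new: int, arrival_ns: float,
                  edge_ns: float, shadow_delay_ns: float) -> Capture:
    """D switches from `old` to `new` at arrival_ns; setup/hold times are ignored."""
    main = new if arrival_ns <= edge_ns else old
    shadow = new if arrival_ns <= edge_ns + shadow_delay_ns else old
    return Capture(main, shadow, main != shadow)


print(razor_capture(0, 1, arrival_ns=4.0, edge_ns=5.0, shadow_delay_ns=3.0))  # on time
print(razor_capture(0, 1, arrival_ns=6.5, edge_ns=5.0, shadow_delay_ns=3.0))  # late: detected
print(razor_capture(0, 1, arrival_ns=9.0, edge_ns=5.0, shadow_delay_ns=3.0))  # too late: missed
```

The third case shows why the operating point must stay inside the shadow-latch window: data that misses both sampling points produces no error flag at all.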

Page 13

Recovery mechanism

DAS et al.: A SELF-TUNING DVS PROCESSOR USING DELAY-ERROR DETECTION AND CORRECTION 795

Fig. 3. Distributed pipeline recovery mechanism.

corrupting the state of the shadow latch. Delay buffers are required to be inserted in those paths which fail to meet this minimum path delay constraint imposed by the shadow latch. The insertion of delay buffers incurs power overhead because of the extra capacitance added. A large shadow latch sampling delay requires a greater number of delay buffers to be inserted, thereby increasing the power overhead. However, a small sampling delay implies that the voltage difference between the point of first failure and the point where shadow latch fails is less and, thus, reduces the voltage margin available through Razor timing speculation. Hence, the shadow latch sampling delay represents the tradeoff between power overhead due to delay buffers and the voltage margin available for Razor subcritical mode of operation. Using suitable clock chopping techniques, the duration of the positive phase of the propagated clock can be configured as required so as to exploit the above tradeoff.

A key point to note is the fact that the hold constraint imposed by the shadow latch only limits the maximum duration of the positive clock phase and has no bearing upon the clock frequency. Thus, a “Razor”-ed pipeline can still be operated at any frequency as required as long as the positive clock phase is sufficient to meet the minimum path delay constraint. In our design, for a sampling delay of 3.0 ns which is approximately half the cycle time at 140 MHz, it was required to add 2388 delay buffers to satisfy the short path constraint on 207 RFFs (7.4% of the total number of flip-flops). The power overhead due to these buffers was less than 3% of the nominal chip power.

Correct pipeline state is recovered in the event of a timing error by engaging a distributed pipeline recovery mechanism, as described in [1], which is based on a counter-flow pipeline architecture [9]. The primary requirement of the recovery mechanism is to prevent corrupt state being committed to storage in memory or the register file before being validated by Razor. In [1], we have discussed two possible ways in which this can be achieved. A centralized pipeline recovery mechanism uses the signal as a global clock-gating signal to stall the pipeline for a single cycle while the errant flip-flop recovers correct state. This incurs only a one-cycle recovery penalty but imposes significant timing restrictions on the signal which needs to be distributed through the entire chip in less than one cycle. In contrast, the distributed pipeline recovery mechanism places negligible restrictions on the cycle time at the expense of extending recovery over several cycles.

Fig. 3 conceptually illustrates the working principle of the distributed pipeline recovery mechanism. When a Razor error occurs, two actions are taken. First, the computation in the stage following the errant stage is nullified by a “bubble” signal which indicates to the next and subsequent stages that the pipeline slot is invalid. Second, a backward propagating flush train is triggered by asserting the stage identifier (ID) of the failing stage. In the following cycle, the correct value from the Razor shadow latch data is injected back into the pipeline, allowing the errant instruction to continue with its correct inputs. In addition, the flush train begins propagating the ID of the failing stage in the opposite direction of instructions. At each stage, the flush train inserts a bubble in the corresponding pipeline stage as well as in the immediately preceding stage. (Two stages must be nullified because the main pipeline appears to move twice as fast relative to the flush train.) When the flush ID reaches the start of the pipeline, the flush control logic restarts the pipeline at the instruction following the errant instruction. In the event that multiple stages experience errors in the same cycle, all will initiate recovery but only the Razor error closest to write-back (WB) will complete. Earlier recoveries will be flushed by later ones.

III. TRANSISTOR-LEVEL DESIGN OF THE RFF

Fig. 4 shows the transistor level circuit schematic of the RFF. In the absence of a timing error, the RFF behaves as a standard positive edge triggered flip-flop. The error comparator is a semi-dynamic XOR gate which evaluates when the data latched by the slave differs from that of the shadow in the negative clock phase. The error comparator shares its dynamic node with the metastability detector which evaluates in the positive phase of the clock when the slave output could become metastable. Thus, the RFF signal is flagged when either the metastability detector or the comparator evaluate.

This, in turn, evaluates the dynamic gate to generate the signal by ORing together the error signals of individual RFFs (Fig. 5), in the negative clock phase. The signal incurs significant routing and gate capacitance as it is routed to every flip-flop in the pipeline stage and needs to be driven by strong drivers. For an RFF, the serves to overwrite the master with the shadow latch data. Hence, the slave gets the correct data at the next positive edge.

The needs to be latched at the output of the dynamic OR gate so that it retains state during the next positive phase (recovery cycle) during which it disables the shadow latch to protect state. In addition, the also disables all regular, non-“Razor”-ed flip-flops in the pipeline stage to preserve the state that was latched in the errant cycle. This is required to maintain the temporal consistency of all flip-flops in the pipeline stage. The stack of three pMOS transistors in the shadow latch

•  Correct data is restored in the following clock cycle
•  All previous pipeline stages have to be flushed
•  The program counter resumes at the next instruction

Page 14

Razor – issues

Good idea, but it has a lot of issues:
-  The duty cycle constrains the shortest path
-  Metastability at the output of the main FF
-  The longest paths change if a latch is added to the FF

[Figure: flip-flop (D, Q) with a shadow latch, clocked by CLK and fed by a short path and a long path; the shadow latch flags ERROR!]
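The short-path constraint behind the first issue can be sketched as a buffer count: any path whose minimum delay is shorter than the shadow-latch window could let new data overwrite the value being checked, so it must be padded with delay buffers. The function name `buffers_needed`, the 100 ps buffer delay, and the path delays are hypothetical; the 3.0 ns window echoes the sampling delay quoted in the Das et al. excerpt, where 2388 buffers were needed on 207 RFFs. Hold margins are ignored.

```python
def buffers_needed(min_delays_ps: list[int], window_ps: int, buffer_ps: int) -> list[int]:
    """Delay buffers per path so that every path is at least window_ps long.

    A path whose new data can arrive while the shadow latch is still
    transparent (min delay < window) would corrupt the value being checked.
    """
    counts = []
    for d in min_delays_ps:
        deficit = window_ps - d
        # Ceiling division in integer picoseconds avoids floating-point rounding
        counts.append(0 if deficit <= 0 else -(-deficit // buffer_ps))
    return counts


# Hypothetical paths, 3.0 ns shadow-latch window, 100 ps per buffer
counts = buffers_needed([1000, 2500, 2950, 3500], window_ps=3000, buffer_ps=100)
print(counts, sum(counts))
```

A longer window buys more voltage margin for timing speculation but raises the buffer count, and with it the power overhead; this is the trade-off described in the excerpt.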

Page 15

Improvements

✓ Lower clock power
✓ Data metastability solved
- Additional buffers needed for short paths

✓ No added buffers for short paths
✓ Data metastability detected without an additional detector
- Detection window tuned before fabrication

[Figure: EDS circuit schematics labeled ARM’10 and Intel’09; labels include D/Q, EN, ERROR, LATCH, PG RISING, and PG FALLING.]

Page 16

Tunable Replica Circuit (TRC)

[Figure: TRC: tunable logic stages set by calibration bits, followed by an EDS that outputs an error signal.]

Logic stages:
• Inverter
• NAND
• NOR
• Pass gates
• Repeated interconnects

J. Tschanz, et al., Symp. VLSI Circuits, 2009.

Page 17

TRC – cont’d

[Figure from the 2009 Symposium on VLSI Circuits Digest of Technical Papers, p. 113.]

•  Not as accurate as EDS, but does not interfere with the longest path
•  If properly tuned, no recovery mechanism is needed

Page 18

Observation circuits – summary

Advantages:
✓ Reduce margins for process variations
✓ EDS:
  • enables instruction recovery
  • detects errors in pipeline stages
✓ TRC:
  • captures the clock-to-data delay per pipeline stage
  • has low design overhead

Disadvantages:
- EDS:
  • Buffers must be added for short paths
  • Metastability issues
  • The longest path is affected by the additional circuits
- TRC:
  • Cannot detect local dynamic variations
  • Requires a margin between the TRC and the longest path
  • Requires post-silicon calibration

Page 19

Overview

•  Error detection and correction circuits

•  Dynamic voltage and frequency scaling

•  Control voltage through DC-DC converters

•  Conclusions

Page 20

DVFS

[Figure: Processor usage model, desired throughput vs. time: compute-intensive and low-latency processes need the maximum processor speed; background and high-latency processes need less; the rest of the time the system is idle. Panel: typical MPEG IDCT histogram.]

System optimizations:
• Maximize peak throughput
• Minimize average energy/operation

Burd, ISSCC’00

Page 21

DVFS – cont’d

[Figure: CMOS circuits track over VDD: normalized maximum fCLK vs. VDD (0 to 4 VT) for an inverter, ring oscillator, register file, and SRAM; delay tracks within ±10%.]

[Figure: Dynamic voltage scaling (DVS): vary fCLK and VDD to dynamically adapt the delivered throughput over time.]

• Dynamically scale energy/operation with throughput.
• Always run at the minimum speed that meets the throughput target, which minimizes average energy/operation.
• Extend battery life by up to 10x with the exact same hardware!

Burd, ISSCC’00
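The DVS argument can be illustrated numerically. This sketch swaps in the alpha-power-law delay model (delay ∝ V/(V − Vt)^α) with hypothetical parameters Vt = 0.3 V and α = 1.3; these are assumptions, not values from the cited work. It finds the lowest supply that still meets a given fraction of peak throughput and reports energy/operation ∝ Vdd², normalized to running at 1.0 V. Leakage is ignored.

```python
def gate_delay(v: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Alpha-power-law delay model (arbitrary units): delay ~ V / (V - Vt)^alpha."""
    return v / (v - vt) ** alpha


def min_vdd_for_throughput(fraction: float, v_max: float = 1.0,
                           vt: float = 0.3, alpha: float = 1.3) -> float:
    """Lowest supply whose delay still allows `fraction` of the peak clock frequency."""
    target = gate_delay(v_max, vt, alpha) / fraction   # allowed delay per cycle
    lo, hi = vt + 1e-6, v_max                          # delay falls monotonically with V
    for _ in range(60):                                # bisection on the supply voltage
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt, alpha) <= target:
            hi = mid
        else:
            lo = mid
    return hi


for fraction in (1.0, 0.5, 0.25):
    v = min_vdd_for_throughput(fraction)
    # Energy/operation ~ C * V^2, normalized to operation at v_max = 1.0 V
    print(f"throughput {fraction:.2f}: Vdd = {v:.3f} V, energy/op = {v ** 2:.2f}")
```

Running at full speed and then idling keeps energy/operation at 1.0 in this model (at best, with zero idle power), whereas scaling the supply down for half the throughput cuts it to roughly a third.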

Page 22

Traditional DVFS

•  Traditional DVFS schemes use a ring oscillator for frequency detection (XScale, PowerPC, Pentium M, …)

•  No longer usable: the ring-oscillator frequency change does not reflect the CPU frequency change

•  Different paths behave differently as Vdd changes

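[Figure: Control loop: a counter measures the ring-oscillator frequency fcpu and compares it with the target average frequency Fcpu_av; the difference adjusts the DC-DC converter that supplies the µP and the ring oscillator.]

Burd, ISSCC’00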

Page 23

Resilient DVFS

•  Resilient DVFS schemes use TRCs and EDS as frequency indicators
•  Two options: TRC + EDS, or EDS with recovery

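[Figure: Control loop: the target average frequency Fcpu_av sets the DC-DC converter that supplies the µP, and a clock controller (CLK Cntrl) adjusts the clock. With EDS and recovery, detected errors request a slow-down; with TRC + EDS, the TRC tracks the maximum path delay to request a speed-up or slow-down.]

J. Tschanz, et al., Symp. VLSI Circuits, 2009.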
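A minimal sketch of the "EDS with recovery" option: the controller speeds up while no errors are flagged and slows down whenever EDS reports one. The function name, the fixed 10 MHz step, and the hidden failure frequency `f_fail_mhz` (standing in for the real silicon) are hypothetical; a real controller would also weigh the recovery cost of each error.

```python
def resilient_dvfs(f_start_mhz: int, f_fail_mhz: int,
                   step_mhz: int = 10, windows: int = 100) -> list[int]:
    """Error-driven clock controller (EDS with recovery), one decision per window.

    f_fail_mhz stands in for the silicon: above it, EDS flags timing errors,
    which the recovery mechanism replays. The controller never reads it directly.
    """
    f = f_start_mhz
    trace = []
    for _ in range(windows):
        errors_seen = f > f_fail_mhz                         # EDS error count > 0 this window
        f = f - step_mhz if errors_seen else f + step_mhz    # slow down / speed up
        trace.append(f)
    return trace


trace = resilient_dvfs(f_start_mhz=500, f_fail_mhz=780)
print(trace[:3], trace[-4:])
```

The loop settles into a dither around the failure point (780/790 MHz here), and every excursion above it costs a replay; pairing EDS with a TRC is one way to back off before errors occur.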

Page 24

RAVEN microprocessor

•  Implemented in the newest technology: 28 nm! (Intel Core i5/i7 cores are in 32 nm, Altera chips in 28 nm)

•  Mobile applications: battery life is important!

•  Manycore architecture: exploiting parallelism!
•  Error detection circuits for observation: an improvement over the ARM architecture!
•  Unconventional DVFS scheme

Page 25

Motivation – DVFS

•  Single core
[Figure: energy (E) vs. performance (Perf) for a single core.]

•  Two cores
[Figure: E vs. Perf for two cores.]

•  In a conventional synchronous DVFS system, performance is dictated by the slower core

Page 26

Goal

•  Manycore
[Figure: E vs. Perf for a manycore processor: instead of operating at the marked point, we would like to operate at the optimal point (annotated 10x).]

•  Make sure to operate at the optimal energy-performance point

Page 27

Per-Core Supply/Clock Control

[Figure: a shared PLL and four cores, each with its own DC-DC converter and EDS; the cores are connected through FIFOs.]

Page 28

DVFS on a manycore processor

•  Allow ripple at the DC-DC output and track the voltage through clock generation, for better energy efficiency

[Figure: Control loop: fdesired sets the control of the DC-DC converter; the clock generator tracks the supply voltage; the µP contains TRC and EDS. Inset: energy vs. delay.]

Page 29

Overview

•  Error detection and correction circuits

•  Dynamic voltage and frequency scaling

•  Control voltage through DC-DC converters

•  Conclusions

Page 30

DC-DC converters

Two phases:
1. Loading energy from the battery
2. Transferring the loaded energy to the output

Page 31

Switched-Capacitor DC-DC Converter

•  Advantages:
  - Fully integrated on a chip
  - Smaller area (no inductive components)
  - Large power density
•  Disadvantages:
  - Discrete output voltage
  - Low efficiency

[Figure: page excerpt from Le et al., “Design Techniques for Fully Integrated Switched-Capacitor DC-DC Converters” (p. 2121), Section II, SC converter loss analysis and optimization, including Fig. 2(a), a 2:1 step-down SC DC-DC converter.]

[Figure: 2:1 switched-capacitor converter: the flying capacitor Cfly is connected between Vin and Vout in phase Φ1 and between Vout and ground in phase Φ2.]

Page 32

Discrete points solution

[Figure: voltage levels of 0.43 V, 0.55 V, 0.78 V, and 0.96 V.]

Vin [V]   Ratio   Vout [V]    Range [V]
1         1/2     0.5         0.45 – 0.55
1         2/3     0.67        0.55 – 0.71
1         1/3     Too small   Too small
1.8       1/2     0.9         0.79 – 0.96

H.-P. Le, ISSCC 2011
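The "discrete output voltage" and "low efficiency" drawbacks are linked. A switched-capacitor converter has an ideal output of ratio·Vin, and regulating below that point costs charge-sharing loss, so its efficiency is bounded by Vout/(ratio·Vin), much like a linear regulator. A minimal sketch, assuming a fixed Vin and ignoring switch, gate-drive, and bottom-plate losses; the function name `best_sc_ratio` is hypothetical, and the sketch does not attempt to reproduce the ranges in the table.

```python
from fractions import Fraction


def best_sc_ratio(v_in: Fraction, v_target: Fraction, ratios: list[Fraction]):
    """Pick the lowest conversion ratio whose ideal output still reaches v_target.

    Returns (ratio, efficiency bound) or (None, None) if no ratio is high enough.
    Regulating below the ideal output (ratio * v_in) costs charge-sharing loss,
    so efficiency is bounded by v_target / (ratio * v_in).
    """
    usable = [r for r in ratios if r * v_in >= v_target]
    if not usable:
        return None, None
    r = min(usable)
    return r, v_target / (r * v_in)


RATIOS = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]
for target in ("0.45", "0.50", "0.60"):
    r, eta = best_sc_ratio(Fraction(1), Fraction(target), RATIOS)
    print(f"Vout = {target} V -> ratio {r}, efficiency bound {float(eta):.0%}")
```

Between the discrete points, the efficiency bound drops in proportion to how far Vout sits below the ideal output, which is why a converter with few ratios covers a wide voltage range poorly.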

Page 33

Efficiency solution

Efficiency can be improved by more than 10%!

[Figure: losses of the conventional approach vs. our approach.]

Page 34

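DVFS details

[Figure: DVFS loop: Fcpu_av and Vref set the configuration of the DC-DC converter (Vout = 1/2 Vin, Vout = 2/3 Vin, ...); the clock generator tracks the converter output Vo to produce clk for the µP, which contains TRC and EDS. Inset: energy vs. delay.]

•  Set the ripple size at the optimal energy-delay point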

Page 35

Overview

•  Error detection and correction circuits

•  Dynamic voltage and frequency scaling

•  Control voltage through DC-DC converters

•  Conclusions

Page 36

Summary

•  Process variations introduce difficulties in circuit design
•  Power is a critical design constraint
•  Observation circuits (EDS and TRC) are needed to avoid overly conservative design decisions
•  Multicore/manycore architectures open a spatial dimension for energy optimization through DVFS
•  Fine-grained V-F control is enabled by performance observation and DC-DC converters

Page 37

Acknowledgement

Marie Curie FP7 People program
