University of Maryland Samuel Rodriguez Ph.D. Proposal Department of Electrical and Computer Engineering Comparative Analysis of Contemporary Cache Power Reduction Techniques Ph.D. Dissertation Proposal Samuel V. Rodriguez

Nov 29, 2021

Page 1: Comparative Analysis of Contemporary Cache Power …

University of Maryland
Department of Electrical and Computer Engineering

Comparative Analysis of Contemporary Cache Power Reduction Techniques

Ph.D. Dissertation Proposal

Samuel V. Rodriguez

Page 2: Comparative Analysis of Contemporary Cache Power …

Motivation

Power dissipation is important across the board, not just in portable devices!!

• Portable devices
• Mid-range (e.g. desktops)
• High-end (e.g. servers)

Page 3: Comparative Analysis of Contemporary Cache Power …

Motivation

- Thermal Design Power (TDP) is now a priority specification
- AMD currently can't compete in "thin and light" notebooks because of their higher TDPs
- AMD's power advantage in initial dual-core offerings
- An entire Intel Pentium 4 design was recently cancelled because of higher-than-expected TDPs

Page 4: Comparative Analysis of Contemporary Cache Power …

Motivation

• Breakdown of power consumption for a 4-wide, 200 MHz, 3.3 V, 0.35 um processor with 32 kB/32 kB/1 MB caches:
  – Cache: 44%
  – Core datapath: 22%
  – Clock: 32%

[Figure: power breakdown chart, taken from Gurumurthi]

Page 5: Comparative Analysis of Contemporary Cache Power …

Motivation

• The fraction of die area and transistor count dedicated to caches is increasing

[Figure: Itanium die photo. Photograph taken from Weiss2002]

Page 6: Comparative Analysis of Contemporary Cache Power …

Presentation Outline

• Motivation (finished)
• Background
  – Power Dissipation
  – Cache/SRAM Implementation
• Contemporary Cache Power Reduction Schemes
• Proposed Work
• Q&A

Page 7: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

[Figure: normalized total chip power (dynamic power and subthreshold leakage power) and physical gate length (um) plotted against year, 1990 to 2020. Graph from Kim2004]

- Need to account for both dynamic and static power dissipation!

Page 8: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

• Causes of dynamic power

Page 11: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

• $P_{dyn} \propto N \times C \times V_{DD}^2 \times f$
  – ↑↑↑ N: number of transistors
  – ↓↓ C: device capacitance
  – ↓ VDD: supply voltage
  – ↑↑ f: frequency
• Dynamic power trend: slow increase
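To make the proportionality concrete, here is a minimal sketch that multiplies per-generation scaling factors for N, C, VDD, and f. The numeric factors are hypothetical placeholders chosen only to illustrate the arithmetic; they are not taken from the slides.

```python
def relative_dynamic_power(n_scale, c_scale, vdd_scale, f_scale):
    """Return P_dyn(new) / P_dyn(old), using P_dyn ∝ N * C * VDD^2 * f."""
    return n_scale * c_scale * vdd_scale ** 2 * f_scale

# Hypothetical scaling factors for one technology generation (assumptions):
# transistor count doubles, capacitance and supply voltage shrink,
# clock frequency rises.
ratio = relative_dynamic_power(n_scale=2.0, c_scale=0.7, vdd_scale=0.85, f_scale=1.4)
print(f"dynamic power changes by a factor of {ratio:.2f}")
```

With these assumed factors the increases in N and f outweigh the reductions in C and VDD, which is the kind of slow net increase the slide describes.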

Page 12: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

• Causes of static power: leakage currents

[Figure: MOSFET cross-section with SOURCE, GATE, DRAIN, and BODY terminals]

Page 13: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

• Subthreshold: 5x per generation
• Gate leakage: 500x per generation!!!

[Figure: two MOSFET cross-sections, each with SOURCE, GATE, DRAIN, and BODY terminals]

Page 14: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

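• Subthreshold leakage is increasing:

$I_{d,sat} \propto (V_{gs} - V_{th}) = (V_{DD} - V_{th})$

  – As VDD scales down, Vth must also be lowered to preserve drive current, and subthreshold leakage grows exponentially as Vth falls

• Increase: 5x per generation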

Page 15: Comparative Analysis of Contemporary Cache Power …

Background (Power Dissipation)

• Gate leakage: 500x per generation!!!
• Tox scaling results in increased gate leakage caused by oxide tunneling

[Figure: MOSFET cross-section with SOURCE, GATE, DRAIN, and BODY terminals]
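The per-generation factors quoted in the slides (5x for subthreshold leakage, 500x for gate leakage) compound quickly. The sketch below only does that arithmetic; it assumes the factors stay constant across generations, which is a simplification.

```python
SUBTHRESHOLD_PER_GEN = 5    # from the slides: 5x per generation
GATE_PER_GEN = 500          # from the slides: 500x per generation

def leakage_growth(per_gen_factor, generations):
    """Cumulative growth after a number of generations, assuming a constant factor."""
    return per_gen_factor ** generations

for gen in range(1, 4):
    sub = leakage_growth(SUBTHRESHOLD_PER_GEN, gen)
    gate = leakage_growth(GATE_PER_GEN, gen)
    print(f"after {gen} generation(s): subthreshold x{sub}, gate x{gate}")
```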

Page 16: Comparative Analysis of Contemporary Cache Power …

Presentation Outline

• Motivation (finished)
• Background
  – Power Dissipation
  – Cache/SRAM Implementation
• Contemporary Cache Power Reduction Schemes
• Proposed Work
• Q&A

Page 17: Comparative Analysis of Contemporary Cache Power …

Background (Cache Implementation)

2-way set-associative cache read

[Figure: block diagram of a 2-way set-associative cache (VIRTUAL ADDRESS, TLB, TLB_OUT, TAG and DATA arrays with sense amplifiers (SA), HIT, OUTPUT DATA) and read timing diagram with signals CLK, ADDR, ADS, RD/WRB, TAG WL, TAG BL/BLB, TAG OUT, TLB OUT, HIT, DATA WL, DATA BL/BLB, DATA OUT, OUTPUT DATA]

Note: (external signal)

Page 18: Comparative Analysis of Contemporary Cache Power …

Background (Cache Implementation)

• Full CMOS 6T memory cell

[Figure: 6T cell schematic with wordline (WL) and bitlines (BL, BLB); WL and BL/BLB waveforms for PRECHARGE, READ OP, and WRITE OP]

Page 19: Comparative Analysis of Contemporary Cache Power …

Background (Cache Implementation)

[Figure: simplified 8 x 8b SRAM array with wordlines (WL) and bitline pairs (BL, BLB)]

Page 20: Comparative Analysis of Contemporary Cache Power …

Background (Cache Implementation)

[Figure: simplified 8 x 8b SRAM array with read timing diagram: CLK, ADDR, WL, BL/BLB, DATA OUT]

Page 21: Comparative Analysis of Contemporary Cache Power …

Background (Cache Implementation)

SRAM partitioning: the array is often divided into smaller "subarrays"

[Figure: address split into Addr Tag and Addr Index fields; Subarrays 0 to 3; the subarray ID is part of the address index]
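A minimal sketch of how an address might be split so that part of the index selects a subarray. The field widths, the presence of a block offset field, and the choice of the upper index bits as the subarray ID are all assumptions made for illustration; the slide only states that the subarray ID is part of the index.

```python
def split_address(addr, offset_bits=5, index_bits=7, subarray_bits=2):
    """Split an address into (tag, index, subarray_id, row).

    Assumed layout (not from the slides): [ tag | index | block offset ],
    with the upper `subarray_bits` of the index selecting the subarray.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    row_bits = index_bits - subarray_bits
    subarray_id = index >> row_bits           # which subarray to activate
    row = index & ((1 << row_bits) - 1)       # row inside that subarray
    return tag, index, subarray_id, row

# Build an example address from known fields, then decode it again.
example_addr = (3 << 12) | (0b1100101 << 5) | 7
print(split_address(example_addr))
```

Only the selected subarray needs to be activated for an access, which is what several of the techniques later in the talk exploit.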

Page 22: Comparative Analysis of Contemporary Cache Power …

Presentation Outline

• Motivation (finished)
• Background (finished)
  – Power Dissipation
  – Cache/SRAM Implementation
• Contemporary Cache Power Reduction Schemes
• Proposed Work
• Q&A

Page 23: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

Scheme               Dynamic/Static?  Est. Power Savings  State-Retentive?  Exec-time increase?
Gated-Vdd            Static           N/A *               NO                YES
Cache decay          Static           80%                 NO                YES
DRG-cache            Static           39%-59%             YES               NO
Drowsy cache         Static           60-75%              YES               YES
Near-OPT precharge   Static           N/A **              N/A               YES
Way-halting          Dynamic          55%                 N/A               NO
Data size detection  Dynamic          N/A                 N/A               NO

* - paper only cites 62% energy-delay savings
** - paper only cites 92% reduction of bitline discharge

Page 24: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques (cont…)

Scheme               Miss ratio  Access time  Variable load-hit  Additional noise  µARCH
                     increase?   increase?    latency?           problems?         transparent?
Gated-Vdd            YES         YES          NO                 NO                NO
Cache decay          YES         YES          NO                 NO                NO
DRG-cache            NO          YES          NO                 YES               YES
Drowsy cache         NO          YES          YES                YES               NO
Near-OPT precharge   NO          NO*          YES                NO                NO
Way-halting          NO          NO*          NO                 NO                YES
Data size detection  NO          YES          NO                 NO                YES

* - With proper design

Page 25: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

First four techniques: supply gating

[Figure: a logic block (INPUTS, OUTPUTS) shown with an ACTIVE-controlled gating transistor, with ON/OFF transistor states and leakage currents Ileak1 and Ileak2]

Stacking effect: Ileak2 >> Ileak1

Page 26: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

1. Gated-Vdd (circuit)

[Figure: 6T memory cell (WL, BL, BLB) with a high-Vt PMOS gating transistor controlled by SLEEP]

- The 6T memory cell can be disabled by the gating transistor, resulting in less leakage

Page 27: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

1. Gated-Vdd (microarchitecture: Dynamically ResIzable [DRI] Cache)

[Figure: cache block diagram (VIRTUAL ADDRESS, TLB, TLB_OUT, TAG and DATA arrays with sense amplifiers (SA), HIT, DATA OUT) with a MASK applied to the index by the MASK CONTROL CIRCUIT]

- Mask out part of the index to dynamically resize the cache
- Make this decision based on the cache hit ratio
- Energy-delay reduced by 62%

Page 28: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

1. Gated-Vdd (microarchitecture: Dynamically ResIzable [DRI] Cache)

[Figure: cache block diagram (VIRTUAL ADDRESS, TLB, TLB_OUT, TAG and DATA arrays with sense amplifiers (SA), HIT, DATA OUT) with a MASK applied to the index by the MASK CONTROL CIRCUIT]

Example: if MASK removes the upper 2 bits of the index, only the lower quarter of the sets of the cache can be accessed (all other sets are gated off)
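The masking example can be sketched in a few lines. The 8-bit index width (256 sets) is a hypothetical value chosen for illustration, and the function name is not from the DRI cache paper.

```python
def masked_index(index, index_bits, masked_bits):
    """Clear the upper `masked_bits` bits of a set index (DRI-style resizing)."""
    keep_bits = index_bits - masked_bits
    return index & ((1 << keep_bits) - 1)

INDEX_BITS = 8  # assumption: 256 sets at full size

# Every possible index, after masking 2 bits, lands in the low range only.
reachable = {masked_index(i, INDEX_BITS, 2) for i in range(1 << INDEX_BITS)}
print(f"{len(reachable)} of {1 << INDEX_BITS} sets remain reachable")
```

With these assumed widths, only sets 0 to 63 can be addressed; the remaining sets are never selected and can be gated off.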

Page 29: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

2. Cache decay (concept)

[Figure: timeline of a cache block's lifetime: cache block first accessed, cache block last accessed, cache block evicted]

- If we turn a cache block's power off right after it is last accessed, we save leakage power without any performance penalty

Page 30: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

2. Cache decay (circuit borrows Gated-Vdd techniques)

[Figure: 6T memory cell (WL, BL, BLB) with a high-Vt PMOS gating transistor controlled by SLEEP]

Page 31: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

2. Cache decay (microarchitecture)

[Figure: per-line DEC counters driven by CLK, each with a "zero?" detector, alongside the WORDLINE DECODER and the CACHE LINES]

- Static power reduced by 80%!!
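A minimal behavioral sketch of the per-line decay counter idea: each line counts down while idle, is gated off when the counter reaches zero, and is reset on every access. The class name, the counter granularity, and the decay interval are assumptions for illustration, not details taken from the cache decay paper.

```python
class DecayLine:
    """One cache line with a hypothetical decay counter."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval
        self.counter = decay_interval
        self.powered = True

    def access(self):
        """Reset the counter and re-power the line.

        Returns True if the line was still powered (contents intact),
        False if it had decayed (the access behaves like a miss).
        """
        was_powered = self.powered
        self.powered = True
        self.counter = self.decay_interval
        return was_powered

    def tick(self):
        """Advance one counter period; gate the line off when the count hits zero."""
        if self.powered:
            self.counter -= 1
            if self.counter == 0:
                self.powered = False  # supply gated off, contents lost

line = DecayLine(decay_interval=3)
line.tick(); line.tick()
hit_before_decay = line.access()      # counter reset in time
line.tick(); line.tick(); line.tick()
hit_after_decay = line.access()       # line had been gated off
print(hit_before_decay, hit_after_decay)
```

The second access illustrates why the summary tables list a miss ratio increase for cache decay: a line that decays too early must be refetched.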

Page 32: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

3. Data Retention Ground (DRG) (circuit)

- DRG gates the ground of the memory cells
- With careful sizing, state can be preserved!
- Technique is transparent!!
- Power is reduced by 39% to 59%

[Figure: memory cell (BL, BLB) with its ground gated under control of the ROW DECODER wordline (WL, gated-ground control); device sizes shown: 3/1.5, 3/1.5, 4/1, 3/1, 3/1, 4/1, and (6.5/1) x no. of columns]

Page 33: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

4. Drowsy caches (circuit)

[Figure: memory cell supply (VVDD) switched between VDD (1V) and VDD (0.3V) by the LowVolt control signals; labels: VDD, ACTIVE]

- Drowsy caches (microarchitecture): simple algorithm, periodically put *every* cache line into drowsy mode
- Static power reduced by 60% to 75%
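The simple policy described above can be sketched as follows. The window length, the class name, and the way the wake-up penalty is reported are assumptions for illustration only.

```python
class DrowsyCache:
    """Behavioral sketch of the 'periodically put every line to sleep' policy."""

    def __init__(self, num_lines, window):
        self.drowsy = [False] * num_lines
        self.window = window      # assumed period, in cycles
        self.cycle = 0
        self.wakeups = 0

    def tick(self):
        """Advance one cycle; at the end of each window, make every line drowsy."""
        self.cycle += 1
        if self.cycle % self.window == 0:
            self.drowsy = [True] * len(self.drowsy)

    def access(self, line):
        """Access a line. Returns True if it had to be woken up first."""
        if self.drowsy[line]:
            self.drowsy[line] = False   # restore full VDD; data was retained
            self.wakeups += 1
            return True
        return False

cache = DrowsyCache(num_lines=4, window=2)
first = cache.access(0)       # line is awake initially
cache.tick(); cache.tick()    # window expires, all lines go drowsy
second = cache.access(0)      # pays a wake-up penalty
third = cache.access(0)       # awake again
print(first, second, third, cache.wakeups)
```

Unlike the decay sketch, a drowsy line keeps its contents, so the cost of a wrong guess is extra latency rather than an extra miss.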

Page 34: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

5. Near-optimal Precharging

[Figure: bitline precharge circuit controlled by PRE#]

- Bitline leakage burns power even in unused cache subarrays (additional power is needed during the precharge phase)
- For a given time interval, only a small fraction of subarrays are actually used
- Bitline discharge reduced by 92%

Page 35: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

5. Near-optimal Precharging

- Near-optimal precharging: stop precharging infrequently used subarrays
- Microarchitecture: counters to track subarray use, and a system to handle variable load-hit latency

[Figure: subarrays with sense amplifiers (SA) and gated precharging, driven by a PRECHARGE CONTROLLER]
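A minimal sketch of counter-based gated precharging: each subarray has an idle counter, precharge is stopped once a subarray has been idle for a threshold number of accesses, and touching a gated subarray costs extra latency. The threshold, the one-cycle penalty, and all names are assumptions for illustration and may differ from the published scheme.

```python
class PrechargeController:
    """Hypothetical per-subarray idle counters controlling gated precharge."""

    def __init__(self, num_subarrays, idle_threshold):
        self.idle = [0] * num_subarrays
        self.precharged = [True] * num_subarrays
        self.idle_threshold = idle_threshold

    def access(self, subarray):
        """Access one subarray; return the assumed extra latency in cycles."""
        penalty = 0 if self.precharged[subarray] else 1
        for s in range(len(self.idle)):
            if s == subarray:
                self.idle[s] = 0
                self.precharged[s] = True        # resume precharging
            else:
                self.idle[s] += 1
                if self.idle[s] >= self.idle_threshold:
                    self.precharged[s] = False   # stop precharging idle subarray
        return penalty

pc = PrechargeController(num_subarrays=4, idle_threshold=3)
warm = [pc.access(0) for _ in range(3)]   # only subarray 0 is in use
state_after_warm = list(pc.precharged)
cold = pc.access(1)                       # subarray 1 was gated: extra latency
again = pc.access(1)                      # now precharged again
print(warm, state_after_warm, cold, again)
```

The nonzero penalty on the first access to a gated subarray is the variable load-hit latency that the microarchitecture has to handle.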

Page 36: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

6. Way-halting cache

- Perform early miss detection to stop access to cache ways that are certain to miss
- Early miss detection is performed by offloading a few tag bits into a faster array that performs tag comparison early in the access
- Power reduced by 55%

[Figure: 2-way set-associative cache block diagram (VIRTUAL ADDRESS, TLB, TLB_OUT, TAG and DATA arrays with sense amplifiers (SA), HIT, OUTPUT DATA) with an EARLY MISS DETECTOR]
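The early miss check can be sketched as a comparison on a few low-order tag bits: a way whose halt bits differ from the request's cannot hit, so its access can be stopped. The number of halt bits, the use of the low-order bits, and the function name are assumptions for illustration.

```python
HALT_BITS = 4  # assumption: number of tag bits kept in the small, fast halt array

def ways_to_access(tag, set_tags, halt_bits=HALT_BITS):
    """Return the ways that may still hit; all other ways are halted.

    `set_tags` holds the full tag stored in each way of the selected set
    (None for an invalid way).
    """
    mask = (1 << halt_bits) - 1
    return [way for way, stored in enumerate(set_tags)
            if stored is not None and (stored & mask) == (tag & mask)]

set_tags = [0x1A3, 0x2B3, 0x1A7, None]   # a hypothetical 4-way set
candidates = ways_to_access(0x1A3, set_tags)
certain_miss = ways_to_access(0x555, set_tags)
print(candidates, certain_miss)
```

A matching halt tag does not guarantee a hit (way 1 above still fails the full tag compare), but a mismatch guarantees a miss, so halting never changes the cache's result.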

Page 37: Comparative Analysis of Contemporary Cache Power …

Cache Power Reduction Techniques

7. Data Size Detection

- Not every operand uses up the maximum space provided by the word length (e.g. ~94% of the operands in 64-bit Alpha SPECint95 benchmarks use 32 bits or less)
- Keep track of this information to turn off the upper bits of the datapath (saving on wordline, bitline, and sense-amp power)

[Figure: plot of operand widths, with 32-bit and 64-bit marks. Plot from Brooks and Martonosi]
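One way to detect a narrow operand is to check whether the upper bits are just a sign extension of the lower 32 bits. The sketch below assumes two's-complement values and a 32/64-bit split; the exact detection logic used in the cited work may differ.

```python
def fits_in_bits(value, bits=32, word=64):
    """True if a `word`-bit two's-complement pattern is a sign extension
    of its low `bits` bits, i.e. the upper bits carry no information."""
    value &= (1 << word) - 1                 # treat input as a raw bit pattern
    upper = value >> (bits - 1)              # sign bit of the low part plus all upper bits
    all_ones = (1 << (word - bits + 1)) - 1
    return upper == 0 or upper == all_ones

samples = [5, -1, -(1 << 31), 1 << 31, 1 << 40]
narrow = [fits_in_bits(v) for v in samples]
print(narrow)
```

A per-word flag computed this way could then be used to keep the upper half of the datapath inactive for narrow operands.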

Page 40: Comparative Analysis of Contemporary Cache Power …

Presentation Outline

• Motivation (finished)
• Background (finished)
  – Power Dissipation
  – Cache/SRAM Implementation
• Contemporary Cache Power Reduction Schemes
• Proposed Work
• Q&A

Page 41: Comparative Analysis of Contemporary Cache Power …

Proposed Work

• Detailed comparative study of the discussed low-power cache techniques (and various combinations)
• Metrics of comparison:
  – Power dissipation (including overheads)
  – Performance penalty (IPC and access time)
  – Die area overhead
  – Complexity

Page 42: Comparative Analysis of Contemporary Cache Power …

Proposed Work

• Contributions
  – Every scheme is put on the same playing field
  – Schemes are brought up to date with the use of predictive 65nm/45nm technology
  – Improved evaluation accuracy
    • Gate leakage is now accounted for
    • Careful accounting for overheads
    • Use of a state-of-the-art memory system model
  – Data Size Detection is proposed

Page 43: Comparative Analysis of Contemporary Cache Power …

Q & A

Page 44: Comparative Analysis of Contemporary Cache Power …

Thank You