EECS 366: Computer Architecture. Instructor: Shantanu Dutt, Department of EECS, University of Illinois at Chicago. Lecture Notes #16: Memory Organization. © Shantanu Dutt, UIC

Aug 05, 2020


Page 2: Memory Hierarchy Design

• Many programs need large amounts of memory, as the size of the problems they solve increases. To solve a problem quickly, fast access is needed to all this data.

• One solution is, of course, to build very large fast memory units capable of storing 1000s of MBytes. As we saw, fast memory (static memory, for example) consumes too much VLSI area and power, so a large memory of this kind is impractical to realize.

• Furthermore, even if it becomes feasible to build large amounts of fast memory, it is well known that access to this memory gets slower as it gets larger.

• Fortunately, there is a way out! Because of the locality property of most programs, it is not necessary to have large amounts of fast memory for quick access to large amounts of data:
(1) Temporal Locality: An item just referenced will be referenced again soon.
(2) Spatial Locality: When an item is referenced, nearby items in memory will also be referenced soon.

Page 3: Memory Hierarchy Design (contd.)

• What these locality properties mean is that programs use a physically contiguous block of data for some period of time before moving on to another block of data.

• Thus we can build very fast memory that is just large enough to store the small block of data that the program is currently working on. This is the 1st level of the memory hierarchy, and is the register file in the CPU.

• The next block of data that the program will move to has to be retrieved from the next level of the memory hierarchy, which has the 2nd fastest and 2nd smallest memory unit: this is the cache.

• Note that just as there is locality for individual data items (words), there is also locality between small blocks and between groups of these small blocks (larger blocks), and so on.

• Thus more levels are required that hold larger and larger blocks until the last level holds the entire data: the 3rd level is main memory and the 4th level is secondary/disk storage.

• Block size gets larger as one goes down the hierarchy mainly because access time to the lower level increases, and thus we need to spread this access time over more words.

Page 4: Memory Hierarchy Design (contd.)

• In principle, there can be $n$ levels in the memory hierarchy as shown below.

[Figure: The Memory Hierarchy. Levels toward the top are faster and more expensive; levels toward the bottom are slower and less expensive.]

Page 5: Memory Hierarchy Design (contd.)

• The data in an upper level is generally a subset of the data contained in the next lower level, and also belongs to the entire memory address space.

• An exception is the register level, all of whose data may not be contained in the cache at all times. Also, the register file is not part of the memory address space: registers are addressed by a different address that pertains to the register file only, and data transfers between the register file and the lower levels are handled explicitly by the program using LOADs and STOREs.

• The rest of the levels share a common memory address space, and data transfers between them are "automatic" and transparent to the program. They are handled either by hardware (cache–main mem. hierarchy) or by the operating system (main mem.–secondary storage hierarchy).

Page 6: Memory Hierarchy Design (contd.)

General Definitions and Principles of Memory Hierarchy

Consider any 2 adjacent levels $L_i$ and $L_{i+1}$ in the memory hierarchy:

• Block: Minimum amount of data (in # of words) that can be transferred between the 2 levels.

[Figure: Processor, Level 1, Level 2 and Level 3, showing blocks of level 1 and blocks of level 2.]

• Hit rate: Fraction of memory accesses to the upper level (of the 2-level sub-hierarchy) that are found in that level; denoted by $h_i$.

• Miss rate: Fraction of accesses that are not found in the upper level ($= 1 - h_i$); denoted by $m_i$.

• Hit time: Time taken to access a block in the upper level; denoted by $t_{hit,i}$.

Page 7: General Definitions and Principles of Memory Hierarchy (contd.)

Consider any 2 adjacent levels in the memory hierarchy:

• Miss penalty: Time to replace a block in the upper level by a needed block that is not in that level. Since there can be hits or misses at lower levels when obtaining the required block, the miss penalty $t_{mp,1}$ for the upper-most level (level 1) is given by:

$$t_{mp,1} = t_{2,repl} + m_2\, t_{3,repl} + m_2 m_3\, t_{4,repl} + \cdots = \sum_{i=1}^{n-1} \Big( \prod_{j=2}^{i} m_j \Big)\, t_{i+1,repl},$$

where $m_i$ is the miss rate in level $i$, and $t_{i+1,repl}$ is the block replacement time from level $L_{i+1}$ to $L_i$.

• The average memory access time $t_{av}$ for the CPU is given by

$$t_{av} = t_{hit,1} + m_1 \times t_{mp,1} = t_{hit,cache} + m_{cache} \times t_{mp,cache}$$

• The block replacement time $t_{i+1,repl}$ = access time $t_{i+1,acc}$ (time to access the 1st word of the block in the lower level $L_{i+1}$) + transfer time $(k-1)\, t_{i+1,transf}$ (time to access the remaining words), where $k$ is the block size in the upper level $L_i$ and $t_{i+1,transf}$ is the transfer time (per word) from level $L_{i+1}$.

Page 8: General Definitions and Principles of Memory Hierarchy (contd.)

Footnote: For e.g., there is an initial time $t_{search}$ required to search for the block/page location in main memory (MM), and further, due to refreshing, we saw that the average time $t_{MM,av}$ to access MM is given by an expression [not recoverable from the source]. Then the initial access time to MM is:

$$t_{MM,acc} = t_{search} + t_{MM,av}$$

However, the entire row is stored in the row register after spending $t_{MM,av}$ time to access the word, and the required block is part of this row. Thus the rest of the words in the block can be sent in approximately [symbol not recoverable from the source] time per word, and $t_{MM,transf}$ is that per-word time.

Example: There are 3 levels in the memory hierarchy: cache, MM, secondary storage (SS). The following are the values of the above parameters: $t_{hit,1} = 2$ cc's, $h_1 = 0.95$, cache block size = 4 words, $t_{MM,acc} = 9$ cc's, $t_{MM,transf} = 2$ cc's, $m_2 = 10^{-5}$, $t_{SS,acc} = 40{,}000$ cc's, $t_{SS,transf} = 20$ cc's, MM page size = 2K $= 2^{11}$ words.

Then, the average time taken by the CPU to access a word is:

$$t_{av} = t_{hit,1} + m_1 \left[ t_{MM,repl} + m_2 \times t_{SS,repl} \right]$$

$$= 2 + 0.05 \left[ (9 + 2 \times 3) + 10^{-5} (40{,}000 + 2047 \times 20) \right] = 2 + 0.75 + 0.05 \times 0.8 \approx 2.79 \text{ cc's}$$
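The arithmetic of the example can be checked with a short sketch. The function names below are illustrative and not part of the notes, and the parameter values are the ones read from the worked example (2 cc's hit time, miss rate 0.05, 4-word cache blocks, 2K-word pages); treat them as assumptions if the slide values differ.

```python
def block_replacement_time(t_acc, t_transf, block_words):
    """Access time for the 1st word plus per-word transfer of the remaining words."""
    return t_acc + (block_words - 1) * t_transf


def miss_penalty(lower_levels):
    """Miss penalty of level 1.

    lower_levels lists (t_repl, miss_rate) for levels 2, 3, ..., n in order.
    Each replacement time is weighted by the probability that the request
    missed in every level above the one supplying the block.
    """
    penalty = 0.0
    reach_prob = 1.0  # probability that the request gets this far down
    for t_repl, miss_rate in lower_levels:
        penalty += reach_prob * t_repl
        reach_prob *= miss_rate
    return penalty


def average_access_time(t_hit, miss_rate, lower_levels):
    """t_av = t_hit + m_1 * t_mp for the top level."""
    return t_hit + miss_rate * miss_penalty(lower_levels)


# Values as read from the worked example (all times in clock cycles).
t_mm_repl = block_replacement_time(9, 2, 4)            # MM -> cache, 4-word block
t_ss_repl = block_replacement_time(40_000, 20, 2048)   # disk -> MM, 2K-word page
t_av = average_access_time(2, 0.05, [(t_mm_repl, 1e-5), (t_ss_repl, 0.0)])
print(round(t_av, 2))
```

The last level is given a miss rate of 0 because it holds the entire data, so the sum stops there.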

Page 9: General Definitions and Principles of Memory Hierarchy (contd.)

Consider any 2 adjacent levels in the memory hierarchy:

• Addressing:

[Figure: address formats.
Generic: | Block frame address (or Block # or Page #) | Block offset addr. (or Word #) |
Cache–main mem. hierarchy (virtual addr.): bits 31..4 = Block # (28 bits), bits 3..0 = Word #. Block size is 16 words. Bits 9..4 give the block offset within a page.
Main mem.–sec. storage hierarchy (virtual addr.): bits 31..10 = Page # (22 bits), bits 9..0 = Word #. Page size is 1K words.
After translation, physical addr.: bits 23..10 = Page # (14 bits), bits 9..0 = Word #.
Corresponding physical addr. of the cache: bits 23..4 = Block # (20 bits), bits 3..0 = Word #.]
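The field boundaries in the addressing figure amount to shifts and masks. The following is a minimal sketch under the figure's sizes (16-word blocks, 1K-word pages); the helper name and the sample address are illustrative assumptions.

```python
BLOCK_WORD_BITS = 4    # 16 words per cache block -> Word # is bits 3..0
PAGE_WORD_BITS = 10    # 1K words per page        -> Word # is bits 9..0


def split(addr, offset_bits):
    """Return (block or page number, word number) for a word address."""
    number = addr >> offset_bits
    word = addr & ((1 << offset_bits) - 1)
    return number, word


addr = 0x12345  # arbitrary sample word address

block_no, word_in_block = split(addr, BLOCK_WORD_BITS)
page_no, word_in_page = split(addr, PAGE_WORD_BITS)

# Bits 9..4 of the address: which block of the page this word falls in.
block_in_page = (addr >> BLOCK_WORD_BITS) & ((1 << (PAGE_WORD_BITS - BLOCK_WORD_BITS)) - 1)

print(hex(block_no), word_in_block, page_no, word_in_page, block_in_page)
```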

Page 10: General Definitions and Principles of Memory Hierarchy (contd.)

Effect of Block Size:

• The larger the block size, the better the anticipation of nearby items to be referenced soon (spatial locality).

• However, beyond a certain block size, the concept of spatial locality is stretched. Note that while a program may access almost all items in a small or medium-size block, it later accesses a random next block, not necessarily the one following the current one: spatial locality is punctuated by random accesses (for ex., due to branches).

• Thus for large block sizes, a block will contain many useless data items that the program might not access in the near future. Since the space on the upper level is limited, the larger the block size, the smaller the # of blocks. Hence the miss rate increases when the next random block is accessed by the program.

Page 11: Effect of Block Size (contd.)

[Figure:
(a) Program structure: four consecutive 16-word segments A, B, C, D, with branch probabilities 0.95, 0.05, 1, 0.9 and 0.1 marked on the transitions (assignment of probabilities to edges not recoverable).
(b) Miss pattern with block size = 32 words: initial A access, miss, A & B loaded; work on A; next access is C, miss, C & D loaded; work on C; next access is A, miss, A & B loaded; and so on. 2 misses per iteration.
(c) Miss pattern with block size = 16 words: initial A access, miss, A loaded; work on A; next access is C, miss, C loaded (into the empty frame); work on C; next access is A, hit; work on A; next access is C, hit. 0 misses per iteration.]
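The two miss patterns in the figure can be reproduced with a small simulation. This is a sketch under assumptions not stated in the notes: the upper level is taken to hold 32 words in total, it is fully associative with LRU replacement, and the loop simply alternates between segment A (words 0 to 15) and segment C (words 32 to 47).

```python
from collections import OrderedDict


def misses_per_iteration(block_words, cache_words=32, iterations=5):
    """Count misses in each loop iteration for a given block size."""
    n_frames = cache_words // block_words
    cache = OrderedDict()                 # resident block numbers, LRU first
    segment_a = list(range(0, 16))        # word addresses of segment A
    segment_c = list(range(32, 48))       # word addresses of segment C
    counts = []
    for _ in range(iterations):
        misses = 0
        for addr in segment_a + segment_c:
            block = addr // block_words
            if block in cache:
                cache.move_to_end(block)          # refresh recency on a hit
            else:
                misses += 1
                if len(cache) >= n_frames:
                    cache.popitem(last=False)     # evict the LRU block
                cache[block] = True
        counts.append(misses)
    return counts


print(misses_per_iteration(16))   # two cold misses, then none
print(misses_per_iteration(32))   # A&B and C&D keep evicting each other
```

With 16-word blocks both A and C stay resident after the first pass; with 32-word blocks the single frame is reloaded twice per iteration, half of it with data (B or D) that the loop never touches.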

Page 12: Effect of Block Size (contd.)

$$t_{av} = t_{hit} + m \times t_{mp}$$

where $t_{av}$ is the average memory access time, $m$ the miss rate and $t_{mp}$ the miss penalty.

[Figure: three plots against block size: miss penalty; miss rate, with the "pollution point" marked; and average access time t_av = hit_time + (miss_rate)(miss_penalty), annotated "Increase happens earlier than in 'miss rate' plot".]

Page 13: General Definitions and Principles of Memory Hierarchy (contd.)

• What the CPU does on a miss in the upper level:
(1) If the miss penalty is a few 10s of clock cycles (cc's), then the CPU waits (ex., cache miss).
(2) If the miss penalty is 100s to 1000s of cc's (as in a main-memory miss or page fault), the CPU is interrupted on a miss, and another process starts executing. When the requested block is brought in, this is noted in the previous process's status, so that it can resume executing at a later stage (when the current process is done or it also has a miss).

• Block transfer mechanism:
(1) Done in hardware for a few 10s of cc's penalty (cache).
(2) Done in software (the O.S. could do this) for a main-mem. miss: the O.S. sets up the appropriate disk interface for a DMA and leaves the CPU; the CPU executes another process, while the transfer from disk to main mem. takes place simultaneously.

Page 14: Some Basic Issues in Memory Hierarchies

Again we consider 2 adjacent levels of the hierarchy:

1. Block Placement: Where can a block be placed in the upper level?
2. Block Identification: How is a block found in the upper level?
3. Block Replacement: Which block to replace during a miss?
4. Write Strategy: What happens on a write to the upper level, and how is this percolated to the lower level?

Page 15: Some Basic Issues in Memory Hierarchies (contd.)

(1) Block Placement:

• Fully Associative (FA): Can place anywhere; have to look everywhere.

• Set Associative (SA): The upper level is divided into $s$ sets $0, \ldots, s-1$, each containing $k$ blocks ($k$-way set associative). A block with block # $b$ is placed only in set $b \bmod s$; it can be placed anywhere in this set.

• Direct Mapped (DM): The upper level is divided into $n$ blocks $0, \ldots, n-1$, and a block with block # $b$ is placed only in block $b \bmod n$; $n$ is generally a power of 2, say, $2^r$. Will need to look at only 1 block position for the required block.

[Figure: placement of lower-level block 14 in an upper level of 8 block frames (0 to 7). Fully associative (FA): block 14 can go anywhere. Direct mapped (DM): block 14 can only go into block 14 mod 8 = 6. 2-way set associative (SA), sets 0 to 3: block 14 can go anywhere in set 14 mod 4 = 2.]

Page 16: Some Basic Issues in Memory Hierarchies

(1) Block Placement (contd.):

• FA and DM are special cases of set-associative. In FA, there is only one set containing all $n$ blocks. In DM, there are $n$ sets, each containing exactly 1 block.

• FA has the most flexibility in placing a block, while DM has the least.
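Because FA and DM are the two extremes of set associativity, a single helper parameterized by the number of ways covers all three placement schemes. This is an illustrative sketch; the function name and the convention that the frames of a set are numbered contiguously are assumptions.

```python
def candidate_frames(block_no, n_frames, ways):
    """Frames of the upper level in which lower-level block `block_no` may be placed.

    ways == 1        -> direct mapped (n_frames sets of one block each)
    ways == n_frames -> fully associative (a single set)
    """
    if ways < 1 or n_frames % ways != 0:
        raise ValueError("ways must divide n_frames")
    n_sets = n_frames // ways
    set_no = block_no % n_sets
    first = set_no * ways          # assumed layout: set i owns frames i*ways .. i*ways + ways - 1
    return list(range(first, first + ways))


print(candidate_frames(14, 8, 1))   # direct mapped: only frame 14 mod 8
print(candidate_frames(14, 8, 2))   # 2-way: the two frames of set 14 mod 4
print(candidate_frames(14, 8, 8))   # fully associative: every frame
```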

Page 17: Some Basic Issues in Memory Hierarchies (contd.)

(2) Block Identification:

• Associative or content-addressable memory (CAM): stores the block # or tags of resident blocks for each set. The index, which is the $\log_2 s = l$ rightmost bits of the block #, determines which set of the CAM to search for the rest of the block # (the tag). This is generally used in the cache–main-mem. hierarchy.

[Figure (a): Block identification in different cache types, for block 14 in an 8-block upper level; each entry holds a tag and data. Direct mapped (DM): search only in tag position 14 mod 8 = 6. 2-way set associative (SA): search everywhere within set 14 mod 4 = 2. Fully associative (FA): search everywhere. Search is performed in parallel in FA and SA caches for speed.]

[Figure (b): Different portions of an address: | Tag | Index | Block offset / Word # |, where Tag and Index together form the Block #. The index (block # mod $s$) is used to select the set (in DM and SA), the tag is used to check all blocks in the "indexed" set, and the word # is used to select the word in the block.]
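The tag/index/word split described in panel (b) can be sketched as follows. The function name and the choice of 16-word blocks for the sample address are assumptions made for illustration; the block number 14 and the set counts come from the figure.

```python
def split_address(addr, index_bits, word_bits):
    """Split a word address into (tag, index, word #).

    index_bits = log2(number of sets); use 0 for a fully associative cache,
    where the whole block number serves as the tag.
    """
    word = addr & ((1 << word_bits) - 1)
    block_no = addr >> word_bits
    index = block_no & ((1 << index_bits) - 1)   # same as block_no mod 2**index_bits
    tag = block_no >> index_bits
    return tag, index, word


addr = 14 * 16 + 3   # word 3 of block 14, assuming 16-word blocks

print(split_address(addr, index_bits=2, word_bits=4))   # 2-way SA with 4 sets
print(split_address(addr, index_bits=3, word_bits=4))   # DM with 8 blocks
print(split_address(addr, index_bits=0, word_bits=4))   # FA
```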

Page 18: Some Basic Issues in Memory Hierarchies (contd.)

(2) Block Identification: CAMs

• Structure of a CAM:

[Figure: Structure of a CAM (fully-associative cache). A tag store of $2^r$ entries, each with a valid bit (1/0) and an $m$-bit tag, feeds one equality comparator per entry; the comparator outputs select the matching block in the data store (16 words/block), the Word # selects the desired word, and a Miss signal is raised when no entry matches.
Note: Search logic replaces a regular decoder.
Note: The valid bit present in the tag store is ANDed with the O/P of the corresponding equality comparator.
Inset: Equality comparator with inputs x (x0 to x7) and a (a0 to a7); output 1: equal, 0: not equal.]
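In software terms, the search logic in the figure computes one match bit per entry: the valid bit ANDed with the result of the tag comparison. The sketch below models that behavior sequentially (the hardware does all comparisons in parallel); the function name and the list-of-pairs representation of the tag store are assumptions.

```python
def cam_lookup(tag_store, tag):
    """Return the index of the entry holding `tag`, or None on a miss.

    tag_store is a list of (valid_bit, stored_tag) pairs.
    """
    # One match line per entry: valid AND (stored tag == requested tag).
    match_lines = [bool(valid) and stored == tag for valid, stored in tag_store]
    if True in match_lines:
        return match_lines.index(True)
    return None   # no match line asserted: miss


store = [(1, 7), (0, 3), (1, 3)]   # entry 1 holds tag 3 but is invalid
print(cam_lookup(store, 3))
print(cam_lookup(store, 9))
```

Entry 1 is skipped even though its stored tag matches, which is the purpose of ANDing the valid bit with the comparator output.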

Page 19: Some Basic Issues in Memory Hierarchies (contd.)

CAMs:

• Hardware complexity of the parallel search logic $= \Theta(2m \cdot 2^r)$ for a FA cache, where $2^r$ is the size of the cache in blocks, and $m$ is the # of bits in the block #. This can be prohibitive for large $m$ and $r$.

• For a SA cache, we have one such CAM of size $2^{r-l} \times (m-l)$ for each of the $s = 2^l$ sets. So the total CAM size is $2^r \times (m-l)$. However, there is only one parallel search logic of size $2^{r-l} \times (m-l)$, which is used to search only the indexed set.

Page 20: Some Basic Issues in Memory Hierarchies (contd.)

[Figure: Set-associative cache organization. 24-bit address: bits 23..4 = Block # (20 bits), made up of Tag (15 bits, bits 23..9) and Index (5 bits, bits 8..4); bits 3..0 = Word #. Here $m = 20$, $l = 5$, $r = 10$. Cache size $= 2^r = 1024$ blocks; # of sets = 32, set size = 32 blocks. The index drives an $l$-to-$2^l$ = 5-to-32 decoder that selects set # 0 to $2^l - 1 = 31$ in both the tag store and the data store. Each set holds $2^{r-l} = 32$ tags of $m - l = 15$ bits and 32 data blocks of 512 bits. The search logic compares the tag against the indexed set, and a $2^{r-l}$-to-1 = 32-to-1 mux selects the matching data block.]

• There is only one equality comparator in a DM cache; thus its complexity is $\Theta(2(m-r))$.

• Time complexity of search: $\Theta(\log m)$ for FA, $\Theta(\log(m-l))$ for SA, and $\Theta(\log(m-r))$ for DM.

Page 21: Some Basic Issues in Memory Hierarchies

(2) Block Identification (contd.):

• Lookup table: Stores the tags also by sets, as in the CAM. However, this is a regular kind of memory, and is of the same technology as the upper level. Thus 2 memory accesses to the upper level are required to get a word from there. This is generally used in the main-mem.–sec. storage hierarchy.

• Table size is proportional to the total size in blocks of the lower level. This is different from the upper level, in which a CAM is used as the "lookup table" and its size is proportional to the size in blocks of the upper level.

[Figure: Lookup table, indexed by the block # of the address. Each entry holds a present bit, a dirty bit and the block's location in the current level.]

Page 22: Some Basic Issues in Memory Hierarchies (contd.)

(3) Block Replacement Policy: Which block in the set to replace? There is no choice in a DM cache, so the question applies to FA and SA caches. The following policies can be used for each set; all policies make use of temporal locality to predict which block will be accessed furthest in the future.

• Least Frequently Used (LFU): Note the # of times each block has been used over some window of time and replace the one used the least # of times. Most expensive to implement.

• Least Recently Used (LRU): Keep the blocks in each set ordered by the time of their most recent use. Whenever a block is accessed in the set, move it to the top of the list. Replace the block at the bottom. 2nd most expensive, but best performance.

[Figure: Implementation of LRU scheme. Tag and data entries are kept in order from LRU to MRU; on an access the entry is moved to the (MRU) end and the entries after its old position are shifted left 1 block. LRU is performed in the entire cache for FA or in the accessed set for SA.]
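The move-to-end LRU scheme in the figure maps directly onto an ordered dictionary. This is a minimal sketch of one set (or of the whole cache in the FA case); the class name and return convention are illustrative assumptions, and the hardware shifts entries in parallel instead of updating a data structure.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a cache with LRU replacement (LRU entry first, MRU last)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> placeholder for the data block

    def access(self, tag):
        """Access a block; return (hit, evicted_tag)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # move to the MRU end on access
            return True, None
        evicted = None
        if len(self.blocks) >= self.ways:
            evicted, _ = self.blocks.popitem(last=False)   # drop the LRU entry
        self.blocks[tag] = None
        return False, evicted


s = LRUSet(ways=2)
results = [s.access(tag) for tag in (2, 6, 2, 4)]
print(results)          # the final access evicts 6, since 2 was used more recently
print(list(s.blocks))   # remaining tags, LRU first
```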

Page 23: Some Basic Issues in Memory Hierarchies (contd.)

(3) Block Replacement Policy (contd.):

• Not Recently Used (NRU): Just point to the block used most recently. Replace any of the other blocks. 3rd most expensive in hardware and time, and worst performance.

• Random: Randomly choose any block to replace. Least expensive to implement (especially in time; have to do this only when there is a miss), and 3rd best performance (after LRU).

Page 24: Some Basic Issues in Memory Hierarchies (contd.)

(4) Write Strategy: What happens on a write?

On a write hit:
1. Write Back: Write to the lower level when the block is replaced and if its "dirty" bit is set. This bit is set whenever we write to a block in the upper level. This is generally used when the access time to the lower level is high.
2. Write Through: Write to both levels simultaneously, thus keeping them always consistent.

On a write miss:
1. Write Allocate: Load the block being written into the upper level. Again, this is generally done when the access time to the lower level is high.
2. No Write Allocate: The block is not loaded into the upper level. The rationale is that reads and writes do not have the same sphere of spatial locality, and, as explained later, the CPU generally does not have to wait for writes (i.e., STOREs).

The combinations generally used on write hit/miss are 1/1 and 2/2. The latter is used mainly for the cache–main-mem. hierarchy and the 1/1 combination for the main-mem.–sec. storage hierarchy (because of the larger access time).
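The difference between the 1/1 and 2/2 combinations shows up in how many writes reach the lower level. The following sketch counts them for a small fully associative upper level with LRU replacement; the function name, the trace format and the LRU choice are assumptions made for illustration, and dirty blocks still resident at the end of the trace are not flushed.

```python
from collections import OrderedDict


def lower_level_writes(trace, n_frames, write_back):
    """Count writes sent to the lower level.

    trace is a list of ('r' or 'w', block_no).
    write_back=True  -> write back + write allocate (combination 1/1)
    write_back=False -> write through + no write allocate (combination 2/2)
    """
    cache = OrderedDict()   # block_no -> dirty bit, LRU first
    writes = 0

    def load(block):
        nonlocal writes
        if len(cache) >= n_frames:
            _, dirty = cache.popitem(last=False)
            if dirty:
                writes += 1          # evicted dirty block is written back
        cache[block] = False

    for op, block in trace:
        hit = block in cache
        if hit:
            cache.move_to_end(block)
        if op == 'r':
            if not hit:
                load(block)
        elif write_back:
            if not hit:
                load(block)          # write allocate
            cache[block] = True      # set the dirty bit
        else:
            writes += 1              # write through: every write goes down
            # no write allocate: a write miss loads nothing
    return writes


trace = [('w', 1)] * 5 + [('w', 2), ('w', 3)]
print(lower_level_writes(trace, n_frames=2, write_back=True))
print(lower_level_writes(trace, n_frames=2, write_back=False))
```

Note that the two counts are in different units: write back transfers whole blocks, while write through transfers individual words.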

Page 25: More About Caches

• Made from SRAMs.

• Source of cache misses:
(1) Compulsory: The 1st access to a block will result in a miss ("cold start miss").
(2) Capacity: Caches cannot contain all blocks needed during a program's execution.
(3) Conflict or Collision: Occurs when (a) too many referenced blocks map to the same set, and/or (b) the set size is very small (for e.g., in DM caches).

Page 26: Source of cache misses (contd.)

[Figure: example for a 2-way SA cache (size = 4 blocks; Set 0 and Set 1), using LRU in sets. Cm = compulsory miss, Cn = conflict miss, Cp = capacity miss, h = hit.

Block # accessed:   2,  2,  3,  3,  5,  ..., 6,  2,  6,  4,  4,  2,  7,  ..., 3
Access class:       Cm, h,  Cm, h,  Cm, h's, Cm, h,  h,  Cm, h,  Cn, Cm, h's, Cp
Block # replaced:   -,  -,  -,  -,  -,  ..., -,  -,  -,  2,  -,  6,  3,  ..., 5

For the four replacements (on accesses to blocks 4, 2, 7 and 3), the figure also marks whether the replaced block is the global LRU block; the extracted marks are N, Y, Y, N, and their column alignment is not recoverable.]
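The access classes in the example can be reproduced with a short simulation. This sketch uses a common classification rule that is assumed here, not taken from the notes: a miss is compulsory if the block was never referenced before, a capacity miss if a fully associative LRU cache of the same size would also miss, and a conflict miss otherwise. The elided runs of hits in the figure are omitted from the trace.

```python
from collections import OrderedDict


def lru_access(lru, key, capacity):
    """Touch `key` in an LRU-ordered dict (LRU first); return True on a hit."""
    if key in lru:
        lru.move_to_end(key)
        return True
    if len(lru) >= capacity:
        lru.popitem(last=False)
    lru[key] = None
    return False


def classify(trace, n_frames=4, ways=2):
    """Classify each access for a set-associative cache with LRU in sets."""
    n_sets = n_frames // ways
    sets = [OrderedDict() for _ in range(n_sets)]
    fully_assoc = OrderedDict()   # reference cache: same size, global LRU
    seen = set()
    classes = []
    for block in trace:
        fa_hit = lru_access(fully_assoc, block, n_frames)
        sa_hit = lru_access(sets[block % n_sets], block, ways)
        if sa_hit:
            classes.append("hit")
        elif block not in seen:
            classes.append("compulsory")
        elif fa_hit:
            classes.append("conflict")    # only the set mapping caused the miss
        else:
            classes.append("capacity")    # even a fully associative cache misses
        seen.add(block)
    return classes


trace = [2, 2, 3, 3, 5, 6, 2, 6, 4, 4, 2, 7, 3]
for block, cls in zip(trace, classify(trace)):
    print(block, cls)
```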

Page 27: Source of cache misses (contd.)

[Figure not captured in the transcript.]

Page 28: More About Caches (contd.)

• Effect of block size:

$$t_{av} = t_{hit} + m \times t_{mp}$$

Page 29: More About Caches (contd.)

• Separate data and instruction caches. These can have different block sizes, capacities and associativities to optimize performance.