Seminar: Snoopy Protocol

Posted on 09-Mar-2015


Simple, Bus-Based Multiprocessor

Marcus Vinicius Duarte

System architecture

Cache contents (State, Tag, Data = Data1 Data0):

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  S     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     110  00 30  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

M = Modified, S = Shared, I = Invalid
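The layout above determines which block each address lands in. The sketch below assumes that each cache is direct-mapped with four 8-byte blocks. That assumption comes from the tags, not from any statement on the slides. Under it, the index is the address divided by 8, modulo 4. This explains why 120 competes with 100 for B0, 128 with 108 for B1, and 130 with 110 for B2. `block_index`, `lookup` and the dictionaries are illustrative names, not part of the exercise.

```python
# Sketch: initial cache/memory contents from the figure above.
# Assumption (inferred from the tags): direct-mapped caches with four
# 8-byte blocks, so the block index is (address // 8) % 4.
BLOCK_BYTES = 8
NUM_BLOCKS = 4

# (state, tag, Data0) per block B0..B3; Data1 is 00 everywhere and omitted.
INITIAL_CACHES = {
    "P0": [("I", 0x100, 0x10), ("S", 0x108, 0x08), ("M", 0x110, 0x30), ("I", 0x118, 0x10)],
    "P1": [("I", 0x100, 0x10), ("M", 0x128, 0x68), ("I", 0x110, 0x10), ("S", 0x118, 0x18)],
    "P15": [("S", 0x120, 0x20), ("S", 0x108, 0x08), ("I", 0x110, 0x10), ("I", 0x118, 0x10)],
}
INITIAL_MEMORY = {0x100: 0x00, 0x108: 0x08, 0x110: 0x10, 0x118: 0x18,
                  0x120: 0x20, 0x128: 0x28, 0x130: 0x30}


def block_index(addr: int) -> int:
    """Cache block (0..3) that an address maps to."""
    return (addr // BLOCK_BYTES) % NUM_BLOCKS


def lookup(proc: str, addr: int) -> str:
    """'hit' only if the block holds this tag in a valid (non-I) state."""
    state, tag, _ = INITIAL_CACHES[proc][block_index(addr)]
    return "hit" if state != "I" and tag == addr else "miss"


if __name__ == "__main__":
    for addr in (0x100, 0x120, 0x128, 0x130):
        print(f"{addr:x} -> B{block_index(addr)}, P0 {lookup('P0', addr)}")
```

For example, `lookup("P0", 0x100)` reports a miss even though the tag matches, because the block is Invalid.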

Exercise 4.1

4.1A. [10] <4.2> P0: read 120

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     S     120  00 20  I     100  00 10  S     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     110  00 30  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P0.B0: (S, 120, 00, 20); the read returns 20.

4.1B. [10] <4.2> P0: write 120 <-- 80

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     M     120  00 80  I     100  00 10  I     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     110  00 30  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P0.B0: (M, 120, 00, 80), P15.B0: (I, 120, 00, 20)

4.1C. [10] <4.2> P15: write 120 <-- 80

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  M     120  00 80
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     110  00 30  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P15.B0: (M, 120, 00, 80)

4.1D. [10] <4.2> P1: read 110

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  S     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     S     110  00 30  S     110  00 30  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 30 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P0.B2: (S, 110, 00, 30), P1.B2: (S, 110, 00, 30), M[110]: (00, 30) (written back by P0); the read returns 30.

4.1E. [10] <4.2> P0: write 108 <-- 48

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  S     120  00 20
B1     M     108  00 48  M     128  00 68  I     108  00 08
B2     M     110  00 30  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P0.B1: (M, 108, 00, 48), P15.B1: (I, 108, 00, 08)

4.1F. [10] <4.2> P0: write 130 <-- 78

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  S     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     130  00 78  I     110  00 10  I     110  00 10
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 30 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P0.B2: (M, 130, 00, 78), M[110]: (00, 30) (P0 first writes back the replaced block 110)

4.1G. [10] <4.2> P15: write 130 <-- 78

       P0                P1                P15
Block  State Tag  Data   State Tag  Data   State Tag  Data
B0     I     100  00 10  I     100  00 10  S     120  00 20
B1     S     108  00 08  M     128  00 68  S     108  00 08
B2     M     110  00 30  I     110  00 10  M     130  00 78
B3     I     118  00 10  S     118  00 18  I     118  00 10

Memory (Address: Data1 Data0): 100: 00 00 | 108: 00 08 | 110: 00 10 | 118: 00 18 | 120: 00 20 | 128: 00 28 | 130: 00 30 | ...

P15.B2: (M, 130, 00, 78)
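The seven results above can be replayed with a small functional model of a write-back, write-invalidate MSI protocol. As in the tables above, each operation is applied independently to the initial state. This is a sketch under the same assumed geometry (direct-mapped, four 8-byte blocks). The class `MSISystem` and its methods are hypothetical. The model tracks only Data0, because Data1 is 00 in every entry.

```python
import copy

BLOCK_BYTES, NUM_BLOCKS = 8, 4  # assumed direct-mapped geometry

# (state, tag, Data0) per block; Data1 is 00 everywhere and omitted.
INITIAL_CACHES = {
    "P0": [("I", 0x100, 0x10), ("S", 0x108, 0x08), ("M", 0x110, 0x30), ("I", 0x118, 0x10)],
    "P1": [("I", 0x100, 0x10), ("M", 0x128, 0x68), ("I", 0x110, 0x10), ("S", 0x118, 0x18)],
    "P15": [("S", 0x120, 0x20), ("S", 0x108, 0x08), ("I", 0x110, 0x10), ("I", 0x118, 0x10)],
}
INITIAL_MEMORY = {0x100: 0x00, 0x108: 0x08, 0x110: 0x10, 0x118: 0x18,
                  0x120: 0x20, 0x128: 0x28, 0x130: 0x30}


def index(addr):
    return (addr // BLOCK_BYTES) % NUM_BLOCKS


class MSISystem:
    """Toy write-back, write-invalidate MSI snooping system (illustrative only)."""

    def __init__(self):
        self.caches = copy.deepcopy(INITIAL_CACHES)
        self.memory = dict(INITIAL_MEMORY)

    def _hit(self, proc, addr):
        state, tag, _ = self.caches[proc][index(addr)]
        return state != "I" and tag == addr

    def _miss(self, proc, addr, for_write):
        """Evict the victim, snoop the other caches, return the fill data."""
        i = index(addr)
        state, tag, data = self.caches[proc][i]
        if state == "M":                      # dirty victim (other tag): write back
            self.memory[tag] = data
        for other, lines in self.caches.items():
            o_state, o_tag, o_data = lines[i]
            if other == proc or o_tag != addr or o_state == "I":
                continue
            if o_state == "M":                # owner writes back, memory now current
                self.memory[addr] = o_data
            lines[i] = ("I" if for_write else "S", o_tag, o_data)
        return self.memory[addr]

    def read(self, proc, addr):
        if not self._hit(proc, addr):
            data = self._miss(proc, addr, for_write=False)
            self.caches[proc][index(addr)] = ("S", addr, data)
        return self.caches[proc][index(addr)][2]

    def write(self, proc, addr, value):
        i = index(addr)
        if not self._hit(proc, addr):
            self._miss(proc, addr, for_write=True)   # write miss on the bus
        elif self.caches[proc][i][0] == "S":          # upgrade: send invalidate
            for other, lines in self.caches.items():
                if other != proc and lines[i][1] == addr:
                    lines[i] = ("I",) + lines[i][1:]
        self.caches[proc][i] = ("M", addr, value)


if __name__ == "__main__":
    s = MSISystem()
    print("4.1D read returns", hex(s.read("P1", 0x110)), "memory[110] =", hex(s.memory[0x110]))
```

For example, `MSISystem().write("P0", 0x130, 0x78)` leaves `memory[0x110] == 0x30`. This is the write-back noted in 4.1F.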

Exercise 4.2

Assumptions:

1- CPU read and write hits generate no stall cycles.

2- CPU read and write misses generate Nmemory and Ncache stall cycles if satisfied by memory and cache, respectively.

3- CPU write hits that generate an invalidate incur Ninvalidate stall cycles.

4- A writeback of a block, either due to a conflict or another processor's request to an exclusive block, incurs an additional Nwriteback stall cycles.

Parameter     Implementation 1   Implementation 2
Nmemory       100                100
Ncache        70                 130
Ninvalidate   15                 15
Nwriteback    10                 10
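The four assumptions amount to an additive cost model: every access contributes the cost of the bus events it triggers. The sketch below is minimal. The event labels (`"hit"`, `"memory"`, `"cache"`, `"invalidate"`, `"writeback"`) are hypothetical names for assumptions 1 to 4. The event lists encode the classification given in the solutions that follow.

```python
# Stall-cycle parameters (cycles) from the table above.
PARAMS = {
    1: {"memory": 100, "cache": 70, "invalidate": 15, "writeback": 10},
    2: {"memory": 100, "cache": 130, "invalidate": 15, "writeback": 10},
}

# Hypothetical event labels, one per bus action in the solutions below:
#   "hit" -> assumption 1, "memory"/"cache" -> 2, "invalidate" -> 3, "writeback" -> 4
SEQUENCES = {
    "A": ["memory", "cache", "writeback", "writeback", "memory"],
    "B": ["memory", "invalidate", "writeback", "memory"],
    "C": ["memory", "hit", "memory"],
    "D": ["memory", "writeback", "memory", "memory"],
}


def stall_cycles(events, impl):
    """Total stall cycles for a list of events under one implementation."""
    costs = PARAMS[impl]
    return sum(0 if event == "hit" else costs[event] for event in events)


if __name__ == "__main__":
    for part, events in SEQUENCES.items():
        print(part, stall_cycles(events, 1), stall_cycles(events, 2))
```

Only part A differs between the two implementations. It is the only sequence with a miss satisfied by another cache, so it is the only one where Ncache matters.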

A - [20] <4.3>
I - P0: read 120
II - P0: read 128
III - P0: read 130

I - Read miss, satisfied by memory.
II - Read miss, satisfied by P1's cache; P1 writes back its Modified block 128 and keeps it as Shared. P0's Shared block 108 is replaced silently.
III - Read miss, satisfied by memory; P0 first writes back its Modified block 110.

Implementation 1: 100 + 70 + 10 + 100 + 10 = 290 stall cycles
Implementation 2: 100 + 130 + 10 + 100 + 10 = 350 stall cycles

B - [20] <4.3>
I - P0: read 100
II - P0: write 108 <-- 48
III - P0: write 130 <-- 78

I - Read miss, satisfied by memory (P0's block 100 is Invalid).
II - Write hit on a Shared block; P0 sends an invalidate.
III - Write miss, satisfied by memory; P0 first writes back its Modified block 110.

Implementation 1: 100 + 15 + 10 + 100 = 225 stall cycles
Implementation 2: 100 + 15 + 10 + 100 = 225 stall cycles

C - [20] <4.3>
I - P1: read 120
II - P1: read 128
III - P1: read 130

I - Read miss, satisfied by memory.
II - Read hit (P1 holds 128 in Modified).
III - Read miss, satisfied by memory.

Implementation 1: 100 + 0 + 100 = 200 stall cycles
Implementation 2: 100 + 0 + 100 = 200 stall cycles

D - [20] <4.3>
I - P1: read 100
II - P1: write 108 <-- 48
III - P1: write 130 <-- 78

I - Read miss, satisfied by memory.
II - Write miss, satisfied by memory; P1 first writes back its Modified block 128, and the Shared copies of 108 in P0 and P15 are invalidated.
III - Write miss, satisfied by memory.

Implementation 1: 100 + 100 + 10 + 100 = 310 stall cycles
Implementation 2: 100 + 100 + 10 + 100 = 310 stall cycles

Exercise 4.3

A common protocol optimization is to introduce an Owned state (usually denoted O). The “Owned” state behaves like the Shared state, in that nodes may only read Owned blocks. But it behaves like the Modified state, in that nodes must supply data on other nodes’ read and write misses to Owned blocks. A read miss to a block in either the Modified or Owned states supplies data to the requesting node and transitions to the Owned state. A write miss to a block in either state Modified or Owned supplies data to the requesting node and transitions to state Invalid. This optimized MOSI protocol only updates memory when a node replaces a block in state Modified or Owned.

[Figure: MOSI transitions caused by CPU requests. States: Invalid, Shared, Modified, Owned. Labels: CPU read, place “read miss” on bus; CPU write, place “invalidate” on bus; CPU write, place “write miss” on bus; CPU read hit; CPU write hit.]

[Figure: MOSI transitions caused by bus requests. Labels: “invalidate” for this block; write miss for this block, write back the block and abort the memory access; read miss, write back the block and abort the memory access; write miss or “invalidate” for this block.]

[Figure: complete MOSI diagram combining the CPU and bus transitions.]
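The MOSI rules stated above can also be written as two small transition functions. One handles the local processor's requests; the other handles messages snooped from the bus. The Python sketch below follows the exercise text. The function names and message strings are hypothetical, and write-backs on replacement are reduced to a predicate.

```python
# Sketch of the MOSI rules from Exercise 4.3 (states "M", "O", "S", "I").
# Function names and message strings are illustrative, not from the exercise.


def cpu_event(state, op):
    """Local read/write: return (next_state, bus_message or None)."""
    if op == "read":
        if state == "I":
            return "S", "read miss"
        return state, None                  # read hit in M, O or S
    if state == "M":
        return "M", None                    # write hit, no bus traffic
    if state in ("O", "S"):
        return "M", "invalidate"            # Owned/Shared blocks are read-only
    return "M", "write miss"                # state I


def snoop_event(state, msg):
    """Another node's bus message: return (next_state, supplies_data)."""
    owner = state in ("M", "O")
    if msg == "read miss":
        return ("O" if owner else state), owner   # owner keeps the dirty data
    if msg == "write miss":
        return "I", owner                   # owner supplies data, then drops it
    if msg == "invalidate":
        return "I", False
    raise ValueError(f"unknown bus message: {msg}")


def writes_back_on_replace(state):
    """Memory is updated only when an M or O block is replaced."""
    return state in ("M", "O")


if __name__ == "__main__":
    # P0's block 110 in Exercise 4.4A: P1 and then P15 miss on it.
    state = "M"
    for reader in ("P1", "P15"):
        state, supplied = snoop_event(state, "read miss")
        print(reader, "read miss ->", state, "supplied by P0:", supplied)
```

Because an Owned block stays Owned on further read misses, the same cache keeps supplying the data and memory is never updated until the block is replaced.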

Exercise 4.4

For the following code sequences and the timing parameters for the two implementations in Figure 4.38 (Exercise 4.2), compute the total stall cycles for the base MSI protocol and the optimized MOSI protocol in Exercise 4.3. Assume state transitions that do not require bus transactions incur no additional stall cycles.

A - [20] <4.2>
I - P1: read 110
II - P15: read 110
III - P0: read 110

I - Read miss, satisfied by P0's cache (P0's Modified block 110 becomes Shared in MSI, Owned in MOSI).
II - Read miss; MSI: satisfied by memory; MOSI: satisfied by P0's cache.
III - Read hit.

MSI: 70 + 10 + 100 + 0 = 180 stall cycles
MOSI: 70 + 10 + 70 + 10 + 0 = 160 stall cycles
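The two totals in part A differ only in the second read miss, by P15. Under MSI, P0 has already written block 110 back and dropped to Shared, so memory serves that miss. Under MOSI, P0 keeps the block in Owned and serves it from its cache. As in the solutions above, the cache-to-cache transfer is charged Ncache + Nwriteback. With the Implementation 1 parameters:

$$
\begin{aligned}
T_{\text{MSI}}  &= (N_{\text{cache}} + N_{\text{writeback}}) + N_{\text{memory}} + 0 = 80 + 100 = 180\\
T_{\text{MOSI}} &= 2\,(N_{\text{cache}} + N_{\text{writeback}}) + 0 = 2 \cdot 80 = 160\\
T_{\text{MSI}} - T_{\text{MOSI}} &= N_{\text{memory}} - (N_{\text{cache}} + N_{\text{writeback}}) = 100 - 80 = 20
\end{aligned}
$$

The saving therefore depends on the parameters. With Implementation 2 (Ncache = 130), the same difference is 100 - 140 = -40, so MOSI would be 40 cycles slower for this sequence.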

B - [20] <4.2>
I - P1: read 120
II - P15: read 120
III - P0: read 120

I - Read miss, satisfied by memory.
II - Read hit.
III - Read miss, satisfied by memory.

MSI and MOSI: 100 + 0 + 100 = 200 stall cycles

C - [20] <4.2>
I - P0: write 120 <-- 80
II - P15: read 120
III - P0: read 120

I - Write miss, satisfied by memory; P15's copy is invalidated.
II - Read miss, satisfied by P0's cache.
III - Read hit.

MSI and MOSI: 100 + 70 + 10 + 0 = 180 stall cycles

D - [20] <4.2>
I - P0: write 108 <-- 88
II - P15: read 108
III - P0: write 108 <-- 98

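I - Write hit on a Shared block; P0 sends an invalidate, which invalidates 108 in P15.
II - Read miss, satisfied by P0's cache.
III - Write hit (Shared in MSI, Owned in MOSI); P0 sends an invalidate, which invalidates 108 in P15 again.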

MSI and MOSI: 15 + 70 + 10 + 15 = 110 stall cycles

Exercise 4.5

Some applications read a large data set first, then modify most or all of it. The base MSI coherence protocol will first fetch all of the cache blocks in the Shared state, and then be forced to perform an invalidate operation to upgrade them to the Modified state. The additional delay has a significant impact on some workloads. An additional protocol optimization eliminates the need to upgrade blocks that are read and later written by a single processor. This optimization adds the Exclusive (E) state to the protocol, indicating that no other node has a copy of the block, but it has not yet been modified. A cache block enters the Exclusive state when a read miss is satisfied by memory and no other node has a valid copy. CPU reads and writes to that block proceed with no further bus traffic, but CPU writes cause the coherence state to transition to Modified. Exclusive differs from Modified because the node may silently replace Exclusive blocks (while Modified blocks must be written back to memory). Also, a read miss to an Exclusive block results in a transition to Shared, but does not require the node to respond with data (since memory has an up-to-date copy).

Draw new protocol diagrams for a MESI protocol that adds the Exclusive state and transitions to the base MSI protocol's Modified, Shared, and Invalid states.

[Figure: MESI transitions caused by CPU requests. States: Invalid, Shared, Modified, Exclusive. Labels: CPU read when another cache holds a Shared copy, place “read miss” on bus; CPU read when no other cache holds a copy, place read miss on bus; CPU write, place “invalidate” on bus; CPU write, place write miss on bus; CPU read hit; CPU write hit.]

[Figure: MESI transitions caused by bus requests. Labels: write miss for this block, write back the block and abort the memory access; write miss or “invalidate” for this block; read miss; read miss, write back the block and abort the memory access.]

[Figure: complete MESI diagram combining the CPU and bus transitions.]
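This is a sketch of the MESI transitions described above. The exercise does not say how a cache learns that "no other node has a valid copy". Here that is modelled as a hypothetical `others_have_copy` flag sampled during the read miss; real designs commonly use a shared signal on the bus for this. The function names are illustrative.

```python
# Sketch of the MESI rules from Exercise 4.5 (states "M", "E", "S", "I").
# `others_have_copy` is a hypothetical input standing for whatever mechanism
# tells the requester that another cache holds a valid copy.


def cpu_event(state, op, others_have_copy=False):
    """Local read/write: return (next_state, bus_message or None)."""
    if op == "read":
        if state == "I":
            return ("S" if others_have_copy else "E"), "read miss"
        return state, None                  # read hit in M, E or S
    if state in ("M", "E"):
        return "M", None                    # E -> M needs no bus transaction
    if state == "S":
        return "M", "invalidate"
    return "M", "write miss"                # state I


def snoop_event(state, msg):
    """Another node's bus message: return (next_state, must_write_back)."""
    if msg == "read miss":
        if state in ("M", "E"):
            return "S", state == "M"        # E: memory is already up to date
        return state, False
    if msg in ("write miss", "invalidate"):
        return "I", state == "M"
    raise ValueError(f"unknown bus message: {msg}")


def silent_replace(state):
    """Only Modified blocks need a write-back when they are replaced."""
    return state != "M"


if __name__ == "__main__":
    # Exercise 4.6A under MESI: P0 reads 100 (no other copy), then writes it.
    state, msg = cpu_event("I", "read", others_have_copy=False)
    print("read:", state, msg)
    state, msg = cpu_event(state, "write")
    print("write:", state, msg)
```

The write in the usage example produces no bus message. That is exactly the upgrade the Exclusive state is meant to save.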

Exercise 4.6

Assume the cache contents of Figure 4.37 and the timing of Implementation 1 in Figure 4.38. What are the total stall cycles for the following code sequences with both the base protocol and the new MESI protocol in Exercise 4.5? Assume state transitions that do not require bus transactions incur no additional stall cycles.

A - <4.2>
I - P0: read 100
II - P0: write 100 <-- 40

I - Read miss, satisfied by memory; no other cache holds 100, so the block is loaded as Shared (MSI) or Exclusive (MESI).
II - MSI: P0 sends an invalidate; MESI: silent transition from Exclusive to Modified.

MSI: 100 + 15 = 115 stall cycles
MESI: 100 + 0 = 100 stall cycles
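Part A isolates the benefit of the Exclusive state. The read miss costs the same in both protocols. Under MESI, however, the block arrives in Exclusive, so the following write needs no bus transaction:

$$
T_{\text{MSI}} = N_{\text{memory}} + N_{\text{invalidate}} = 100 + 15 = 115, \qquad
T_{\text{MESI}} = N_{\text{memory}} + 0 = 100
$$

Each block that one processor reads and then writes, while no other cache holds it, therefore saves Ninvalidate = 15 stall cycles. Part B shows the limit of the optimization: when another cache already holds the block, it is loaded as Shared and the invalidate is still needed.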

B - <4.2>
I - P0: read 120
II - P0: write 120 <-- 60

I - Read miss, satisfied by memory; P15 also holds 120, so in both protocols both copies end up Shared.
II - Both protocols send an invalidate.

MSI and MESI: 100 + 15 = 115 stall cycles

C - <4.2>
I - P0: read 100
II - P0: read 120

I - Read miss, satisfied by memory; no other cache holds 100, so MSI: S, MESI: E.
II - Read miss, satisfied by memory; 120 maps to the same block (B0), so 100 is replaced silently from S or E.

MSI and MESI: 100 + 100 = 200 stall cycles (the replaced block is clean, so no write-back is needed)

D - <4.2>
I - P0: read 100
II - P1: write 100 <-- 60

I - Read miss, satisfied by memory; no other cache holds 100, so MSI: S, MESI: E.
II - Write miss, satisfied by memory; P0's clean copy is invalidated.

MSI and MESI: 100 + 100 = 200 stall cycles

E - <4.2>
I - P0: read 100
II - P0: write 100 <-- 60
III - P1: write 100 <-- 40

I - Read miss, satisfied by memory; no other cache holds 100, so MSI: S, MESI: E.
II - MSI: P0 sends an invalidate; MESI: silent transition from Exclusive to Modified.
III - Write miss, satisfied by P0's cache; P0 writes the data back to memory.

MSI: 100 + 15 + 70 + 10 = 195 stall cycles
MESI: 100 + 0 + 70 + 10 = 180 stall cycles
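The stall-cycle totals in Exercises 4.2, 4.4 and 4.6 can be cross-checked with a small trace-driven simulator that supports all three protocols. It is a sketch under these assumptions:

- Caches are direct-mapped with four 8-byte blocks (inferred from the figure).
- Each sequence is replayed from the initial state.
- Costs follow the rules exactly as the solutions above apply them:
  - A miss served by a Modified (or, in MOSI, Owned) block in another cache costs Ncache + Nwriteback.
  - Any other miss costs Nmemory.
  - Replacing a Modified (or Owned) block adds Nwriteback.
  - A write hit on a Shared (or Owned) block costs Ninvalidate.

The function name `stall_cycles` and the tuple-based trace format are hypothetical.

```python
import copy

# Implementation 1 parameters from Exercise 4.2 (stall cycles).
IMPL1 = {"memory": 100, "cache": 70, "invalidate": 15, "writeback": 10}

# (state, tag) per block B0..B3 from the initial figure; data is irrelevant here.
INITIAL = {
    "P0": [("I", 0x100), ("S", 0x108), ("M", 0x110), ("I", 0x118)],
    "P1": [("I", 0x100), ("M", 0x128), ("I", 0x110), ("S", 0x118)],
    "P15": [("S", 0x120), ("S", 0x108), ("I", 0x110), ("I", 0x118)],
}


def stall_cycles(protocol, ops, cost=IMPL1):
    """Replay ops [(proc, "read" | "write", addr)] from the initial state.

    protocol is "MSI", "MOSI" or "MESI"; returns the total stall cycles.
    """
    caches = copy.deepcopy(INITIAL)
    dirty = {"M", "O"} if protocol == "MOSI" else {"M"}  # states that must write back
    total = 0
    for proc, op, addr in ops:
        i = (addr // 8) % 4                   # assumed direct-mapped index
        state, tag = caches[proc][i]
        if state != "I" and tag == addr:      # hit
            if op == "write" and state in ("S", "O"):
                total += cost["invalidate"]   # upgrade: invalidate other copies
                for other in caches:
                    if other != proc and caches[other][i][1] == addr:
                        caches[other][i] = ("I", addr)
            if op == "write":
                caches[proc][i] = ("M", addr)  # E -> M is silent
            continue
        if state in dirty:                    # victim (another tag) is dirty
            total += cost["writeback"]
        holders = [o for o in caches if o != proc
                   and caches[o][i][1] == addr and caches[o][i][0] != "I"]
        if any(caches[o][i][0] in dirty for o in holders):
            total += cost["cache"] + cost["writeback"]  # owner supplies the block
        else:
            total += cost["memory"]
        for o in holders:
            if op == "write":
                caches[o][i] = ("I", addr)
            elif protocol == "MOSI" and caches[o][i][0] in dirty:
                caches[o][i] = ("O", addr)    # owner keeps the dirty data
            else:
                caches[o][i] = ("S", addr)    # M, E or S -> S
        if op == "write":
            caches[proc][i] = ("M", addr)
        elif protocol == "MESI" and not holders:
            caches[proc][i] = ("E", addr)     # no other valid copy
        else:
            caches[proc][i] = ("S", addr)
    return total


if __name__ == "__main__":
    # Exercise 4.6E: P0 read 100, P0 write 100 <-- 60, P1 write 100 <-- 40
    seq = [("P0", "read", 0x100), ("P0", "write", 0x100), ("P1", "write", 0x100)]
    for protocol in ("MSI", "MESI"):
        print(protocol, stall_cycles(protocol, seq))
```

Passing a different `cost` dictionary, such as the Implementation 2 values, reuses the same traces.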
