
Seminar: Snoopy Protocol

Mar 09, 2015


Simple, Bus-Based Multiprocessor

Marcus Vinicius Duarte

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       S     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     110 00    30       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

M - Modified
S - Shared
I - Invalid

Architecture of the scheme
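The pairing of tags and blocks in the figure is consistent with a direct-mapped cache of four 8-byte blocks. A minimal sketch of that mapping, assuming the addresses are hexadecimal and that the block index comes from the address divided by the block size (neither is stated on the slide; `block_index` is a hypothetical helper):

```python
BLOCK_SIZE = 8   # assumed: consecutive tags on the slide differ by 8
NUM_BLOCKS = 4   # blocks B0..B3

def block_index(addr: int) -> int:
    """Return the cache block (0..3) that a byte address maps to."""
    return (addr // BLOCK_SIZE) % NUM_BLOCKS

# Show which block each address listed in the memory table falls into.
for addr in (0x100, 0x108, 0x110, 0x118, 0x120, 0x128, 0x130):
    print(f"{addr:03X} -> B{block_index(addr)}")
```

Under this assumption 100 and 120 compete for B0, 108 and 128 for B1, and 110 and 130 for B2, which is why some accesses in the exercises evict a block.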

Exercise 4.1

4.1A. [10] <4.2> P0: read 120

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     S     120 00    20       I     100 00    10       S     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     110 00    30       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

P0.B0: (S, 120, 00, 20); the read returns the value 20.

4.1B. [10] <4.2> P0: write 120 <-- 80

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     M     120 00    80       I     100 00    10       I     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     110 00    30       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

P0.B0: (M, 120, 00, 80), P15.B0: (I, 120, 00, 20)

4.1C. [10] <4.2> P15: write 120 <-- 80

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       M     120 00    80
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     110 00    30       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

P15.B0: (M, 120, 00, 80)

4.1D. [10] <4.2> P1: read 110

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       S     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     S     110 00    30       S     110 00    30       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     30
118      00     18
120      00     20
128      00     28
130      00     30
...

P0.B2: (S, 110, 00, 30), P1.B2: (S, 110, 00, 30), M[110]: (00, 30) after P0's write back; the read returns 30.

4.1E. [10] <4.2> P0: write 108 <-- 48

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       S     120 00    20
B1     M     108 00    48       M     128 00    68       I     108 00    08
B2     M     110 00    30       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

P0.B1: (M, 108, 00, 48), P15.B1: (I, 108, 00, 08)

4.1F. [10] <4.2> P0: write 130 <-- 78

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       S     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     130 00    78       I     110 00    10       I     110 00    10
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     30
118      00     18
120      00     20
128      00     28
130      00     30
...

P0.B2: (M, 130, 00, 78), M[110]: (00, 30) (write back)

4.1G. [10] <4.2> P15: write 130 <-- 78

       P0                       P1                       P15
Block  State Tag Data1 Data0    State Tag Data1 Data0    State Tag Data1 Data0
B0     I     100 00    10       I     100 00    10       S     120 00    20
B1     S     108 00    08       M     128 00    68       S     108 00    08
B2     M     110 00    30       I     110 00    10       M     130 00    78
B3     I     118 00    10       S     118 00    18       I     118 00    10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...

P15.B2: (M, 130, 00, 78)
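The answers to Exercise 4.1 can be checked mechanically. The following is a minimal sketch of a write-back, write-invalidate MSI bus, not the seminar's own code. It assumes hexadecimal addresses, four 8-byte direct-mapped blocks, and that a write to a block-aligned address updates Data0; the class `MsiBus` and its method names are hypothetical.

```python
from copy import deepcopy

NUM_BLOCKS, BLOCK_SIZE = 4, 8  # assumed geometry: four 8-byte blocks

# Each line is [state, tag, data1, data0], copied from the initial-state slide.
INITIAL_CACHES = {
    "P0":  [["I", 0x100, 0x00, 0x10], ["S", 0x108, 0x00, 0x08],
            ["M", 0x110, 0x00, 0x30], ["I", 0x118, 0x00, 0x10]],
    "P1":  [["I", 0x100, 0x00, 0x10], ["M", 0x128, 0x00, 0x68],
            ["I", 0x110, 0x00, 0x10], ["S", 0x118, 0x00, 0x18]],
    "P15": [["S", 0x120, 0x00, 0x20], ["S", 0x108, 0x00, 0x08],
            ["I", 0x110, 0x00, 0x10], ["I", 0x118, 0x00, 0x10]],
}
INITIAL_MEMORY = {0x100: [0x00, 0x00], 0x108: [0x00, 0x08], 0x110: [0x00, 0x10],
                  0x118: [0x00, 0x18], 0x120: [0x00, 0x20], 0x128: [0x00, 0x28],
                  0x130: [0x00, 0x30]}

class MsiBus:
    def __init__(self):
        self.caches = deepcopy(INITIAL_CACHES)
        self.memory = deepcopy(INITIAL_MEMORY)

    def line(self, cpu, addr):
        """Return the cache line of `cpu` that `addr` maps to."""
        return self.caches[cpu][(addr // BLOCK_SIZE) % NUM_BLOCKS]

    def _snoop(self, requester, addr, invalidate):
        """Let the other caches react to a bus request for `addr`."""
        for cpu in self.caches:
            line = self.line(cpu, addr)
            if cpu == requester or line[1] != addr or line[0] == "I":
                continue
            if line[0] == "M":                 # owner writes the dirty block back
                self.memory[addr] = line[2:]
            line[0] = "I" if invalidate else "S"

    def _fill(self, cpu, addr):
        """Load `addr` from memory, writing back the evicted block if it is dirty."""
        line = self.line(cpu, addr)
        if line[0] == "M":
            self.memory[line[1]] = line[2:]
        line[1:] = [addr] + self.memory[addr]
        return line

    def read(self, cpu, addr):
        line = self.line(cpu, addr)
        if line[1] != addr or line[0] == "I":  # read miss
            self._snoop(cpu, addr, invalidate=False)
            line = self._fill(cpu, addr)
            line[0] = "S"
        return line[3]

    def write(self, cpu, addr, value):
        line = self.line(cpu, addr)
        hit = line[1] == addr and line[0] != "I"
        if not (hit and line[0] == "M"):       # needs an invalidate or a write miss
            self._snoop(cpu, addr, invalidate=True)
            if not hit:
                line = self._fill(cpu, addr)
        line[0], line[3] = "M", value

# Part D: each part starts again from the initial state.
bus = MsiBus()
print(hex(bus.read("P1", 0x110)), bus.line("P0", 0x110), bus.memory[0x110])
```

Creating a fresh `MsiBus()` for each part mirrors the exercise, in which every access is applied to the initial state rather than to the result of the previous part.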

Exercise 4.2

ASSUMPTIONS
1- CPU read and write hits generate no stall cycles.

2- CPU read and write misses generate Nmemory and Ncache stall cycles if satisfied by memory and cache, respectively.

3- CPU write hits that generate an invalidate incur Ninvalidate stall cycles.

4- A writeback of a block, either due to a conflict or another processor's request to an exclusive block, incurs an additional Nwriteback stall cycles.

Parameter    Implementation 1  Implementation 2
Nmemory      100               100
Ncache       70                130
Ninvalidate  15                15
Nwriteback   10                10
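The four rules reduce to adding one parameter per bus event. A minimal sketch of that accounting, using the table values; the event names and the helper `stall_cycles` are hypothetical, and the event list is an illustrative sequence rather than one of the exercise parts:

```python
# Stall-cycle parameters from the table; "hit" encodes rule 1 (hits are free).
TIMING = {
    "Implementation 1": {"hit": 0, "memory": 100, "cache": 70,
                         "invalidate": 15, "writeback": 10},
    "Implementation 2": {"hit": 0, "memory": 100, "cache": 130,
                         "invalidate": 15, "writeback": 10},
}

def stall_cycles(events, params):
    """Sum the stall cycles of a sequence of bus events."""
    return sum(params[event] for event in events)

# Illustrative sequence: a miss served by another cache that must also write
# the block back, then a write hit that needs an invalidate, then a plain hit.
events = ["cache", "writeback", "invalidate", "hit"]
for name, params in TIMING.items():
    print(name, stall_cycles(events, params))
```

Keeping the writeback as its own event matches rule 4, which charges it in addition to the miss that triggered it.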

A - [20] <4.3>
I - P0: read 120
II - P0: read 128
III - P0: read 130

I - Read miss, satisfied by memory.
II - Read miss, satisfied by P1's cache; P1 writes back block 128, which becomes Shared.
III - Read miss, satisfied by memory; block 110 is written back.

Implementation 1: 100 + 70 + 10 + 100 + 10 = 290 stall cycles
Implementation 2: 100 + 130 + 10 + 100 + 10 = 350 stall cycles

B - [20] <4.3>
I - P0: read 100
II - P0: write 108 <-- 48
III - P0: write 130 <-- 78

I - Read miss, satisfied by memory.
II - Write hit; sends an invalidate.
III - Write miss, satisfied by memory; block 110 is written back.

Implementation 1: 100 + 15 + 10 + 100 = 225 stall cycles
Implementation 2: 100 + 15 + 10 + 100 = 225 stall cycles

C - [20] <4.3>
I - P1: read 120
II - P1: read 128
III - P1: read 130

I - Read miss, satisfied by memory.
II - Read hit.
III - Read miss, satisfied by memory.

Implementation 1: 100 + 0 + 100 = 200 stall cycles
Implementation 2: 100 + 0 + 100 = 200 stall cycles

D - [20] <4.3>
I - P1: read 100
II - P1: write 108 <-- 48
III - P1: write 130 <-- 78

I - Read miss, satisfied by memory.
II - Write miss, satisfied by memory; block 128 is written back.
III - Write miss, satisfied by memory.

Implementation 1: 100 + 100 + 10 + 100 = 310 stall cycles
Implementation 2: 100 + 100 + 10 + 100 = 310 stall cycles

Exercise 4.3

A common protocol optimization is to introduce an Owned state (usually denoted O). The “Owned” state behaves like the Shared state, in that nodes may only read Owned blocks. But it behaves like the Modified state, in that nodes must supply data on other nodes’ read and write misses to Owned blocks. A read miss to a block in either the Modified or Owned states supplies data to the requesting node and transitions to the Owned state. A write miss to a block in either state Modified or Owned supplies data to the requesting node and transitions to state Invalid. This optimized MOSI protocol only updates memory when a node replaces a block in state Modified or Owned.
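The snooping rules in this paragraph can be written as a small transition function. This is a sketch under stated assumptions, not the diagram from the seminar: the request names and the function `snoop` are hypothetical, the Shared-state behaviour is the usual base MSI one, and the handling of an invalidate seen by an Owned block is an assumption because the exercise text does not spell it out.

```python
from typing import Tuple

def snoop(state: str, bus_request: str) -> Tuple[str, bool]:
    """Return (next_state, supplies_data) for a block that sees a bus request.

    state is one of "M", "O", "S", "I"; bus_request is "read_miss",
    "write_miss", or "invalidate".
    """
    if state in ("M", "O"):
        if bus_request == "read_miss":
            return "O", True       # supply data and become (or stay) Owned
        if bus_request == "write_miss":
            return "I", True       # supply data and give up the block
    if state == "O" and bus_request == "invalidate":
        return "I", False          # assumption: a sharer upgraded to Modified
    if state == "S" and bus_request in ("write_miss", "invalidate"):
        return "I", False          # base MSI behaviour for Shared blocks
    return state, False            # nothing to do for this request

for start in ("M", "O", "S", "I"):
    print(start, snoop(start, "read_miss"), snoop(start, "write_miss"))
```

Memory is not touched in any branch, which reflects the statement that MOSI updates memory only when a Modified or Owned block is replaced.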

[Figure: MOSI state-transition diagram with the states Invalid, Shared, Modified, and Owned, built up over three slides: CPU-initiated transitions, bus-initiated transitions, and the complete diagram. Arc labels recoverable from the slides:
CPU side: CPU read (place "read miss" on the bus); CPU write (place "invalidate" on the bus); CPU write (place "write miss" on the bus); CPU read hit; CPU write hit.
Bus side: "invalidate" for this block; write miss or "invalidate" for this block; write miss for this block (write back the block; abort the memory access); read miss (write back the block; abort the memory access).]

Exercise 4.4

For the following code sequences and the timing parameters for the two implementations in Figure 4.38 (Exercise 4.2), compute the total stall cycles for the base MSI protocol and the optimized MOSI protocol in Exercise 4.3. Assume state transitions that do not require bus transactions incur no additional stall cycles.

A - [20] <4.2>
I - P1: read 110
II - P15: read 110
III - P0: read 110

I - Read miss, satisfied by P0's cache.
II - Read miss; under MSI it is satisfied by memory, under MOSI by P0's cache.
III - Read hit.

MSI: 70 + 10 + 100 + 0 = 180 stall cycles
MOSI: 70 + 10 + 70 + 10 + 0 = 160 stall cycles
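The two totals for part A use the Implementation 1 parameters (Ncache = 70). The sketch below restates the same accounting term by term so it can be evaluated for either implementation. The function name `part_a_stalls` is hypothetical, and the Implementation 2 figures it produces follow from applying the same accounting; they do not appear on the slides.

```python
TIMING = {
    "Implementation 1": {"memory": 100, "cache": 70, "writeback": 10},
    "Implementation 2": {"memory": 100, "cache": 130, "writeback": 10},
}

def part_a_stalls(protocol, t):
    """Stall cycles for P1: read 110, P15: read 110, P0: read 110."""
    from_owner = t["cache"] + t["writeback"]   # miss served by P0's cache
    first = from_owner                         # I: P0 holds 110 in Modified
    # II: MOSI keeps P0 as owner, MSI falls back to memory.
    second = from_owner if protocol == "MOSI" else t["memory"]
    third = 0                                  # III: read hit
    return first + second + third

for name, t in TIMING.items():
    print(name, part_a_stalls("MSI", t), part_a_stalls("MOSI", t))
```

Under this accounting the benefit of MOSI depends on Ncache + Nwriteback being smaller than Nmemory, which holds for Implementation 1 but not for Implementation 2.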

B - [20] <4.2>
I - P1: read 120
II - P15: read 120
III - P0: read 120

I - Read miss, satisfied by memory.
II - Read hit.
III - Read miss, satisfied by memory.

MSI and MOSI: 100 + 0 + 100 = 200 stall cycles

C - [20] <4.2>
I - P0: write 120 <-- 80
II - P15: read 120
III - P0: read 120

I - Write miss; invalidates P15's copy.
II - Read miss, satisfied by P0's cache.
III - Read hit.

MSI and MOSI: 100 + 70 + 10 + 0 = 180 stall cycles

D - [20] <4.2>
I - P0: write 108 <-- 88
II - P15: read 108
III - P0: write 108 <-- 98

I - Sends an invalidate, which invalidates block 108 in P15.
II - Read miss, satisfied by P0's cache.
III - Sends an invalidate, which invalidates block 108 in P15.

MSI and MOSI: 15 + 70 + 10 + 15 = 110 stall cycles

Exercise 4.5

Some applications read a large data set first, then modify most or all of it. The base MSI coherence protocol will first fetch all of the cache blocks in the Shared state, and then be forced to perform an invalidate operation to upgrade them to the Modified state. The additional delay has a significant impact on some workloads. An additional protocol optimization eliminates the need to upgrade blocks that are read and later written by a single processor. This optimization adds the Exclusive (E) state to the protocol, indicating that no other node has a copy of the block, but it has not yet been modified. A cache block enters the Exclusive state when a read miss is satisfied by memory and no other node has a valid copy. CPU reads and writes to that block proceed with no further bus traffic, but CPU writes cause the coherence state to transition to Modified. Exclusive differs from Modified because the node may silently replace Exclusive blocks (while Modified blocks must be written back to memory). Also, a read miss to an Exclusive block results in a transition to Shared, but does not require the node to respond with data (since memory has an up-to-date copy). Draw new protocol diagrams for a MESI protocol that adds the Exclusive state and transitions to the base MSI protocol's Modified, Shared, and Invalidate states.
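The processor-side half of this description can be sketched as a transition function. It is an illustration under stated assumptions rather than the diagram drawn in the seminar: the function `cpu_access`, the action names, and the `others_have_copy` flag (standing in for whatever signal tells the requester that another node holds a valid copy) are all hypothetical.

```python
def cpu_access(state, op, others_have_copy=False):
    """Return (next_state, bus_action) for a CPU read or write under MESI.

    state is one of "M", "E", "S", "I"; bus_action is None when the access
    needs no bus transaction.
    """
    if op == "read":
        if state == "I":
            # A miss fills the block as Exclusive only if nobody else has it.
            return ("S" if others_have_copy else "E"), "read_miss"
        return state, None                 # read hit in M, E, or S
    if op == "write":
        if state == "I":
            return "M", "write_miss"
        if state == "S":
            return "M", "invalidate"       # upgrade needs a bus transaction
        return "M", None                   # E -> M is silent; M stays M
    raise ValueError(f"unknown operation: {op}")

# Read a block nobody else holds, then write it: only the miss uses the bus.
state, action = cpu_access("I", "read")
print(state, action)
print(cpu_access(state, "write"))
```

The silent Exclusive to Modified transition is the point of the optimization: the same read-then-write pair under base MSI would pass through Shared and pay for an invalidate.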

[Figure: MESI state-transition diagram with the states Invalid, Shared, Modified, and Exclusive, built up over three slides: CPU-initiated transitions, bus-initiated transitions, and the complete diagram. Arc labels recoverable from the slides:
CPU side: CPU read with another cache holding the block Shared (place "read miss" on the bus); CPU read with no one in Shared (place "read miss" on the bus); CPU write (place "invalidate" on the bus); CPU write (place "write miss" on the bus); CPU read hit; CPU write hit.
Bus side: write miss or "invalidate" for this block; write miss for this block (write back the block; abort the memory access); read miss; read miss (write back the block; abort the memory access).]

Exercise 4.6

Assume the cache contents of Figure 4.37 and the timing of Implementation 1 in Figure 4.38. What are the total stall cycles for the following code sequences with both the base protocol and the new MESI protocol in Exercise 4.5? Assume state transitions that do not require bus transactions incur no additional stall cycles.

A - <4.2>
I - P0: read 100
II - P0: write 100 <-- 40

I - Read miss, satisfied by memory; no other cache shares the block. MSI: Shared, MESI: Exclusive.
II - MSI: sends an invalidate; MESI: silent transition from Exclusive to Modified.

MSI: 100 + 15 = 115 stall cycles
MESI: 100 + 0 = 100 stall cycles
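The difference between the two totals comes down to whether the write needs an invalidate. A minimal sketch of that comparison with the Implementation 1 parameters; the function `read_then_write` and its `other_sharers` flag are hypothetical names introduced for illustration:

```python
N_MEMORY, N_INVALIDATE = 100, 15   # Implementation 1 values from Exercise 4.2

def read_then_write(protocol, other_sharers):
    """Stall cycles for a read miss served by memory followed by a write."""
    total = N_MEMORY
    # Only MESI with no other sharers fills the block as Exclusive,
    # which lets the following write proceed without bus traffic.
    filled_exclusive = protocol == "MESI" and not other_sharers
    if not filled_exclusive:
        total += N_INVALIDATE
    return total

print(read_then_write("MSI", other_sharers=False),
      read_then_write("MESI", other_sharers=False))
```

With `other_sharers=True` both protocols return the same value, because the block is filled as Shared either way.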

B - <4.2>
I - P0: read 120
II - P0: write 120 <-- 60

I - Read miss, satisfied by memory; the block already has a sharer, so it becomes Shared under both protocols.
II - Both protocols send an invalidate.

MSI and MESI: 100 + 15 = 115 stall cycles

C - <4.2>
I - P0: read 100
II - P0: read 120

I - Read miss, satisfied by memory; no other cache shares the block. MSI: S, MESI: E.
II - Read miss, satisfied by memory; block 100 is silently replaced from S (MSI) or E (MESI).

MSI and MESI: 100 + 100 = 200 stall cycles; the replacement from E is silent.

D - <4.2>
I - P0: read 100
II - P1: write 100 <-- 60

I - Read miss, satisfied by memory; no other cache shares the block. MSI: S, MESI: E.
II - Write miss, satisfied by memory.

MSI and MESI: 100 + 100 = 200 stall cycles

E - <4.2>
I - P0: read 100
II - P0: write 100 <-- 60
III - P1: write 100 <-- 40

I - Read miss, satisfied by memory; no other cache shares the block. MSI: S, MESI: E.
II - MSI: sends an invalidate; MESI: silent transition from Exclusive to Modified.
III - Write miss, satisfied by P0's cache; the data is written back to memory.

MSI: 100 + 15 + 70 + 10 = 195 stall cycles
MESI: 100 + 0 + 70 + 10 = 180 stall cycles