Simple, Bus-Based Multiprocessor
Marcus Vinicius Duarte
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     S      108  00     08
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
M - Modified, S - Shared, I - Invalid
Scheme Architecture
Exercise 4.1
4.1A. [10] <4.2> P0: read 120
P0
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
P0.B0: (S, 120, 00, 20); the read returns the value 20.
4.1B. [10] <4.2> P0: write 120 <-- 80
P0
Block  State  Tag  Data1  Data0
B0     M      120  00     80
B1     S      108  00     08
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     I      120  00     20
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
P0.B0: (M, 120, 00, 80), P15.B0: (I, 120, 00, 20)
4.1C. [10] <4.2> P15: write 120 <-- 80
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     S      108  00     08
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     M      120  00     80
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
P15.B0: (M, 120, 00, 80)
4.1D. [10] <4.2> P1: read 110
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     S      108  00     08
B2     S      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     S      110  00     30
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     30
118      00     18
120      00     20
128      00     28
130      00     30
...
P0.B2: (S, 110, 00, 30), P1.B2: (S, 110, 00, 30), M[110]: (00, 30); the read returns 30. P0 writes block 110 back to memory.
4.1E. [10] <4.2> P0: write 108 <-- 48
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      108  00     48
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     I      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
P0.B1: (M, 108, 00, 48), P15.B1: (I, 108, 00, 08)
4.1F. [10] <4.2> P0: write 130 <-- 78
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     S      108  00     08
B2     M      130  00     78
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     I      110  00     10
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     30
118      00     18
120      00     20
128      00     28
130      00     30
...
P0.B2: (M, 130, 00, 78), M[110]: (00, 30); the replaced block 110 is written back to memory.
4.1G. [10] <4.2> P15: write 130 <-- 78
P0
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     S      108  00     08
B2     M      110  00     30
B3     I      118  00     10

P1
Block  State  Tag  Data1  Data0
B0     I      100  00     10
B1     M      128  00     68
B2     I      110  00     10
B3     S      118  00     18

P15
Block  State  Tag  Data1  Data0
B0     S      120  00     20
B1     S      108  00     08
B2     M      130  00     78
B3     I      118  00     10

Memory
Address  Data1  Data0
100      00     00
108      00     08
110      00     10
118      00     18
120      00     20
128      00     28
130      00     30
...
P15.B2: (M, 130, 00, 78)
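The results of Exercise 4.1 can be cross-checked with a small simulator. What follows is only a minimal sketch of a snooping MSI protocol, not the exercise's official model. It assumes that the addresses are hexadecimal, that each block holds 8 bytes (Data1 and Data0), that the caches are direct-mapped with four blocks, and that every access touches Data0. The class name `System` and its methods are hypothetical names chosen for this illustration.

```python
from copy import deepcopy

BLOCK_BYTES = 8   # assumption: two 4-byte words per block (Data1, Data0)
NUM_BLOCKS = 4    # direct-mapped cache with blocks B0..B3

# Initial contents from the tables above: [state, tag, data1, data0].
INITIAL_CACHES = {
    "P0": [["I", 0x100, "00", "10"], ["S", 0x108, "00", "08"],
           ["M", 0x110, "00", "30"], ["I", 0x118, "00", "10"]],
    "P1": [["I", 0x100, "00", "10"], ["M", 0x128, "00", "68"],
           ["I", 0x110, "00", "10"], ["S", 0x118, "00", "18"]],
    "P15": [["S", 0x120, "00", "20"], ["S", 0x108, "00", "08"],
            ["I", 0x110, "00", "10"], ["I", 0x118, "00", "10"]],
}
INITIAL_MEMORY = {0x100: ["00", "00"], 0x108: ["00", "08"], 0x110: ["00", "10"],
                  0x118: ["00", "18"], 0x120: ["00", "20"], 0x128: ["00", "28"],
                  0x130: ["00", "30"]}


class System:
    """Three snooping caches plus memory, starting from the initial state."""

    def __init__(self):
        self.caches = deepcopy(INITIAL_CACHES)
        self.memory = deepcopy(INITIAL_MEMORY)

    @staticmethod
    def index(addr):
        # Direct-mapped placement: block number modulo the number of blocks.
        return (addr // BLOCK_BYTES) % NUM_BLOCKS

    def _hit(self, cpu, addr):
        state, tag, _, _ = self.caches[cpu][self.index(addr)]
        return state != "I" and tag == addr

    def _evict(self, cpu, addr):
        # A Modified victim must be written back before it is replaced.
        state, tag, d1, d0 = self.caches[cpu][self.index(addr)]
        if state == "M":
            self.memory[tag] = [d1, d0]

    def _snoop(self, cpu, addr, is_write):
        # Other caches react to the bus request for this block.
        for other, blocks in self.caches.items():
            line = blocks[self.index(addr)]
            if other == cpu or line[0] == "I" or line[1] != addr:
                continue
            if line[0] == "M":
                self.memory[addr] = [line[2], line[3]]  # owner writes back
            line[0] = "I" if is_write else "S"

    def read(self, cpu, addr):
        if not self._hit(cpu, addr):
            self._evict(cpu, addr)
            self._snoop(cpu, addr, is_write=False)
            self.caches[cpu][self.index(addr)] = ["S", addr, *self.memory[addr]]
        return self.caches[cpu][self.index(addr)][3]

    def write(self, cpu, addr, value):
        hit = self._hit(cpu, addr)
        if not hit:
            self._evict(cpu, addr)
        self._snoop(cpu, addr, is_write=True)
        line = self.caches[cpu][self.index(addr)]
        d1 = line[2] if hit else self.memory[addr][0]
        self.caches[cpu][self.index(addr)] = ["M", addr, d1, value]


# Each part of the exercise starts again from the initial state.
part_d = System()
print(part_d.read("P1", 0x110), part_d.caches["P0"][2], part_d.memory[0x110])
```

Creating a fresh `System()` for each part mirrors the exercise, where every action is applied to the initial cache and memory contents rather than to the result of the previous part.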
Exercise 4.2
ASSUMPTIONS
1 - CPU read and write hits generate no stall cycles.
2 - CPU read and write misses generate Nmemory and Ncache stall cycles if satisfied by memory and cache, respectively.
3 - CPU write hits that generate an invalidate incur Ninvalidate stall cycles.
4 - A writeback of a block, either due to a conflict or another processor's request to an exclusive block, incurs an additional Nwriteback stall cycles.
Parameter    Implementation 1  Implementation 2
Nmemory      100               100
Ncache       70                130
Ninvalidate  15                15
Nwriteback   10                10
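The four rules and the parameter table above reduce to a simple sum over bus events. The sketch below illustrates that bookkeeping; the event labels (`memory`, `cache`, `invalidate`, `writeback`) and the function name `stall_cycles` are hypothetical names introduced here, and the example sequence is illustrative rather than one of the exercise parts.

```python
# Timing parameters from the table above (stall cycles per event).
PARAMS = {
    "Implementation 1": {"memory": 100, "cache": 70, "invalidate": 15, "writeback": 10},
    "Implementation 2": {"memory": 100, "cache": 130, "invalidate": 15, "writeback": 10},
}


def stall_cycles(accesses, params):
    """Sum the stall cycles of a sequence of accesses.

    Each access is a tuple of event labels; an empty tuple models a hit
    that needs no bus transaction and therefore costs nothing.
    """
    return sum(params[event] for access in accesses for event in access)


# Illustrative sequence (an assumption, not taken from a specific part).
example = [
    ("memory",),              # read miss satisfied by memory
    (),                       # read hit
    ("invalidate",),          # write hit to a Shared block
    ("cache", "writeback"),   # miss satisfied by another cache, owner writes back
    ("memory", "writeback"),  # miss satisfied by memory, dirty victim written back
]

for name, params in PARAMS.items():
    print(name, stall_cycles(example, params))
```

Only the `cache` event differs between the two implementations, so sequences without cache-to-cache transfers cost the same under both.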
A - [20] <4.3>
I - P0: read 120
II - P0: read 128
III - P0: read 130
I - Read miss, satisfied by memory.
II - Read miss, satisfied by P1's cache; P1 writes back block 128, which becomes Shared.
III - Read miss, satisfied by memory; P0 writes back block 110.
Implementation 1: 100 + 70 + 10 + 100 + 10 = 290 stall cycles
Implementation 2: 100 + 130 + 10 + 100 + 10 = 350 stall cycles
B - [20] <4.3>
I - P0: read 100
II - P0: write 108 <-- 48
III - P0: write 130 <-- 78
I - Read miss, satisfied by memory.
II - Write hit, sends an invalidate.
III - Write miss, satisfied by memory; block 110 is written back.
Implementation 1: 100 + 15 + 10 + 100 = 225 stall cycles
Implementation 2: 100 + 15 + 10 + 100 = 225 stall cycles
C - [20] <4.3>
I - P1: read 120
II - P1: read 128
III - P1: read 130
I - Read miss, satisfied by memory.
II - Read hit.
III - Read miss, satisfied by memory.
Implementation 1: 100 + 0 + 100 = 200 stall cycles
Implementation 2: 100 + 0 + 100 = 200 stall cycles
D - [20] <4.3>
I - P1: read 100
II - P1: write 108 <-- 48
III - P1: write 130 <-- 78
I - Read miss, satisfied by memory.
II - Write miss, satisfied by memory; block 128 is written back.
III - Write miss, satisfied by memory.
Implementation 1: 100 + 100 + 10 + 100 = 310 stall cycles
Implementation 2: 100 + 100 + 10 + 100 = 310 stall cycles
Exercise 4.3
A common protocol optimization is to introduce an Owned state (usually denoted O). The “Owned” state behaves like the Shared state, in that nodes may only read Owned blocks. But it behaves like the Modified state, in that nodes must supply data on other nodes’ read and write misses to Owned blocks. A read miss to a block in either the Modified or Owned states supplies data to the requesting node and transitions to the Owned state. A write miss to a block in either state Modified or Owned supplies data to the requesting node and transitions to state Invalid. This optimized MOSI protocol only updates memory when a node replaces a block in state Modified or Owned.
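The bus-side behaviour described in the paragraph above can be summarized as a lookup table. This is a minimal sketch under stated assumptions: the rows for Modified and Owned follow the text, the rows for Shared are carried over from the base MSI protocol, and the `("O", "invalidate")` row assumes that a sharer upgrading its copy already has the data, so the owner does not need to supply it. The names `MOSI_SNOOP`, `mosi_snoop`, and `updates_memory_on_replacement` are hypothetical.

```python
# (current state, bus event seen for this block) -> (next state, supplies data)
MOSI_SNOOP = {
    ("M", "read_miss"): ("O", True),    # owner supplies data, becomes Owned
    ("O", "read_miss"): ("O", True),    # owner supplies data, stays Owned
    ("M", "write_miss"): ("I", True),   # owner supplies data, then invalidates
    ("O", "write_miss"): ("I", True),
    ("O", "invalidate"): ("I", False),  # assumption: requester already holds the data
    ("S", "read_miss"): ("S", False),   # base MSI behaviour
    ("S", "write_miss"): ("I", False),
    ("S", "invalidate"): ("I", False),
}


def mosi_snoop(state, bus_event):
    """Return (next_state, supplies_data) for a snooped bus event.

    Invalid blocks, and any pair not listed in the table, are left unchanged.
    """
    return MOSI_SNOOP.get((state, bus_event), (state, False))


def updates_memory_on_replacement(state):
    """Memory is only updated when a Modified or Owned block is replaced."""
    return state in ("M", "O")


print(mosi_snoop("M", "read_miss"))   # owner keeps responsibility for the block
print(mosi_snoop("O", "write_miss"))
```

Keeping the protocol as a table makes it easy to compare with the MSI diagram: the only new rows are the ones that mention the Owned state.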
[Figure: MOSI state-transition diagrams with the states Invalid, Shared, Modified, and Owned. The first diagram shows CPU-induced transitions (CPU read: place "read miss" on the bus; CPU write: place "invalidate" or "write miss" on the bus; CPU read hit; CPU write hit). The second shows bus-induced transitions (read miss, write miss, or "invalidate" for this block; write back the block and abort the memory access). The final diagram combines both into the complete protocol.]
Exercise 4.4
For the following code sequences and the timing parameters for the two implementations in Figure 4.38 (Exercise 4.2), compute the total stall cycles for the base MSI protocol and the optimized MOSI protocol in Exercise 4.3. Assume state transitions that do not require bus transactions incur no additional stall cycles.
A - [20] <4.2>
I - P1: read 110
II - P15: read 110
III - P0: read 110
I - Read miss, supplied by P0's cache.
II - Read miss; MSI: satisfied by memory; MOSI: satisfied by P0's cache.
III - Read hit.
MSI: 70 + 10 + 100 + 0 = 180 stall cycles (Implementation 1 timing)
MOSI: 70 + 10 + 70 + 10 + 0 = 160 stall cycles (Implementation 1 timing)
B - [20] <4.2>
I - P1: read 120
II - P15: read 120
III - P0: read 120
I - Read miss, satisfied by memory.
II - Read hit.
III - Read miss, satisfied by memory.
MSI and MOSI: 100 + 0 + 100 = 200 stall cycles
C - [20] <4.2>
I - P0: write 120 <-- 80
II - P15: read 120
III - P0: read 120
I - Write miss, invalidates P15's copy.
II - Read miss, supplied by P0's cache.
III - Read hit.
MSI and MOSI: 100 + 70 + 10 + 0 = 180 stall cycles (Implementation 1 timing)
D - [20] <4.2>
I - P0: write 108 <-- 88
II - P15: read 108
III - P0: write 108 <-- 98
I - Write hit, sends an invalidate; block 108 is invalidated in P15.
II - Read miss, supplied by P0's cache.
III - Write hit, sends an invalidate; block 108 is invalidated in P15.
MSI and MOSI: 15 + 70 + 10 + 15 = 110 stall cycles (Implementation 1 timing)
Exercise 4.5
Some applications read a large data set first, then modify most or all of it. The base MSI coherence protocol will first fetch all of the cache blocks in the Shared state, and then be forced to perform an invalidate operation to upgrade them to the Modified state. The additional delay has a significant impact on some workloads. An additional protocol optimization eliminates the need to upgrade blocks that are read and later written by a single processor. This optimization adds the Exclusive (E) state to the protocol, indicating that no other node has a copy of the block, but it has not yet been modified. A cache block enters the Exclusive state when a read miss is satisfied by memory and no other node has a valid copy. CPU reads and writes to that block proceed with no further bus traffic, but CPU writes cause the coherence state to transition to Modified. Exclusive differs from Modified because the node may silently replace Exclusive blocks (while Modified blocks must be written back to memory). Also, a read miss to an Exclusive block results in a transition to Shared, but does not require the node to respond with data (since memory has an up-to-date copy).
Draw new protocol diagrams for a MESI protocol that adds the Exclusive state and transitions to the base MSI protocol's Modified, Shared, and Invalid states.
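The rules that the Exclusive state adds can be written down as a few small functions. This is a sketch under stated assumptions, not a complete MESI implementation: it covers only the transitions named in the paragraph above, the Modified row of `mesi_snoop_read_miss` is assumed to behave as in base MSI, and all function names are hypothetical.

```python
def mesi_fill_state_on_read_miss(other_caches_have_copy):
    """State of a block after a read miss is satisfied.

    The block enters Exclusive only when no other node has a valid copy.
    """
    return "S" if other_caches_have_copy else "E"


def mesi_cpu_write(state):
    """Return (next_state, bus_transaction) for a CPU write.

    Writes to Exclusive or Modified blocks need no bus traffic; a write to
    a Shared block still sends an invalidate, as in base MSI.
    """
    bus = {"M": None, "E": None, "S": "invalidate", "I": "write_miss"}[state]
    return ("M", bus)


def mesi_snoop_read_miss(state):
    """Return (next_state, supplies_data) when another node's read miss is snooped."""
    if state == "M":
        return ("S", True)    # assumption: as in base MSI, the dirty copy is supplied
    if state == "E":
        return ("S", False)   # memory is up to date, so no data response is needed
    return (state, False)


def can_replace_silently(state):
    """Only Modified blocks must be written back before being replaced."""
    return state != "M"


# Read-then-write by a single processor with no other sharers:
state = mesi_fill_state_on_read_miss(other_caches_have_copy=False)
print(state, mesi_cpu_write(state))
```

The last two lines show the case the optimization targets: the block is filled in Exclusive, so the later write upgrades it to Modified without placing an invalidate on the bus.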
[Figure: MESI state-transition diagrams with the states Invalid, Shared, Modified, and Exclusive. The first diagram shows CPU-induced transitions (CPU read with another sharer: place "read miss" on the bus and go to Shared; CPU read with no sharers: place "read miss" on the bus and go to Exclusive; CPU write: place "invalidate" or "write miss" on the bus; CPU read hit; CPU write hit). The second shows bus-induced transitions (read miss, write miss, or "invalidate" for this block; write back the block and abort the memory access). The final diagram combines both into the complete protocol.]
Exercise 4.6
Assume the cache contents of Figure 4.37 and the timing of Implementation 1 in Figure 4.38. What are the total stall cycles for the following code sequences with both the base protocol and the new MESI protocol in Exercise 4.5? Assume state transitions that do not require bus transactions incur no additional stall cycles.
A - <4.2>
I - P0: read 100
II - P0: write 100 <-- 40
I - Read miss, satisfied by memory; no other cache shares the block. MSI: Shared; MESI: Exclusive.
II - MSI: sends an invalidate; MESI: silent transition from Exclusive to Modified.
MSI: 100 + 15 = 115 stall cycles
MESI: 100 + 0 = 100 stall cycles
B - <4.2>
I - P0: read 120
II - P0: write 120 <-- 60
I - Read miss, satisfied by memory; P15 already holds a copy, so the block is Shared under both protocols.
II - Both protocols send an invalidate.
MSI and MESI: 100 + 15 = 115 stall cycles
C - <4.2>
I - P0: read 100
II - P0: read 120
I - Read miss, satisfied by memory; no other cache shares the block. MSI: S; MESI: E.
II - Read miss, satisfied by memory; block 100 (in S or E) is silently replaced by block 120.
MSI and MESI: 100 + 100 = 200 stall cycles (the replacement from E is silent)
D - <4.2>
I - P0: read 100
II - P1: write 100 <-- 60
I - Read miss, satisfied by memory; no other cache shares the block. MSI: S; MESI: E.
II - Write miss, satisfied by memory.
MSI and MESI: 100 + 100 = 200 stall cycles
E - <4.2>
I - P0: read 100
II - P0: write 100 <-- 60
III - P1: write 100 <-- 40
I - Read miss, satisfied by memory; no other cache shares the block. MSI: S; MESI: E.
II - MSI: sends an invalidate; MESI: silent transition from Exclusive to Modified.
III - Write miss, supplied by P0's cache; the data is written back to memory.
MSI: 100 + 15 + 70 + 10 = 195 stall cycles
MESI: 100 + 0 + 70 + 10 = 180 stall cycles