Token Coherence: Decoupling Performance and Correctness

March 24 2005 University of Utah CS 7698

Article by: Martin, Hill & Wood
Presented by: Michael Tabet

A Tale of Two Methods

Snooping based:
  Uses totally ordered broadcasts to preserve correctness
  Uses lots of bandwidth
  Big (large buses) = BAD!

Directory based:
  Uses indirection to preserve bandwidth
  Indirection adds latency
  Needs a directory controller

Potential workarounds

Snooping:
  Snooping is fast, but requires a bus. Big fast buses are complex
  -> Use a virtual bus to virtual broadcast!

Directory:
  Networks require lots of logic (especially big ones)
  -> Use glueless networks!

Token Coherence

Provides for both indirection and speedup through unordered broadcasts

Two components:
  Correctness substrate
  Performance protocol

Correctness

Speed is Good, Correctness is Better!

Need to guarantee ordered reads/writes!

Thus, use a correctness “substrate”

Correctness Invariants

1. At all times, each block has T tokens
2. A processor can only write a block if it holds all T tokens
3. A processor can read a block only if it holds at least one token
4. If a coherence message contains one or more tokens, it must contain data
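The four invariants can be checked mechanically. The sketch below is a toy model, not the paper's implementation: the names `Block`, `transfer`, `can_read` and `can_write`, and the choice of T = 4, are assumptions made for illustration.

```python
from dataclasses import dataclass, field

T = 4  # assumed total tokens per block (hypothetical value)


@dataclass
class Block:
    """Token holdings for one memory block, keyed by node name."""
    tokens: dict = field(default_factory=dict)

    def total(self) -> int:
        return sum(self.tokens.values())

    def can_read(self, node: str) -> bool:
        # Invariant 3: reading needs at least one token.
        return self.tokens.get(node, 0) >= 1

    def can_write(self, node: str) -> bool:
        # Invariant 2: writing needs all T tokens.
        return self.tokens.get(node, 0) == T

    def transfer(self, src: str, dst: str, count: int, has_data: bool) -> None:
        # Invariant 4: a message carrying tokens must also carry data.
        if count > 0 and not has_data:
            raise ValueError("token-carrying message must include data")
        if count < 0 or self.tokens.get(src, 0) < count:
            raise ValueError("source does not hold that many tokens")
        self.tokens[src] -= count
        self.tokens[dst] = self.tokens.get(dst, 0) + count
        # Invariant 1: tokens are moved, never created or destroyed.
        assert self.total() == T


block = Block({"memory": T})
block.transfer("memory", "P1", 1, has_data=True)
print(block.can_read("P1"), block.can_write("P1"))  # True False
block.transfer("memory", "P1", T - 1, has_data=True)
print(block.can_write("P1"))  # True
```

Because a writer must hold every token, no reader can hold one at the same time, which is what rules out a read racing with a write.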

Invariant 1 Implications

Allows for precise control of blocks of data.

Invariant 2 Implications

Enables a write-control mechanism that allows in-order writes

Invariant 3 Implications

Restricts reads

Invariant 4 Implications

Provides a method to ensure cache coherence

Starvation

Invariants allow for ordered reads/writes, but how do we prevent starvation?

Persistent requests:
1. A processor times out on transient requests
2. Raises a persistent request (only one per block)
3. All nodes must forward blocks to the requesting node

But repeated & persistent requests only make up 1-3% of the messages
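The escalation from transient to persistent requests can be sketched as follows. This is a simplified model under stated assumptions: `TIMEOUT_ATTEMPTS`, `PersistentTable`, `forward_tokens` and `request_block` are hypothetical names, and timeouts are modelled as a list of success/failure flags rather than real timers.

```python
TIMEOUT_ATTEMPTS = 2  # assumed number of transient tries before escalating


class PersistentTable:
    """Tracks the single active persistent request allowed per block."""

    def __init__(self):
        self.active = {}  # block address -> requesting node

    def raise_request(self, block, node):
        # Step 2: only one persistent request may be active per block.
        if block in self.active:
            return False
        self.active[block] = node
        return True

    def deactivate(self, block, node):
        if self.active.get(block) == node:
            del self.active[block]


def forward_tokens(holdings, dst):
    """Step 3: every node forwards its tokens for the block to dst."""
    total = sum(holdings.values())
    for node in holdings:
        holdings[node] = 0
    holdings[dst] = total


def request_block(node, block, holdings, table, transient_results):
    """Return 'transient', 'persistent' or 'waiting'."""
    # Step 1: transient requests may fail (modelled as False entries).
    for ok in transient_results[:TIMEOUT_ATTEMPTS]:
        if ok:
            forward_tokens(holdings, node)
            return "transient"
    if not table.raise_request(block, node):
        return "waiting"  # another node's persistent request is active
    forward_tokens(holdings, node)
    table.deactivate(block, node)
    return "persistent"


holdings = {"P1": 0, "P2": 2, "P3": 2}
table = PersistentTable()
print(request_block("P1", 0x40, holdings, table, [False, False]))  # persistent
print(holdings)  # {'P1': 4, 'P2': 0, 'P3': 0}
```

The persistent path is slow but guaranteed to finish, which is why it is acceptable for it to cover only the small share of requests that time out.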

Persistent Request State Diagram

Performance protocol

But if you always follow the rules, it can get slow and tedious!

Tokens allow for unordered responses to requests. This opens the door for all sorts of optimizations

TokenB

A New Contender

Akin to MSI snooping protocol:
  Requests broadcast
  Data exists in one of three states:
    Modified (All tokens)
    Shared (Some tokens)
    Invalid (No tokens)

But: Performance protocol allows for better performance!
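The MSI analogy amounts to a mapping from token count to state. A minimal sketch, assuming a helper named `msi_state` (not a name from the paper):

```python
def msi_state(tokens_held: int, total_tokens: int) -> str:
    """Map a node's token count for a block to the analogous MSI state."""
    if total_tokens < 1 or not 0 <= tokens_held <= total_tokens:
        raise ValueError("token count out of range")
    if tokens_held == total_tokens:
        return "M"  # Modified: all tokens, may read and write
    if tokens_held > 0:
        return "S"  # Shared: some tokens, may read
    return "I"      # Invalid: no tokens


print([msi_state(n, 4) for n in (0, 1, 4)])  # ['I', 'S', 'M']
```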

TokenB: Optimized Token Counting

MSI was a bit of a lie, can optimize token counting by altering invariants 1, 3, 4:

1. At all times, each block has T tokens, one of which is the owner token

3. A processor can read a block only if it holds at least one token for that block and has valid data

4. If a coherence message contains the owner token, it must contain data
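The practical effect of the altered invariants is that only the owner token has to travel with data; other tokens may be sent in a dataless message. A sketch under assumptions: `TokenMessage`, `is_valid` and `can_read` are illustrative names, not the paper's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenMessage:
    tokens: int      # tokens carried, including the owner token if present
    has_owner: bool  # True if one of those tokens is the owner token
    has_data: bool   # True if the message carries the block's data


def is_valid(msg: TokenMessage) -> bool:
    if msg.tokens < 0:
        return False
    if msg.has_owner and msg.tokens < 1:
        return False  # the owner token is itself a token
    # Altered invariant 4: the owner token must be accompanied by data.
    if msg.has_owner and not msg.has_data:
        return False
    return True


def can_read(tokens_held: int, has_valid_data: bool) -> bool:
    # Altered invariant 3: a token alone is not enough without valid data.
    return tokens_held >= 1 and has_valid_data


print(is_valid(TokenMessage(tokens=2, has_owner=False, has_data=False)))  # True
print(is_valid(TokenMessage(tokens=1, has_owner=True, has_data=False)))   # False
print(can_read(1, has_valid_data=False))                                  # False
```

The altered invariant 3 is what keeps this safe: a node that received tokens without data cannot read until valid data arrives.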

TokenB Continued

The Good Stuff

Performance in: Tokens allow replies to be sent unordered, and indirectly (no broadcast)

This means:
  15-28% faster than snooping
  17-54% faster than directory
  21-25% less bandwidth than snooping

An Example

P1 reads then P2 writes then P1 reads

Presume a 4-node system, where P1 has an invalid copy, P2 has a shared copy, and P3 is the “home/owner” node

Example: The Snooping Way

Nodes: P1 P2 P3 P4
Messages: 1 2 3 4 5

All messages broadcast!

Example: The Directory Way

Nodes: P1 P2 P3 P4, Directory
Messages: 1 3 2 4 4 4 4 5 6

Directory processes messages 1, 3, 4, 5!

Example: The Token Way

Nodes: P1 P2 P3 P4
Messages: 1 (broadcast), 2, 3 (broadcast), 4, 4, 4, 5 (broadcast), 6
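The read, write, read sequence can be traced as token movements. This is a sketch, not a reconstruction of the slide's figure: the starting distribution (P2 holds 1 token, P3 holds 3 including the owner token, T = 4) is an assumption, and the helpers `read`, `write` and `run_example` are hypothetical. Only token movement is modelled, not individual messages.

```python
T = 4  # assumed token count per block (one per node)


def read(tokens, owner, requester):
    """Broadcast read: the owner sends one token (plus data). Returns the owner."""
    if tokens[requester] >= 1:
        return owner  # already readable
    tokens[owner] -= 1
    tokens[requester] += 1
    # If the owner gave away its last token, that was the owner token.
    return requester if tokens[owner] == 0 else owner


def write(tokens, owner, requester):
    """Broadcast write: every other node sends all its tokens to the requester."""
    for node in tokens:
        if node != requester:
            tokens[requester] += tokens[node]
            tokens[node] = 0
    return requester  # the writer now holds the owner token too


def run_example():
    tokens = {"P1": 0, "P2": 1, "P3": 3, "P4": 0}  # assumed starting state
    owner = "P3"
    trace = []
    for op, node in [(read, "P1"), (write, "P2"), (read, "P1")]:
        owner = op(tokens, owner, node)
        assert sum(tokens.values()) == T  # tokens are conserved
        trace.append((op.__name__, node, dict(tokens), owner))
    return trace


for step in run_example():
    print(step)
```

After the write P2 holds all four tokens, so the final read is served by P2 rather than by the home node P3.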

Real world results

Examined on a tree structure (virtual broadcast), and on a 2D torus

Migratory optimization: a read request after a write is forwarded all tokens

Benchmarked on OLTP, SPECjbb, Apache
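The migratory optimization can be read as a rule for how many tokens a responder hands to a reader. A minimal sketch, assuming a hypothetical helper `tokens_to_send` and a `wrote_block` flag that records whether the responder has written the block:

```python
def tokens_to_send(tokens_held: int, total_tokens: int,
                   wrote_block: bool, migratory: bool = True) -> int:
    """How many tokens a node sends in reply to a read request."""
    if tokens_held == 0:
        return 0  # nothing to give
    if migratory and wrote_block and tokens_held == total_tokens:
        # Read after write: forward all tokens so the reader can
        # write next without issuing a second request.
        return total_tokens
    return 1  # ordinary read reply: a single token


print(tokens_to_send(4, 4, wrote_block=True))                   # 4
print(tokens_to_send(4, 4, wrote_block=True, migratory=False))  # 1
print(tokens_to_send(2, 4, wrote_block=False))                  # 1
```

This pays off for read-modify-write sharing patterns, where the reader is likely to write the block soon after reading it.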

Results

Token vs Snooping: TOKEN Wins!

Results

Directory vs Token: Token mostly wins!

Conclusion

TokenB offers good performance for small-to-medium-sized parallel systems

Broadcast limits scalability past 16 nodes

But other performance implementations could be scaled larger!