(C) 2005 Multifacet Project Token Coherence: A Framework for Implementing Multiple- CMP Systems Mike Marty 1 , Jesse Bingham 2 , Mark Hill 1 , Alan Hu 2 , Milo Martin 3 , and David Wood 1 1 University of Wisconsin-Madison 2 University of British Columbia 3 University of Pennsylvania February 17 th , 2005

Jan 11, 2016


(C) 2005 Multifacet Project

Token Coherence: A Framework for Implementing

Multiple-CMP Systems

Mike Marty 1, Jesse Bingham 2, Mark Hill 1, Alan Hu 2, Milo Martin 3, and David Wood 1

1 University of Wisconsin-Madison
2 University of British Columbia
3 University of Pennsylvania

February 17th, 2005

Slide 2 Improving Multiple-CMP Systems using Token Coherence

Summary

• Microprocessor → Chip Multiprocessor (CMP)
• Symmetric Multiprocessor (SMP) → Multiple CMPs

• Problem: Coherence with Multiple CMPs

• Old Solution: Hierarchical Directory: Complex & Slow

• New Solution: Apply Token Coherence
  – Developed for glueless multiprocessor [2003]
  – Keep: Flat for Correctness
  – Exploit: Hierarchical for performance

• Less Complex & Faster than Hierarchical Directory

Slide 3 Improving Multiple-CMP Systems using Token Coherence

Outline

• Motivation and Background
  – Coherence in Multiple-CMP Systems
  – Example: DirectoryCMP

• Token Coherence: Flat for Correctness

• Token Coherence: Hierarchical for Performance

• Evaluation

Slide 4 Improving Multiple-CMP Systems using Token Coherence

Coherence in Multiple-CMP Systems

• Chip Multiprocessors (CMPs) emerging
• Larger systems will be built with Multiple CMPs

[Figure: CMP 1 through CMP 4 joined by an interconnect; inside a CMP, processors (P) with I and D caches and L2 banks on an on-chip interconnect]

Slide 5 Improving Multiple-CMP Systems using Token Coherence

Problem: Hierarchical Coherence

• Intra-CMP protocol for coherence within CMP
• Inter-CMP protocol for coherence between CMPs
• Interactions between protocols increase complexity
  – explodes state space

[Figure: CMP 1 through CMP 4 joined by an interconnect, labeled Intra-CMP Coherence and Inter-CMP Coherence]

Slide 6 Improving Multiple-CMP Systems using Token Coherence

Improving Multiple CMP Systems with Token Coherence

• Token Coherence allows Multiple-CMP systems to be...
  – Flat for correctness, but
  – Hierarchical for performance

[Figure: Correctness Substrate (Low Complexity) and Performance Protocol (Fast), shown with CMP 1 through CMP 4 on an interconnect]

Slide 7 Improving Multiple-CMP Systems using Token Coherence

Example: DirectoryCMP

[Figure: animation of Store B under a 2-level MOESI Directory. CMP 0 and CMP 1 hold processors P0 through P7, each with an L1 I&D cache, and each CMP has a Shared L2 / directory backed by Memory/Directory. Messages shown: getx, fwd, inv, ack, data/ack, WB. Cache states shown: S and O. Directory entries shown for block B: [S O] and [M I]. Callout: RACE CONDITIONS!]

Slide 8 Improving Multiple-CMP Systems using Token Coherence

Token Coherence Summary

• Token Coherence separates performance from correctness

• Correctness Substrate: Enforces coherence invariant and prevents starvation
  1. Safety with Token Counting
  2. Starvation Avoidance with Persistent Requests

• Performance Policy: Makes the common case fast
  – Transient requests to seek tokens
    • Unordered, untracked, unacknowledged
  – Possible prediction, multicast, filters, etc.

Slide 9 Improving Multiple-CMP Systems using Token Coherence

Outline

• Motivation and Background

• Token Coherence: Flat for Correctness
  – Safety
  – Starvation Avoidance

• Token Coherence: Hierarchical for Performance

• Evaluation

Slide 10 Improving Multiple-CMP Systems using Token Coherence

Example: Token Coherence [ISCA 2003]

• Each memory block initialized with T tokens
• Tokens stored in memory, caches, & messages
• At least one token to read a block
• All tokens to write a block

[Figure: P0 through P3, each with an L1 I&D cache and an L2, joined by an interconnect to memory (mem 0 to mem 3); the animation steps through Load B and Store B requests]

Slide 11 Improving Multiple-CMP Systems using Token Coherence

Extending to Multiple-CMP System

[Figure: P0 through P3, each with an L1 I&D cache and an L2, on an interconnect with mem 0 and mem 1; the processors are grouped into CMP 0 and CMP 1, each with its own interconnect and a Shared L2]

Slide 12 Improving Multiple-CMP Systems using Token Coherence

Extending to Multiple-CMP System

• Token counting remains flat
• Tokens to caches
  – Handles shared caches and other complex hierarchies

[Figure: CMP 0 (P0, P1) and CMP 1 (P2, P3), each processor with an L1 I&D cache and each CMP with an on-chip interconnect and a Shared L2; mem 0 and mem 1 attach to the system interconnect; Store B is animated]

Slide 13 Improving Multiple-CMP Systems using Token Coherence

Safety Recap

• Safety: Maintain coherence invariant
  – Only one writer, or multiple readers
• Tokens for Safety
  – T Tokens associated with each memory block
  – # tokens encoded in 1 + log2(T) bits
  – Processor acquires all tokens to write, a single token to read
• Tokens passed to nodes in glueless multiprocessor scheme
  – But CMPs have private and shared caches
• Tokens passed to caches in Multiple-CMP system
  – Arbitrary cache hierarchy easily handled
  – Flat for correctness
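
The token-counting rules recapped above can be sketched in a few lines of Python. This is an illustrative model only: the Block class, the holder names, and the value T = 4 are assumptions made for the example and do not come from the slides.

```python
import math

T = 4  # assumed number of tokens per block (hypothetical value)

class Block:
    """Tracks where the T tokens of one memory block currently live."""

    def __init__(self, caches):
        self.tokens = {c: 0 for c in caches}
        self.tokens["mem"] = T  # all tokens start at memory

    def move(self, src, dst, n):
        # Tokens are only moved, never created or destroyed.
        assert 0 < n <= self.tokens[src]
        self.tokens[src] -= n
        self.tokens[dst] += n
        assert sum(self.tokens.values()) == T

    def can_read(self, holder):
        return self.tokens[holder] >= 1   # at least one token to read

    def can_write(self, holder):
        return self.tokens[holder] == T   # all tokens to write

def token_field_bits(t):
    # 1 + log2(T), as on the slide; t is assumed to be a power of two.
    return 1 + int(math.log2(t))

b = Block(["L1_P0", "L1_P1", "L2_CMP0"])
b.move("mem", "L1_P0", 1)
b.move("mem", "L1_P1", 1)
readers = [h for h in b.tokens if h != "mem" and b.can_read(h)]
b.move("L1_P1", "L1_P0", 1)   # P0 gathers every token before writing
b.move("mem", "L1_P0", 2)
```

Because the check is purely a count per holder, the same code works whether a holder is a private L1, a shared L2, or memory, which is the sense in which counting stays flat.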

Slide 14 Improving Multiple-CMP Systems using Token Coherence

Some Token Counting Implications

• Memory must store tokens
  – Separate RAM
  – Use extra ECC bits
  – Token cache
• T sized to # caches to allow read-only copies in all caches
• Replacements cannot be silent
  – Tokens must not be lost or dropped
• Targeted for invalidate-based protocols
  – Not a solution for write-through or update protocols
• Tokens must be identified by block address
  – Address must be in all token-carrying messages

Slide 15 Improving Multiple-CMP Systems using Token Coherence

Starvation Avoidance

• Request messages can miss tokens
  – In-flight tokens
• Transient Requests are not tracked throughout system
  – Incorrect filtering, multicast, destination-set prediction, etc.
• Possible Solution: Retries
  – Retry w/ optional randomized backoff is effective for races
• Guaranteed Solution: Persistent Requests
  – Heavyweight request guaranteed to succeed
  – Should be rare (uses more bandwidth)
  – Locates all tokens in the system
  – Orders competing requests
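
The two options above (retry, then fall back to a persistent request) can be combined as in the following sketch. It is a hypothetical illustration: the acquire function, its callbacks, the retry limit, and the backoff window are all assumptions, not parameters given in the talk.

```python
import random

def acquire(try_transient, issue_persistent, max_retries=3, base_delay=8, rng=None):
    """Try transient requests with randomized backoff, then escalate.

    try_transient() -> bool is a hypothetical callback that reports whether
    enough tokens arrived. issue_persistent() is a hypothetical callback
    that is assumed to always succeed; that guarantee comes from the
    correctness substrate, not from this function.
    Returns (how the tokens were obtained, cycles spent backing off).
    """
    rng = rng or random.Random()
    waited = 0
    for attempt in range(max_retries):
        if try_transient():
            return "transient", waited
        # Randomized backoff; the window doubles on each failed attempt.
        waited += rng.randint(1, base_delay * 2 ** attempt)
    issue_persistent()
    return "persistent", waited

# A race that resolves on the third transient attempt.
outcomes = iter([False, False, True])
how, waited = acquire(lambda: next(outcomes), lambda: None, rng=random.Random(0))

# A request that keeps missing tokens and has to escalate.
escalations = []
how2, _ = acquire(lambda: False, lambda: escalations.append("persistent"),
                  rng=random.Random(0))
```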

Slide 16 Improving Multiple-CMP Systems using Token Coherence

Starvation Avoidance

• Tokens move freely in the system
  – Transient requests can miss in-flight tokens
  – Incorrect speculation, filters, prediction, etc.

[Figure: CMP 0 (P0, P1) and CMP 1 (P2, P3), each processor with an L1 I&D cache and each CMP with a Shared L2, plus mem 0 and mem 1. Several processors, including P0 and P2, issue Store B, and GETX messages are in flight]

Slide 17 Improving Multiple-CMP Systems using Token Coherence

Starvation Avoidance

• Solution: issue Persistent Request
  – Heavyweight request guaranteed to succeed
  – Methods: Centralized [2003] and Distributed (New)

[Figure: same two-CMP system (P0 through P3, L1 I&D caches, Shared L2s, mem 0, mem 1) with three Store B operations pending]

Slide 18 Improving Multiple-CMP Systems using Token Coherence

Old Scheme: Central Arbiter [2003]

– Processors issue persistent requests

[Figure: same two-CMP system with arbiter 0 at memory. Store B is pending at P0, P2, and a third processor; each request times out, and arbiter 0 queues B: P0, B: P2, B: P1]

Slide 19 Improving Multiple-CMP Systems using Token Coherence

Old Scheme: Central Arbiter [2003]

– Processors issue persistent requests
– Arbiter orders and broadcasts activate

[Figure: same two-CMP system. Arbiter 0 holds the queue B: P0, B: P2, B: P1 and broadcasts the activation B: P0, which is then recorded at every cache and memory]

Slide 20 Improving Multiple-CMP Systems using Token Coherence

Old Scheme: Central Arbiter [2003]

– Processor sends deactivate to arbiter
– Arbiter broadcasts deactivate (and next activate)
– Bottom Line: handoff is 3 message latencies

[Figure: same two-CMP system. Arbiter 0, with B: P2 and B: P1 still queued, replaces the B: P0 activation at every cache and memory with B: P2. The three message latencies of the handoff are numbered 1, 2, 3]
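
The central-arbiter behaviour described on the three slides above can be modeled as a FIFO queue per block. The sketch below is a hypothetical illustration; the CentralArbiter class and its method names are assumptions. One plausible reading of the labels 1, 2, 3 is: the deactivate travels to the arbiter, the arbiter broadcasts the next activate, and the tokens then travel to the newly activated requester.

```python
from collections import deque

class CentralArbiter:
    """Orders persistent requests for a single block (hypothetical model)."""

    def __init__(self):
        self.queue = deque()   # requests waiting their turn
        self.active = None     # processor whose request is activated
        self.log = []          # broadcasts issued by the arbiter, in order

    def request(self, proc):
        self.queue.append(proc)
        if self.active is None:
            self._activate_next()

    def deactivate(self, proc):
        # Only the currently active requester may deactivate.
        assert proc == self.active
        self.log.append(("deactivate", proc))
        self.active = None
        self._activate_next()

    def _activate_next(self):
        if self.queue:
            self.active = self.queue.popleft()
            self.log.append(("activate", self.active))

arb = CentralArbiter()
for p in ("P0", "P2", "P1"):   # arrival order shown in the figure
    arb.request(p)
arb.deactivate("P0")           # P0 finished its store; P2 is next
```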

Slide 21 Improving Multiple-CMP Systems using Token Coherence

Improved Scheme: Distributed Arbitration [NEW]

– Processors broadcast persistent requests

[Figure: same two-CMP system without an arbiter. Store B is pending at P0, P2, and a third processor; every cache and memory keeps a table with the entries P0: B, P1: B, P2: B]

Slide 22 Improving Multiple-CMP Systems using Token Coherence

Improved Scheme: Distributed Arbitration [NEW]

– Processors broadcast persistent requests
– Fixed priority (processor number)

[Figure: same tables (P0: B, P1: B, P2: B) at every cache and memory, with P0: B marked as the active request throughout the system]

Slide 23 Improving Multiple-CMP Systems using Token Coherence

Improved Scheme: Distributed Arbitration [NEW]

– Processors broadcast persistent requests
– Fixed priority (processor number)
– Processors broadcast deactivate

[Figure: same tables at every cache and memory. P0 has completed its Store B; P1: B is now marked as the active request throughout the system, and the handoff is labeled as message latency 1]

Slide 24 Improving Multiple-CMP Systems using Token Coherence

Improved Scheme: Distributed Arbitration [NEW]

– Bottom line: Handoff is a single message latency
  • Subtle point: P0 and P1 must wait until next “wave”

[Figure: tables at every cache and memory now hold P1: B and P2: B, with P1: B marked as the active request]

Slide 25 Improving Multiple-CMP Systems using Token Coherence

Implementing Distributed Persistent Requests

• Table at each cache
  – Sized to N entries for each processor (we use N=1)
  – Indexed by processor ID
  – Content-addressable by Address
• Each incoming message must access table
  – Not on the critical path: can be slow CAM
• Activate/deactivate reordering cannot be allowed
  – Persistent request virtual channel must be point-to-point ordered
  – Or, other solution such as sequence numbers or acks
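
A minimal sketch of the per-cache table described above, assuming N=1 and the fixed priority by processor number shown on the earlier slides. The PersistentTable class, its method names, and the address value are hypothetical. The "wave" rule for re-requesting and the ordering requirement on the virtual channel are not modeled here.

```python
class PersistentTable:
    """Per-cache table of persistent requests, one entry per processor (N=1)."""

    def __init__(self, num_procs):
        # Indexed by processor ID; each entry holds a block address or None.
        self.entries = [None] * num_procs

    def activate(self, proc, addr):
        self.entries[proc] = addr

    def deactivate(self, proc):
        self.entries[proc] = None

    def winner(self, addr):
        # Content-addressable lookup by address. With fixed priority,
        # the lowest-numbered processor requesting this block wins.
        for proc, requested in enumerate(self.entries):
            if requested == addr:
                return proc
        return None

B = 0xB0  # hypothetical block address

table = PersistentTable(num_procs=4)
for proc in (1, 2, 0):          # broadcasts may arrive in any order
    table.activate(proc, B)
first = table.winner(B)         # every cache forwards tokens for B to P0

table.deactivate(0)             # P0 broadcasts deactivate
second = table.winner(B)        # each cache already knows P1 is next
```

Since every cache holds the same entries, each one can pick the next winner locally when the deactivate arrives, which is why the handoff needs no round trip through an arbiter.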

Slide 26 Improving Multiple-CMP Systems using Token Coherence

Implementing Distributed Persistent Requests

• Should reads be distinguished from writes?
  – Not necessary, but
  – Persistent Read request is helpful
• Implications of flat distributed arbitration
  – Simple flat for correctness
  – Global broadcast when used
    • Fortunately they are rare in typical workloads (0.3%)
    • Bad workload (very high contention) would burn bandwidth
  – Maximum # processors must be architected
• What about a hierarchical persistent request scheme?
  – Possible, but correctness is no longer flat
  – Make the common case fast

Slide 27 Improving Multiple-CMP Systems using Token Coherence

Reducing Unnecessary Traffic

• Problem: Which token-holding cache responds with data?

• Solution: Distinguish one token as the owner token

– The owner includes data with token response

– Clean vs. dirty owner distinction also useful for writebacks
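
The effect of the owner token on response traffic can be illustrated as follows. This is a simplified sketch under stated assumptions: the Holding class and respond function are hypothetical, reads are answered by the owner alone with a single token, and writes collect all tokens from every holder. A real performance policy may respond differently.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    tokens: int = 0
    owner: bool = False   # True if this cache holds the owner token

def respond(h, is_write):
    """Return (tokens_sent, data_included) for one cache's response."""
    if h.tokens == 0:
        return 0, False
    if is_write:
        # Every holder gives up all its tokens; only the owner attaches data.
        return h.tokens, h.owner
    if h.owner:
        # Assumed policy: the owner answers reads with data and one token.
        return 1, True
    return 0, False

holders = [Holding(2, owner=True), Holding(1), Holding(1)]
write_responses = [respond(h, True) for h in holders]
read_responses = [respond(h, False) for h in holders]
data_copies_on_write = sum(data for _, data in write_responses)
```

With three token holders, only one data message is sent on a write instead of three.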

Slide 28 Improving Multiple-CMP Systems using Token Coherence

Outline

• Motivation and Background

• Token Coherence: Flat for Correctness

• Token Coherence: Hierarchical for Performance
  – TokenCMP
  – Another look at performance policies

• Evaluation

Slide 29 Improving Multiple-CMP Systems using Token Coherence

Hierarchical for Performance: TokenCMP

• Target System:
  – 2-8 CMPs
  – Private L1s, shared L2 per CMP
  – Any interconnect, but high-bandwidth

• Performance Policy Goals:
  – Aggressively acquire tokens
  – Exploit on-chip locality and bandwidth
  – Respect cache hierarchy
  – Detecting and handling missed tokens

Slide 30 Improving Multiple-CMP Systems using Token Coherence

Hierarchical for Performance: TokenCMP

• Approach:
  – On L1 miss, broadcast within own CMP
    • Local cache responds if possible
  – On L2 miss, broadcast to other CMPs
  – Appropriate L2 bank responds or broadcasts within its CMP
    • Optionally filter
  – Responses between CMPs carry extra tokens for future locality
• Handling missed tokens:
  – Timeout after average memory latency
  – Invoke persistent request (no retries)
• Larger systems can use filters, multicast, soft-state directories
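
The escalation path listed above can be written out as a small decision function. This is a hypothetical sketch: the tokencmp_miss function, its parameters, and the latency numbers in the usage lines are assumptions chosen for illustration.

```python
def tokencmp_miss(found_on_chip, remote_latency, avg_memory_latency):
    """Return the ordered steps one miss takes under the policy above.

    found_on_chip: True if a cache in the requester's own CMP can respond.
    remote_latency: cycles until enough tokens arrive from off chip, or
                    None if the transient request missed them entirely.
    """
    steps = ["L1 miss: broadcast within own CMP"]
    if found_on_chip:
        return steps + ["local cache responds"]
    steps.append("L2 miss: broadcast to other CMPs")
    if remote_latency is not None and remote_latency <= avg_memory_latency:
        return steps + ["remote L2 bank responds"]
    # Tokens did not arrive in time: escalate directly, without retries.
    steps.append("timeout after average memory latency")
    steps.append("persistent request (no retries)")
    return steps

on_chip = tokencmp_miss(True, None, avg_memory_latency=200)
off_chip = tokencmp_miss(False, 150, avg_memory_latency=200)
missed = tokencmp_miss(False, None, avg_memory_latency=200)
```

The common case stays on chip and ends after one broadcast; only the rare missed-token case pays for the global persistent request.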

Slide 31 Improving Multiple-CMP Systems using Token Coherence

Other Optimizations in TokenCMP

• Implementing E-state
  – Memory responds with all tokens on read request
  – Use clean/dirty owner distinction to eliminate writing back unwritten data
• Implementing Migratory Sharing
  – What is it?
    • A processor’s read request results in exclusive permission if responder has exclusive permission and wrote the block
  – In TokenCMP, simply return all tokens
• Non-speculative delay
  – Hold block for some # cycles so permission isn’t stolen prematurely

Slide 32 Improving Multiple-CMP Systems using Token Coherence

Another Look at Performance Policies

• How to find tokens?
  – Broadcast
  – Broadcast w/ filters
  – Multicast (destination-set prediction)
  – Directories (soft or hard)
• Who responds with data?
  – Owner token
    • TokenCMP uses Owner token for Inter-CMP responses
  – Other heuristics
    • For TokenCMP intra-CMP responses, cache responds if it has extra tokens

Slide 33 Improving Multiple-CMP Systems using Token Coherence

Transient Requests May Reduce Complexity

• Processor holds the only required state about request
• L2 controller in TokenCMP very simple:
  – Re-broadcasts L1 request message on a miss
  – Re-broadcasts or filters external request messages
  – Possible states:
    • no tokens (I)
    • all tokens (M)
    • some tokens (S)
  – Bounce unexpected tokens to memory
• DirectoryCMP’s L2 controller is complex
  – Allocates MSHR on miss and forward
  – Issues invalidates and receives acks
  – Orders all intra-CMP requests and writebacks
  – 57 states in our L2 implementation!
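
The three TokenCMP L2 states listed above follow directly from the token count, which the sketch below makes explicit. The function names, the expected flag, and the token total of 16 are assumptions for illustration only.

```python
def l2_state(tokens, total):
    """Map an L2 block's token count to the three states named above."""
    assert 0 <= tokens <= total
    if tokens == 0:
        return "I"   # no tokens
    if tokens == total:
        return "M"   # all tokens
    return "S"       # some tokens

def on_tokens_arrive(held, arriving, total, expected):
    """Return (tokens kept by the L2, tokens bounced to memory).

    expected is a hypothetical flag meaning the L2 is prepared to
    accept tokens for this block.
    """
    if not expected:
        return held, arriving        # bounce unexpected tokens to memory
    assert held + arriving <= total  # token conservation
    return held + arriving, 0

TOTAL = 16  # assumed tokens per block
states = [l2_state(n, TOTAL) for n in (0, 3, 16)]
bounced = on_tokens_arrive(0, 2, TOTAL, expected=False)
kept = on_tokens_arrive(1, 2, TOTAL, expected=True)
```

No transient states appear because the controller never waits on an outstanding request of its own; the requesting processor holds that state.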

Slide 34 Improving Multiple-CMP Systems using Token Coherence

Writebacks

• DirectoryCMP uses “3-phase writebacks”
  – L1 issues writeback request
  – L2 enters transient state or blocks request
  – L2 responds with writeback ack
  – L1 sends data

• TokenCMP uses “fire-and-forget” writebacks
  – Immediately send tokens and data
  – Heuristic: Only send data if # tokens > 1
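
The difference in message count between the two writeback styles above can be seen in a short sketch. The function names and message labels are hypothetical. The rule that a dirty owner always sends data is an assumption added here, on the reasoning that it may hold the only up-to-date copy; the slide states only the "# tokens > 1" heuristic.

```python
def directory_writeback():
    """Messages exchanged in a 3-phase writeback, as listed above."""
    return [
        "L1 -> L2: writeback request",
        "L2 -> L1: writeback ack",
        "L1 -> L2: data",
    ]

def token_writeback(tokens, dirty_owner=False):
    """Build the single fire-and-forget message for an evicted block.

    Heuristic from the slide: include data only if more than one token
    is held. Assumption (not on the slide): a dirty owner always
    includes data.
    """
    assert tokens >= 1
    include_data = dirty_owner or tokens > 1
    return [{"tokens": tokens, "data": include_data}]

three_phase = directory_writeback()
single_token = token_writeback(1)
many_tokens = token_writeback(3)
dirty = token_writeback(1, dirty_owner=True)
```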

Slide 35 Improving Multiple-CMP Systems using Token Coherence

Outline

• Motivation and Background

• Token Coherence: Flat for Correctness

• Token Coherence: Hierarchical for Performance

• Evaluation
  – Model checking
  – Performance w/ commercial workloads
  – Robustness

Slide 36 Improving Multiple-CMP Systems using Token Coherence

TokenCMP Evaluation

• Simple?
  – Some anecdotal examples and comparisons
  – Model checking

• Fast?
  – Full-system simulation w/ commercial workloads

• Robust?
  – Micro-benchmarks to simulate high contention

Slide 37 Improving Multiple-CMP Systems using Token Coherence

Complexity Evaluation with Model Checking

This work performed by Jesse Bingham and Alan Hu of the University of British Columbia

• Methods:
  – TLA+ and TLC
  – DirectoryCMP omits all intra-CMP details
  – TokenCMP’s correctness substrate modeled

• Result:
  – Complexity similar between TokenCMP and non-hierarchical DirectoryCMP
  – Correctness Substrate verified to be correct and deadlock-free
  – All possible performance protocols correct

Slide 38 Improving Multiple-CMP Systems using Token Coherence

Performance Evaluation

• Target System:
  – 4 CMPs, 4 procs/CMP
  – 2GHz OoO SPARC, 8MB shared L2 per chip
  – Directly connected interconnect

• Methods: Multifacet GEMS simulator
  – Simics augmented with timing models
  – Released soon: http://www.cs.wisc.edu/gems

• Benchmarks:
  – Performance: Apache, Spec, OLTP
  – Robustness: Locking uBenchmark

Slide 39 Improving Multiple-CMP Systems using Token Coherence

Full-system Simulation: Runtime

– TokenCMP performs 9-50% faster than DirectoryCMP

Slide 40 Improving Multiple-CMP Systems using Token Coherence

Full-system Simulation: Runtime

– TokenCMP performs 9-50% faster than DirectoryCMP

[Figure labels: DRAM Directory, Perfect L2]

Slide 41 Improving Multiple-CMP Systems using Token Coherence

Full-system Simulation: Inter-CMP Traffic

– TokenCMP traffic is reasonable (or better)

• DirectoryCMP control overhead greater than broadcast for small system

Slide 42 Improving Multiple-CMP Systems using Token Coherence

Full-system Simulation: Intra-CMP Traffic

Slide 43 Improving Multiple-CMP Systems using Token Coherence

Performance Robustness

Locking micro-benchmark

[Figure: axis labeled from less contention to more contention]

(correctness substrate only)

Slide 44 Improving Multiple-CMP Systems using Token Coherence

Performance Robustness

Locking micro-benchmark

[Figure: axis labeled from less contention to more contention]

(correctness substrate only)

Slide 45 Improving Multiple-CMP Systems using Token Coherence

Performance Robustness

Locking micro-benchmark

[Figure: axis labeled from less contention to more contention]

Slide 46 Improving Multiple-CMP Systems using Token Coherence

Summary

• Microprocessor → Chip Multiprocessor (CMP)
• Symmetric Multiprocessor (SMP) → Multiple CMPs

• Problem: Coherence with Multiple CMPs

• Old Solution: Hierarchical Directory: Complex & Slow

• New Solution: Apply Token Coherence
  – Developed for glueless multiprocessor [2003]
  – Keep: Flat for Correctness
  – Exploit: Hierarchical for performance

• Less Complex & Faster than Hierarchical Directory