Cache coherence and consistency models in multiprocessor architecture
Computer Architecture
Authors: Piscione Pietro, Villardita Alessio
Degree: Computer Engineering
A.Y. 2014/2015

Coherence and consistency models in multiprocessor architecture

Aug 08, 2015



Pietro Piscione

Page 2: Introduction

● Multiprocessor architecture overview
● Coherence vs. consistency
  ○ Coherence protocols
  ○ Snooping and directory models
  ○ Consistency models
  ○ Sequential consistency

Page 3: Why multiprocessor architecture?

● Clock frequency wall
● More throughput
● More efficiency

Memory organizations:
● Shared memory
● Distributed memory

In a bus-based shared-memory system, the bus is the bottleneck.

Page 4

More processors (16 threads)
More cache memory (20 MB L3)
More complexity (1.86 billion transistors)

Compared with the i7-990X (2011): 12 threads, 12 MB cache, 1.17 billion transistors.

Page 5: Processor-memory performance gap

Figure: processor and memory performance (growth-rate labels in the plot: 1.25x, 1.52x, 1.20x)

Page 6: Cache design factors

Traditionally, memory hierarchy designers focused on:
● Optimizing average memory access time
  ○ Miss rate
  ○ Miss penalty

More recently:
● Power consumption has become a major consideration
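Average memory access time is commonly written as hit time + miss rate × miss penalty, which is why the two factors above are the main levers. A minimal Python sketch; the hit-time term and all numbers are illustrative assumptions, not data from the slides:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty.

    Times share one unit (e.g. clock cycles); miss_rate is a fraction in [0, 1].
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(1, 0.05, 100)
# Halving either the miss rate or the miss penalty gives the same gain here.
lower_miss_rate = amat(1, 0.025, 100)
lower_penalty = amat(1, 0.05, 50)
print(baseline, lower_miss_rate, lower_penalty)
```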

Page 7: Miss rate vs. cache size (SPEC CPU2000)

Figure: miss rate broken down into compulsory, capacity, and conflict misses

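Page 8: Cache (in)coherence

Figure: processors Pi and Pj, each with a private cache.

Incoherence occurs when Pi and Pj hold copies of the same block and one of them writes its copy, leaving the other with a stale value.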
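The scenario can be reproduced with a toy model. A minimal sketch, assuming two private write-back caches with no coherence protocol at all; the class name `PrivateCache` and the address `X` are hypothetical:

```python
class PrivateCache:
    """A write-back cache with no coherence protocol (hypothetical model)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # locally cached copies: address -> value

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]                  # hit: use the local copy

    def write(self, addr, value):
        self.lines[addr] = value                 # write-back: memory untouched


memory = {"X": 0}
pi = PrivateCache("Pi", memory)
pj = PrivateCache("Pj", memory)

pi.read("X")           # Pi caches X = 0
pj.read("X")           # Pj caches X = 0
pi.write("X", 1)       # Pi updates only its own copy
stale = pj.read("X")   # Pj still hits on its old copy
print(f"Pi sees {pi.read('X')}, Pj sees {stale}, memory holds {memory['X']}")
# Pi sees 1, Pj sees 0, memory holds 0
```

Pj keeps returning its stale copy because nothing tells it that Pi wrote X; a coherence protocol exists to close exactly this gap.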

Page 9: Consistency and coherence

They are different:
● The cache coherence model specifies HOW memory accesses to a given location are coordinated among CPUs.
● The (memory) consistency model specifies WHEN a memory write shows up at another CPU.

Page 10: Cache coherence: definition

“For any given memory location, at any given (logical) time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”

Two fundamental invariants:
● Single-Writer-Multiple-Reader (SWMR)
● Data-Value

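Page 11: Cache coherence: epochs

● A given memory location's lifetime is divided into epochs: in each epoch there is either a single read-write core or some number of read-only cores
● SWMR alone is not enough: the Data-Value invariant is also needed. The value of the location at the start of an epoch must equal its value at the end of the last read-write epoch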
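Both invariants can be checked mechanically over a trace of epochs. The sketch below assumes a hypothetical `Epoch` record (mode, cores, value at the start and at the end of the epoch); it illustrates the definitions and is not a verification tool:

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One epoch of a memory location's lifetime (hypothetical trace format)."""
    mode: str          # "RW": one read-write core, "RO": read-only cores
    cores: frozenset   # cores allowed to access the location in this epoch
    start_value: int   # value observed at the start of the epoch
    end_value: int     # value left at the end of the epoch


def check_invariants(epochs, initial_value):
    """Return a list of SWMR and Data-Value violations (empty if none)."""
    violations = []
    last_written = initial_value  # value at the end of the last RW epoch
    for i, e in enumerate(epochs):
        # SWMR: a read-write epoch has exactly one core.
        if e.mode == "RW" and len(e.cores) != 1:
            violations.append(f"epoch {i}: SWMR violated ({len(e.cores)} writers)")
        # A read-only epoch cannot change the value.
        if e.mode == "RO" and e.end_value != e.start_value:
            violations.append(f"epoch {i}: value changed in a read-only epoch")
        # Data-Value: every epoch starts with the last RW epoch's final value.
        if e.start_value != last_written:
            violations.append(f"epoch {i}: Data-Value violated "
                              f"(saw {e.start_value}, expected {last_written})")
        if e.mode == "RW":
            last_written = e.end_value
    return violations


ok = [
    Epoch("RW", frozenset({"C1"}), 0, 5),
    Epoch("RO", frozenset({"C1", "C2", "C3"}), 5, 5),
    Epoch("RW", frozenset({"C2"}), 5, 7),
]
bad = ok + [Epoch("RO", frozenset({"C3"}), 5, 5)]  # C3 reads a stale 5
print(check_invariants(ok, 0))
print(check_invariants(bad, 0))
```

The second trace never has two writers, so SWMR holds, yet C3 starts an epoch with the value 5 after C2 wrote 7: only the Data-Value check catches it.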

Page 12: Coherence controller

Page 13: Coherence controller behavior

● Accepts loads and stores from the core and returns load values to it
● Initiates a coherence transaction when a cache miss occurs, by issuing a coherence request for the block requested by the core
● Receives the coherence requests and coherence responses that it must process

Page 14: Coherence protocols: basics

When a write occurs to a given address, what happens to the other cached copies? Two alternatives:
● Write invalidate (most common): invalidate all other copies
● Write update (broadcast): update all the cached copies

Page 15: Invalidate vs. update protocols

Invalidate:
● One message achieves coherence: after the first invalidation, further writes to the same block need no bus traffic
● Significantly less bandwidth
● Easy to implement

Update:
● Lower read latency
● Larger messages
● More bandwidth
● More complex implementations
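The bandwidth difference shows up even in a tiny model. The sketch counts bus messages when one core writes a shared block several times in a row and no other core touches it in between; the function name and the 8-byte address and data sizes are assumptions:

```python
def bus_traffic(num_writes, protocol, addr_bytes=8, word_bytes=8):
    """Return (messages, bytes) on the bus for `num_writes` consecutive writes
    by one core to a block that other caches initially share.

    Hypothetical model: write-back caches, and no other core accesses the
    block between the writes.
    """
    if num_writes == 0:
        return 0, 0
    if protocol == "invalidate":
        # The first write broadcasts one invalidate (address only); later
        # writes hit on the now-exclusive block and stay local.
        return 1, addr_bytes
    if protocol == "update":
        # Every write is broadcast with its address and the new data.
        return num_writes, num_writes * (addr_bytes + word_bytes)
    raise ValueError(f"unknown protocol: {protocol}")


for n in (1, 2, 4):
    print(n, bus_traffic(n, "invalidate"), bus_traffic(n, "update"))
```

The picture reverses when other cores read the block between the writes: with invalidation each of those reads misses, while with update the readers already hold the new value, which is the lower read latency listed above.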

Page 16: Coherence protocols: basics

● Directory based: the sharing status of each block of physical memory is kept in one location, the directory
● Snooping: every cache that holds a copy of a block of physical memory tracks that block's sharing status

Page 17: Snooping protocol: main features

● Distributed architecture
● Broadcast messages
● Limited scalability
● Total order of coherence requests across all blocks
● The interconnection network must serialize these requests into some total order

Page 18: Snooping protocol: write invalidate

● Write to shared data:
  ○ An invalidate is sent to all caches, which snoop it and invalidate any copy
● Read miss:
  ○ Write-through: memory is always up to date
  ○ Write-back: force the cache holding the modified copy to update main memory, then snoop that value

A separate invalidate bus can be used for write traffic.
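Write invalidation with write-back caches is usually described with three states per block: Modified, Shared, Invalid (MSI). A minimal sketch of a snooping MSI controller on an atomic bus; the state names, the `BusRd`/`BusRdX` transaction names and one-word blocks are assumptions of this model, and the races of a real split-transaction bus are ignored:

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


class Bus:
    """Atomic snooping bus: transactions are serialized in issue order."""

    def __init__(self, memory):
        self.memory = memory   # main memory: address -> value
        self.caches = []
        self.log = []          # the total order of coherence requests

    def broadcast(self, requester, kind, addr):
        self.log.append((requester.name, kind, addr))
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(kind, addr)


class Cache:
    """Snooping MSI controller for a write-back, write-invalidate cache."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}        # address -> [state, value] (one-word blocks)
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, [State.I, None])[0]

    def load(self, addr):
        if self.state(addr) == State.I:
            # Read miss: snooping caches flush a dirty copy before we read memory.
            self.bus.broadcast(self, "BusRd", addr)
            self.lines[addr] = [State.S, self.bus.memory[addr]]
        return self.lines[addr][1]

    def store(self, addr, value):
        if self.state(addr) != State.M:
            # Write to shared (or missing) data: invalidate every other copy.
            self.bus.broadcast(self, "BusRdX", addr)
        self.lines[addr] = [State.M, value]

    def snoop(self, kind, addr):
        state = self.state(addr)
        if state == State.M:
            # Dirty owner: update main memory so the requester gets the value.
            self.bus.memory[addr] = self.lines[addr][1]
        if kind == "BusRd" and state == State.M:
            self.lines[addr][0] = State.S
        elif kind == "BusRdX" and state != State.I:
            self.lines[addr][0] = State.I


memory = {"X": 0}
bus = Bus(memory)
p0 = Cache("P0", bus)
p1 = Cache("P1", bus)

p0.load("X")          # P0: X in S
p1.load("X")          # P1: X in S
p0.store("X", 42)     # BusRdX invalidates P1's copy; P0: X in M
value = p1.load("X")  # read miss: P0 writes 42 back and drops to S
print(value, p0.state("X"), p1.state("X"), memory["X"])
print(bus.log)
```

The bus log is the total order of coherence requests required by a snooping protocol: every cache observes the same sequence.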

Page 19: Snooping protocol: write update

● Write to shared data:
  ○ Broadcast on the bus; processors snoop and update their copies
● Read miss:
  ○ Memory is always up to date
● Higher bandwidth (data + address are transmitted), but lower latency for readers (behaves like a write-through cache)
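A write-update counterpart, under simplifying assumptions (atomic bus, one-word blocks, write-through to memory); the class names are hypothetical:

```python
class UpdateBus:
    """Shared bus for a write-update, write-through snooping protocol."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.messages = 0      # each update carries address + data

    def broadcast_update(self, writer, addr, value):
        self.messages += 1
        self.memory[addr] = value             # memory stays up to date
        for cache in self.caches:
            if cache is not writer and addr in cache.lines:
                cache.lines[addr] = value     # snoop and update the copy


class UpdateCache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}        # address -> value
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.lines:            # read miss: memory is current
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_update(self, addr, value)


memory = {"X": 0}
bus = UpdateBus(memory)
p0 = UpdateCache("P0", bus)
p1 = UpdateCache("P1", bus)

p1.load("X")                 # P1 caches X = 0
for v in (1, 2, 3):
    p0.store("X", v)         # each write is broadcast; P1's copy is updated
reader_value = p1.load("X")  # hit: P1 already holds the latest value
print(reader_value, bus.messages, memory["X"])
```

P1 never misses after its first read, but every one of P0's writes costs a bus transaction carrying both address and data.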

Page 20: Directory protocol: basic idea

● Global view of cache states
● Centralized in the directory
● Unicast messages
● Better scalability

When the directory receives a message, what happens? It either replies or forwards the request.

Page 21: Directory protocol: basic idea

Possible cases:
● One request, one reply
● One request -> K forwards -> K replies

The interconnection network needs only point-to-point ordering, not a total order.

Page 22: Directory protocol: example

1. The requestor sends GetM to the directory
2. The directory sends the Ack Count to the requestor
3. The directory sends K Invalidate messages to the sharers
4. The sharers send an AckInv to the requestor
5. Once all K AckInv messages have arrived, the requestor modifies the block
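The five steps can be simulated for a block that the directory records in the Shared state. A sketch, assuming hypothetical Python data structures; AckCount and AckInv follow the slide's message names, and forwarding to a Modified owner is not modeled:

```python
import random


def get_m(sharers, requestor, seed=0):
    """Simulate a GetM for a block held in state S by `sharers`.

    `sharers` is the directory's sharer set (hypothetical representation).
    Returns (messages the requestor received before writing, new owner set).
    """
    others = sorted(sharers - {requestor})   # the K sharers to invalidate
    # 1. Requestor -> Directory: GetM.
    # 2. Directory -> Requestor: AckCount = K.
    # 3. Directory -> each sharer: Invalidate.
    # 4. Each sharer -> Requestor: AckInv.
    inbox = [("AckCount", "Dir", len(others))]
    inbox += [("AckInv", s, None) for s in others]
    random.Random(seed).shuffle(inbox)       # the network may reorder these

    needed, received, seen = None, 0, []
    for kind, sender, count in inbox:
        seen.append((kind, sender))
        if kind == "AckCount":
            needed = count
        else:
            received += 1
        if needed is not None and received == needed:
            # 5. All K acknowledgements are in: the requestor modifies the block.
            return seen, {requestor}
    raise RuntimeError("GetM did not complete")


seen, owner = get_m({"C1", "C2", "C3"}, "C0", seed=1)
print(seen)
print(owner)
```

Because the network may deliver AckCount and the AckInv messages in any order, the requestor counts acknowledgements and writes only when the count matches; the directory then records it as the sole owner.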

Page 23: Directory state

● Coarse directories
● Limited pointer directory
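Coarse and limited-pointer directories both shrink the per-block sharer state compared with a full bit vector (one presence bit per core), which is assumed here as the baseline. A sketch of the storage arithmetic for hypothetical parameters (64 cores, groups of 4, 4 pointers); what a limited-pointer directory does when more cores share a block than it has pointers is a design choice that is not modeled:

```python
import math


def full_map_bits(num_cores):
    """Full bit vector: one presence bit per core."""
    return num_cores


def coarse_bits(num_cores, group_size):
    """Coarse directory: one bit per group of `group_size` cores."""
    return math.ceil(num_cores / group_size)


def limited_pointer_bits(num_cores, num_pointers):
    """Limited pointers: `num_pointers` core IDs of ceil(log2 N) bits each."""
    return num_pointers * math.ceil(math.log2(num_cores))


def coarse_invalidation_targets(sharers, group_size, num_cores):
    """Cores a coarse directory must invalidate: every core in each group
    whose bit is set, even cores that never cached the block."""
    groups = {s // group_size for s in sharers}
    return sorted(c for c in range(num_cores) if c // group_size in groups)


N = 64
print(full_map_bits(N), coarse_bits(N, 4), limited_pointer_bits(N, 4))
# 64 16 24
print(coarse_invalidation_targets({1, 9}, 4, N))
# [0, 1, 2, 3, 8, 9, 10, 11]
```

The coarse directory saves bits at the cost of precision: a set bit stands for a whole group, so cores 0, 2, 3, 8, 10 and 11 receive invalidations for a block they never cached.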

Page 24: Distributed directory

Page 25: Snooping vs. directory coherence

Snooping solution (snoopy bus):
● Send all requests for data to all processors (broadcast)
● Scaling limited by cache-miss and write traffic saturating the bus

Directory-based schemes:
● Send point-to-point requests to processors (unicast)
● Keep track of what is being shared in a directory
● Distributed memory => distributed directory (reducing bottlenecks)

Page 26: Hybrid designs

Some protocols combine aspects of:
● Snooping and directory protocols
● Invalidate and update protocols

to obtain the advantages of both approaches.

Page 27: Consistency model: definition

(also called memory consistency model, or memory model)

● A specification of the allowed behavior of multithreaded programs executing with shared memory
● Multiple correct behaviors are usually allowed

One fundamental cause: out-of-order execution

Page 28: Cache (in)consistency

Should r2 always be set to NEW? NO!
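The program for this question is in a figure that is not reproduced here. Assuming the classic message-passing example (core C1 stores data = NEW, then flag = SET; core C2 waits until it reads flag = SET and then loads data into r2), a sketch that enumerates all interleavings shows why the answer is no once the two stores may be reordered. All names and values are assumptions:

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute ("st", addr, value) / ("ld", addr, register) ops on one memory."""
    memory = {"data": "OLD", "flag": "UNSET"}
    regs = {}
    for op, addr, arg in trace:
        if op == "st":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs


C1 = [("st", "data", "NEW"), ("st", "flag", "SET")]   # S1, S2
C2 = [("ld", "flag", "r1"), ("ld", "data", "r2")]     # L1 (spin exit), L2


def r2_outcomes(allow_store_store_reordering):
    """Values r2 can take in executions where C2 saw flag == SET."""
    c1_orders = [C1]
    if allow_store_store_reordering:
        c1_orders.append(C1[::-1])                    # S2 visible before S1
    outcomes = set()
    for c1 in c1_orders:
        for trace in interleavings(c1, C2):
            regs = run(trace)
            if regs["r1"] == "SET":                   # C2's spin loop exited
                outcomes.add(regs["r2"])
    return outcomes


print(r2_outcomes(False))
print(r2_outcomes(True))
```

Without reordering, r2 is always NEW; letting S2 become visible before S1 adds the OLD outcome.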

Page 29: The core might reorder memory accesses

Sequential execution model (von Neumann):
● Usually, operations to the same address execute in the original program order.

Possible reorderings (to different addresses):
● Store-Store: non-FIFO write buffer
● Load-Load
● Load-Store and Store-Load: local bypass

Multiple executions allowed → non-determinism

Figure: write buffer (labels: S1, S2, S7; read R1)
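Store-Load reordering through local bypass can be shown with a FIFO write buffer per core. A minimal sketch on an assumed Dekker-style test (C0: x = 1, r0 = y; C1: y = 1, r1 = x); the class and method names are hypothetical:

```python
from collections import deque


class Core:
    """A core with a FIFO write buffer and local bypass (hypothetical model)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.write_buffer = deque()   # pending (addr, value) stores, oldest first

    def store(self, addr, value):
        self.write_buffer.append((addr, value))   # retires without waiting

    def load(self, addr):
        # Local bypass: the newest buffered store to addr wins.
        for a, v in reversed(self.write_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain_one(self):
        addr, value = self.write_buffer.popleft()  # FIFO keeps Store-Store order
        self.memory[addr] = value


memory = {"x": 0, "y": 0}
c0, c1 = Core("C0", memory), Core("C1", memory)

c0.store("x", 1)    # C0: x = 1, still in C0's buffer
c1.store("y", 1)    # C1: y = 1, still in C1's buffer
r0 = c0.load("y")   # C0 reads y from memory: C1's store is not visible yet
r1 = c1.load("x")   # C1 reads x from memory: C0's store is not visible yet
c0.drain_one()
c1.drain_one()
print(r0, r1, memory)
# 0 0 {'x': 1, 'y': 1}
```

Each load misses the other core's buffered store, so r0 = r1 = 0, an outcome that no interleaving of the two programs in program order can produce. A non-FIFO buffer would additionally allow Store-Store reordering.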

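Page 30: Sequential consistency: basic idea

● Lamport (1979) calls a processor sequential if “the result of an execution is the same as if the operations had been executed in the order specified by the program.” A multiprocessor is sequentially consistent if “the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.”
● Memory order must respect program order
● Every load gets its value from the last store to the same address before it (in global memory order)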
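The definition can be applied literally: try every total order of all operations, keep those that respect program order, and let each load read the last store to its address. A sketch on an assumed Dekker-style test (C0: x = 1, r0 = y; C1: y = 1, r1 = x); the program encoding is hypothetical:

```python
from itertools import permutations

# Assumed Dekker-style test, initial x = y = 0:
#   C0: x = 1; r0 = y        C1: y = 1; r1 = x
PROGRAMS = {
    "C0": [("st", "x", 1), ("ld", "y", "r0")],
    "C1": [("st", "y", 1), ("ld", "x", "r1")],
}


def respects_program_order(order):
    """Memory order must list each core's operations in program order."""
    for core in PROGRAMS:
        indices = [i for c, i in order if c == core]
        if indices != sorted(indices):
            return False
    return True


def execute(order):
    """Each load gets the value of the last store to the same address
    before it in the global memory order."""
    memory, regs = {"x": 0, "y": 0}, {}
    for core, i in order:
        op, addr, arg = PROGRAMS[core][i]
        if op == "st":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r0"], regs["r1"]


def sc_outcomes():
    """All (r0, r1) results allowed by sequential consistency."""
    ops = [(core, i) for core, prog in PROGRAMS.items() for i in range(len(prog))]
    return {execute(order) for order in permutations(ops)
            if respects_program_order(order)}


print(sorted(sc_outcomes()))
# [(0, 1), (1, 0), (1, 1)]
```

(0, 0) is missing: under SC one of the two stores comes first in memory order, so at least one load sees 1.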

Page 31: Sequential consistency: atomicity

● Instructions that atomically perform a “read–modify–write” (e.g. “test-and-set”) are needed
● Simplistic approach: the core effectively locks the memory system → sacrifices performance
● Aggressive approach: the read and the write of the “test-and-set” only need to appear atomically (back to back) in the total memory order
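A test-and-set spinlock illustrates why the read–modify–write must be atomic. In this Python sketch the `Memory` class and its internal `threading.Lock` are assumptions standing in for the hardware; the lock mimics the simplistic approach of locking the memory system for the duration of the RMW:

```python
import threading
import time


class Memory:
    """Shared memory with an atomic test-and-set (hypothetical model).

    `_bus_lock` plays the role of the simplistic approach: while one core
    performs its read-modify-write, no other core can access memory.
    """

    def __init__(self):
        self._bus_lock = threading.Lock()
        self.words = {"lock": 0, "counter": 0}

    def test_and_set(self, addr):
        """Atomically return the old value of `addr` and set it to 1."""
        with self._bus_lock:
            old = self.words[addr]
            self.words[addr] = 1
            return old


def acquire(mem):
    while mem.test_and_set("lock") == 1:   # spin until the old value was 0
        time.sleep(0)                      # let other Python threads run


def release(mem):
    mem.words["lock"] = 0                  # an ordinary store frees the lock


def worker(mem, iterations):
    for _ in range(iterations):
        acquire(mem)
        mem.words["counter"] += 1          # critical section
        release(mem)


mem = Memory()
threads = [threading.Thread(target=worker, args=(mem, 500)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.words["counter"])
```

If test_and_set read and wrote the lock word as two separate accesses, two threads could both read 0 and enter the critical section together.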

Page 32: Sequential consistency: simple implementation

Page 33: Sequential (in)consistency: solved

Inconsistency cannot occur anymore.

Page 34: Conclusions

Which protocol is the best? It depends on:
● Technology
● Architecture
● Purposes and applications