Spin Locks and Contention

Companion slides for Chapter 7 of The Art of Multiprocessor Programming, by Maurice Herlihy & Nir Shavit

Spin Locks and Contention - PUC-Rionoemi/pcp-13/aula3/tas.pdf · Art of Multiprocessor Programming 31 Test-and-Test-and-Set Locks • Lurking stage – Wait until lock “looks”

Aug 26, 2020


dariahiddleston

Focus so far: Correctness and Progress

• Models
  – Accurate (we never lied to you)
  – But idealized (so we forgot to mention a few things)
• Protocols
  – Elegant
  – Important
  – But naïve

New Focus: Performance

• Models
  – More complicated (not the same as complex!)
  – Still focus on principles (not soon obsolete)
• Protocols
  – Elegant (in their fashion)
  – Important (why else would we pay attention)
  – And realistic (your mileage may vary)

Kinds of Architectures

• SISD (Uniprocessor)
  – Single instruction stream
  – Single data stream
• SIMD (Vector)
  – Single instruction
  – Multiple data
• MIMD (Multiprocessors): our space
  – Multiple instruction
  – Multiple data

MIMD Architectures

[Figure: a shared-bus architecture (with memory) and a distributed architecture]

• Memory Contention
• Communication Contention
• Communication Latency

Today: Revisit Mutual Exclusion

• Think of performance, not just correctness and progress
• Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware
• And get to know a collection of locking algorithms…

What should you do if you can’t get a lock?

• Keep trying (our focus)
  – “Spin” or “busy-wait”
  – Good if delays are short
• Give up the processor
  – Good if delays are long
  – Always good on uniprocessor

Basic Spin-Lock

[Figure: threads pass through a spin lock into the critical section (CS); the lock is reset upon exit]

• The lock introduces a sequential bottleneck
• The lock suffers from contention
• Notice: these are distinct phenomena
  – Sequential bottleneck → no parallelism
  – Contention → ???

Review: Test-and-Set

• Boolean value
• Test-and-set (TAS)
  – Swap true with current value
  – Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”

Review: Test-and-Set

public class AtomicBoolean {          // in package java.util.concurrent.atomic
  boolean value;

  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;            // swap old and new values
    value = newValue;
    return prior;
  }
}

Review: Test-and-Set

AtomicBoolean lock = new AtomicBoolean(false);
…
boolean prior = lock.getAndSet(true);

Swapping in true is called “test-and-set” or TAS
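
The swap described above can be tried directly against the real `java.util.concurrent.atomic.AtomicBoolean`. The following is a minimal sketch; the class name `TestAndSetDemo` and the helper `testAndSet` are illustrative names, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TestAndSetDemo {
    // Hypothetical helper: swaps in true and reports whether the caller "won",
    // i.e. whether the prior value was false.
    static boolean testAndSet(AtomicBoolean flag) {
        return !flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicBoolean lock = new AtomicBoolean(false);
        System.out.println(testAndSet(lock)); // true: prior value was false
        System.out.println(testAndSet(lock)); // false: flag was already true
        lock.set(false);                      // reset just by writing false
        System.out.println(testAndSet(lock)); // true again after the reset
    }
}
```

Only the first caller after a reset sees a prior value of false, which is the property the TAS lock relies on.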

Test-and-Set Locks

• Locking
  – Lock is free: value is false
  – Lock is taken: value is true
• Acquire lock by calling TAS
  – If result is false, you win
  – If result is true, you lose
• Release lock by writing false

Test-and-set Lock

class TASlock {
  AtomicBoolean state = new AtomicBoolean(false);   // lock state is AtomicBoolean

  void lock() {
    while (state.getAndSet(true)) {}                // keep trying until lock acquired
  }

  void unlock() {
    state.set(false);                               // release lock by resetting state to false
  }
}
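
As a usage illustration, here is a sketch of the same lock wrapped so that the release always happens, even if the critical section throws. The names `TASLockSketch`, `isHeld`, and `runLocked` are assumptions added for the example, and the `Thread.onSpinWait()` hint (Java 9+) is an addition that the slides do not use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLockSketch {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) returns the prior value: false means we acquired the lock.
        while (state.getAndSet(true)) {
            Thread.onSpinWait(); // spin-loop hint; an addition, not in the slides
        }
    }

    void unlock() {
        state.set(false);
    }

    // Hypothetical helper for inspection only.
    boolean isHeld() {
        return state.get();
    }

    // Runs a critical section with the lock held and always releases it afterwards.
    void runLocked(Runnable criticalSection) {
        lock();
        try {
            criticalSection.run();
        } finally {
            unlock();
        }
    }

    public static void main(String[] args) {
        TASLockSketch lock = new TASLockSketch();
        lock.runLocked(() -> System.out.println("held inside: " + lock.isHeld()));
        System.out.println("held after: " + lock.isHeld());
    }
}
```

The `try`/`finally` shape is the usual Java idiom for any lock; it is independent of how the lock spins.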

Space Complexity

• TAS spin-lock has small “footprint”
• An n-thread spin-lock uses O(1) space
• As opposed to O(n) for Peterson/Bakery
• How did we overcome the Ω(n) lower bound?
• We used an RMW (read-modify-write) operation…

Performance

• Experiment
  – n threads
  – Increment shared counter 1 million times
• How long should it take?
• How long does it take?
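
One way to run this experiment is sketched below. It assumes the 1 million increments are split across the n threads and uses a plain TAS lock; the class name `CounterExperiment` and the thread counts tried in `main` are illustrative choices, and measured times will vary by machine.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CounterExperiment {
    private final AtomicBoolean state = new AtomicBoolean(false); // TAS lock state
    private long counter = 0;                                     // protected by the lock

    private void lock()   { while (state.getAndSet(true)) { Thread.onSpinWait(); } }
    private void unlock() { state.set(false); }

    // Splits totalIncrements across nThreads and returns the final counter value.
    long run(int nThreads, int totalIncrements) {
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            // The first few threads take one extra increment so the shares sum exactly.
            int share = totalIncrements / nThreads + (t < totalIncrements % nThreads ? 1 : 0);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < share; i++) {
                    lock();
                    try { counter++; } finally { unlock(); }
                }
            });
        }
        for (Thread w : workers) w.start();
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return counter;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8}) {
            CounterExperiment experiment = new CounterExperiment();
            long start = System.nanoTime();
            long result = experiment.run(n, 1_000_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " threads: counter=" + result + ", " + elapsedMs + " ms");
        }
    }
}
```

Because every increment is inside the critical section, the total work is fixed, so the interesting quantity is how the elapsed time changes as n grows.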

Graph

[Figure: time vs. number of threads, showing the ideal curve]

No speedup because of sequential bottleneck

Mystery #1

[Figure: time vs. number of threads, TAS lock compared with ideal]

What is going on?

Test-and-Test-and-Set Locks

• Lurking stage
  – Wait until lock “looks” free
  – Spin while read returns true (lock taken)
• Pouncing stage
  – As soon as lock “looks” available
  – Read returns false (lock free)
  – Call TAS to acquire lock
  – If TAS loses, back to lurking

Test-and-test-and-set Lock

class TTASlock {
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    while (true) {
      while (state.get()) {}            // wait until lock looks free
      if (!state.getAndSet(true))       // then try to acquire it
        return;
    }
  }
}
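
The slide shows only `lock()`. A fuller sketch follows, assuming `unlock()` is the same write of false used by the TAS lock. The `tasCalls` counter and the `demo` harness are instrumentation added for illustration; they are not part of the slides, and the counter itself adds some contention.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

class TTASLockSketch {
    private final AtomicBoolean state = new AtomicBoolean(false);
    // Instrumentation only (an addition): number of getAndSet calls issued.
    final AtomicLong tasCalls = new AtomicLong();

    void lock() {
        while (true) {
            while (state.get()) { Thread.onSpinWait(); } // lurk: spin on plain reads
            tasCalls.incrementAndGet();
            if (!state.getAndSet(true)) return;          // pounce: TAS only when it looks free
        }
    }

    void unlock() {
        state.set(false); // assumed to match the TAS lock's release
    }

    // Returns {final counter value, number of TAS calls issued}.
    static long[] demo(int nThreads, int perThread) {
        TTASLockSketch lock = new TTASLockSketch();
        long[] counter = {0};
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return new long[] {counter[0], lock.tasCalls.get()};
    }

    public static void main(String[] args) {
        long[] r = demo(4, 100_000);
        System.out.println("counter=" + r[0] + ", TAS calls=" + r[1]);
    }
}
```

Every acquisition needs at least one successful TAS, so the TAS count is never below the number of acquisitions; how far above it rises depends on how often a pounce loses.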

Mystery #2

[Figure: time vs. number of threads for TAS lock, TTAS lock, and ideal]

Mystery

• Both
  – TAS and TTAS
  – Do the same thing (in our model)
• Except that
  – TTAS performs much better than TAS
  – Neither approaches ideal

Opinion

• Our memory abstraction is broken
• TAS & TTAS methods
  – Are provably the same (in our model)
  – Except they aren’t (in field tests)
• Need a more detailed model …

Bus-Based Architectures

[Figure: several caches and a memory connected by a shared bus]

• Random access memory (10s of cycles)
• Shared bus
  – Broadcast medium
  – One broadcaster at a time
  – Processors and memory all “snoop”
• Per-processor caches
  – Small
  – Fast: 1 or 2 cycles
  – Address & state information

Jargon Watch

• Cache hit
  – “I found what I wanted in my cache”
  – Good Thing™
• Cache miss
  – “I had to shlep all the way to memory for that data”
  – Bad Thing™

Cave Canem

• This model is still a simplification
  – But not in any essential way
  – Illustrates basic principles
• Will discuss complexities later

Processor Issues Load Request

[Figure: a processor broadcasts “Gimme data” on the bus; the data is in memory]

Memory Responds

[Figure: memory answers “Got your data right here” and the data is copied into the requesting cache]

Processor Issues Load Request

[Figure: another processor broadcasts “Gimme data” on the bus; a cache that already holds the data answers “I got data”]

Other Processor Responds

[Figure: the cache holding the data supplies it over the bus, so both caches now hold a copy]

Modify Cached Data

[Figure: one processor modifies the data in its own cache while copies remain in memory and in another cache]

What’s up with the other copies?

Cache Coherence

• We have lots of copies of data
  – Original copy in memory
  – Cached copies at processors
• Some processor modifies its own copy
  – What do we do with the others?
  – How to avoid confusion?

Write-Back Caches

• Accumulate changes in cache
• Write back when needed
  – Need the cache for something else
  – Another processor wants it
• On first modification
  – Invalidate other entries
  – Requires non-trivial protocol …

Write-Back Caches

• Cache entry has three states
  – Invalid: contains raw seething bits
  – Valid: I can read but I can’t write
  – Dirty: Data has been modified
    • Intercept other load requests
    • Write back to memory before using cache
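
The three states can be explored with a toy model of a single cache line shared by several processors. This is a sketch under simplifying assumptions: one line only, every miss or invalidation costs exactly one bus transaction, and a dirty owner drops to valid when it supplies data. The class and method names are hypothetical.

```java
import java.util.Arrays;

class WriteBackCacheModel {
    enum State { INVALID, VALID, DIRTY }

    final State[] line;       // state of the modeled line in each processor's cache
    int busTransactions = 0;  // toy cost measure: one per miss or invalidation

    WriteBackCacheModel(int processors) {
        line = new State[processors];
        Arrays.fill(line, State.INVALID);
    }

    void read(int p) {
        if (line[p] != State.INVALID) return;  // cache hit: no bus traffic
        busTransactions++;                     // miss: load request goes on the bus
        for (int q = 0; q < line.length; q++) {
            // A dirty owner intercepts the request, supplies the data,
            // and keeps only read permission.
            if (line[q] == State.DIRTY) line[q] = State.VALID;
        }
        line[p] = State.VALID;
    }

    void write(int p) {
        if (line[p] == State.DIRTY) return;    // already has write permission
        busTransactions++;                     // first modification: broadcast invalidate
        Arrays.fill(line, State.INVALID);      // other caches lose read permission
        line[p] = State.DIRTY;                 // this cache acquires write permission
    }

    public static void main(String[] args) {
        WriteBackCacheModel m = new WriteBackCacheModel(2);
        m.read(0);
        m.read(1);
        m.write(0);
        System.out.println(Arrays.toString(m.line) + " bus=" + m.busTransactions);
        m.read(1);
        System.out.println(Arrays.toString(m.line) + " bus=" + m.busTransactions);
    }
}
```

In this model, repeated writes by the owner and repeated reads of a valid copy are free, while each change of ownership costs a bus transaction.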

Invalidate

[Figure: a processor about to modify its cached copy broadcasts an invalidation on the bus (“Mine, all mine!”); the other cache holding the data reacts (“Uh,oh”)]

• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Another Processor Asks for Data

[Figure: another processor requests the data over the bus]

Owner Responds

[Figure: the owning cache supplies the data over the bus (“Here it is!”)]

End of the Day …

[Figure: both caches hold a copy of the data]

Reading OK, no writing

Mutual Exclusion

• What do we want to optimize?
  – Bus bandwidth used by spinning threads
  – Release/Acquire latency
  – Acquire latency for idle lock

Simple TASLock

• TAS invalidates cache lines
• Spinners
  – Miss in cache
  – Go to bus
• Thread wants to release lock
  – Delayed behind spinners

Test-and-test-and-set

• Wait until lock “looks” free
  – Spin on local cache
  – No bus use while lock busy
• Problem: when lock is released
  – Invalidation storm …

Local Spinning while Lock is Busy

[Figure: memory and every cache hold the value “busy”; spinners read their local copies]

On Release

[Figure: the releasing cache and memory show “free”; the spinners’ cached copies are “invalid”]

• Everyone misses, rereads
• Everyone tries TAS

Problems

• Everyone misses
  – Reads satisfied sequentially
• Everyone does TAS
  – Invalidates others’ caches
• Eventually quiesces after lock acquired
  – How long does this take?
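
The difference in bus use between the two locks, and the burst on release, can be tallied in a toy single-line cache model. This is a rough sketch with assumptions stated in the comments: each miss or invalidation is one bus transaction, a TAS always counts as a write, and processors take turns in a fixed order. All names are hypothetical.

```java
import java.util.Arrays;

class SpinTrafficModel {
    // One cache line per processor: 'I' invalid, 'V' valid (read-only), 'D' dirty.
    private final char[] line;
    private int bus = 0; // bus transactions so far

    SpinTrafficModel(int processors) {
        line = new char[processors];
        Arrays.fill(line, 'I');
    }

    private void read(int p) {
        if (line[p] != 'I') return;             // hit in local cache: no bus use
        bus++;                                  // miss: reread over the bus
        for (int q = 0; q < line.length; q++)
            if (line[q] == 'D') line[q] = 'V';  // owner supplies data, keeps a read-only copy
        line[p] = 'V';
    }

    private void write(int p) {                 // a TAS or a release counts as a write
        if (line[p] == 'D') return;
        bus++;                                  // invalidation broadcast
        Arrays.fill(line, 'I');
        line[p] = 'D';
    }

    // Bus transactions while processor 0 holds the lock and the others spin.
    static int busyPhase(boolean ttas, int spinners, int rounds) {
        SpinTrafficModel m = new SpinTrafficModel(spinners + 1);
        m.write(0);                             // holder acquires the lock
        for (int r = 0; r < rounds; r++)
            for (int p = 1; p <= spinners; p++)
                if (ttas) m.read(p); else m.write(p);
        return m.bus;
    }

    // Bus transactions caused by one release under TTAS.
    static int releaseStorm(int spinners) {
        SpinTrafficModel m = new SpinTrafficModel(spinners + 1);
        m.write(0);
        for (int p = 1; p <= spinners; p++) m.read(p);  // everyone spins locally
        int before = m.bus;
        m.write(0);                                     // release invalidates all copies
        for (int p = 1; p <= spinners; p++) m.read(p);  // everyone misses, rereads
        for (int p = 1; p <= spinners; p++) m.write(p); // everyone tries TAS
        return m.bus - before;
    }

    public static void main(String[] args) {
        System.out.println("TAS busy phase:  " + busyPhase(false, 4, 100));
        System.out.println("TTAS busy phase: " + busyPhase(true, 4, 100));
        System.out.println("TTAS release:    " + releaseStorm(4));
    }
}
```

With 4 spinners and 100 rounds, the model counts 401 transactions for TAS and 5 for TTAS during the busy phase, and 9 for a single TTAS release. The absolute numbers mean little; the point is that TAS traffic grows with spinning time while TTAS traffic is concentrated at release.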

Mystery Explained

[Figure: time vs. number of threads for TAS lock, TTAS lock, and ideal]

TTAS is better than TAS but still not as good as ideal

Solution: Introduce Delay

[Figure: timeline of attempts on the spin lock, separated by delays d, r1d, r2d]

• If the lock looks free
• But I fail to get it
• There must be lots of contention
• Better to back off than to collide again

Dynamic Example: Exponential Backoff

[Figure: timeline of attempts on the spin lock, separated by delays d, 2d, 4d]

If I fail to get lock
  – Wait random duration before retry
  – Each subsequent failure doubles expected wait

Exponential Backoff Lock

public class Backoff implements Lock {
  public void lock() {
    int delay = MIN_DELAY;                  // fix minimum delay
    while (true) {
      while (state.get()) {}                // wait until lock looks free
      if (!state.getAndSet(true)) return;   // if we win, return
      sleep(random() % delay);              // back off for random duration
      if (delay < MAX_DELAY)
        delay = 2 * delay;                  // double max delay, within reason
    }
  }
}
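
The slide code leaves `state`, `sleep`, `random`, and the delay bounds undefined. A runnable sketch follows, assuming nanosecond delays, `ThreadLocalRandom` for the random duration, and `LockSupport.parkNanos` for the pause. The constant values and the names `BackoffLockSketch`, `grow`, and `demo` are illustrative choices, not values from the slides.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLockSketch {
    // Assumed tuning values in nanoseconds; the slides do not give numbers.
    static final long MIN_DELAY = 1_000;
    static final long MAX_DELAY = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        long delay = MIN_DELAY;
        while (true) {
            while (state.get()) { Thread.onSpinWait(); }  // wait until lock looks free
            if (!state.getAndSet(true)) return;           // if we win, return
            long pause = ThreadLocalRandom.current().nextLong(delay) + 1;
            LockSupport.parkNanos(pause);                 // back off for a random duration
            delay = grow(delay);
        }
    }

    void unlock() {
        state.set(false);
    }

    // Doubles the bound on the random delay until it reaches MAX_DELAY.
    static long grow(long delay) {
        return delay < MAX_DELAY ? 2 * delay : delay;
    }

    // Hypothetical harness: nThreads each increment a shared counter perThread times.
    static long demo(int nThreads, int perThread) {
        BackoffLockSketch lock = new BackoffLockSketch();
        long[] counter = {0};
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("counter=" + demo(4, 100_000));
    }
}
```

`parkNanos` may return early, which is harmless here because the loop rechecks the lock state before every attempt.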

Spin-Waiting Overhead

[Figure: time vs. number of threads for TTAS lock and backoff lock]

Backoff: Other Issues

• Good
  – Easy to implement
  – Beats TTAS lock
• Bad
  – Must choose parameters carefully
  – Not portable across platforms