Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Mar 27, 2015


Trinity Hunter
Transcript
Page 1: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Spin Locks and Contention

Companion slides for The Art of Multiprocessor Programming

by Maurice Herlihy & Nir Shavit

Page 2: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 2

Focus so far: Correctness and Progress

• Models
  – Accurate (we never lied to you)
  – But idealized (so we forgot to mention a few things)
• Protocols
  – Elegant
  – Important
  – But naïve

Page 3: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 3

New Focus: Performance

• Models
  – More complicated (not the same as complex!)
  – Still focus on principles (not soon obsolete)
• Protocols
  – Elegant (in their fashion)
  – Important (why else would we pay attention)
  – And realistic (your mileage may vary)

Page 5: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 5

Kinds of Architectures

• SISD (Uniprocessor)
  – Single instruction stream
  – Single data stream
• SIMD (Vector)
  – Single instruction
  – Multiple data
• MIMD (Multiprocessors): our space
  – Multiple instruction
  – Multiple data

Page 6: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 6

MIMD Architectures

[Figure: two MIMD organizations, labeled “Shared Bus” (with a common memory) and “Distributed”]

• Memory Contention
• Communication Contention
• Communication Latency

Page 7: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 7

Today: Revisit Mutual Exclusion

• Performance, not just correctness
• Proper use of multiprocessor architectures
• A collection of locking algorithms…

Page 9: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 9

What Should you do if you can’t get a lock?

• Keep trying (our focus)
  – “spin” or “busy-wait”
  – Good if delays are short
• Give up the processor
  – Good if delays are long
  – Always good on uniprocessor

Page 10: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 10

Basic Spin-Lock

[Figure: threads spin at a spin lock; one at a time enters the critical section (CS) and resets the lock upon exit]

• …lock introduces sequential bottleneck
  – Seq Bottleneck → no parallelism
• …lock suffers from contention
  – Contention → ???
• Notice: these are distinct phenomena

Page 16: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 16

Review: Test-and-Set

• Boolean value
• Test-and-set (TAS)
  – Swap true with current value
  – Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”

Page 17: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 17

Review: Test-and-Set

Package: java.util.concurrent.atomic

    public class AtomicBoolean {
      boolean value;

      // Swap old and new values
      public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;
        value = newValue;
        return prior;
      }
    }

Page 20: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 20

Review: Test-and-Set

    AtomicBoolean lock = new AtomicBoolean(false);
    …
    boolean prior = lock.getAndSet(true);

Swapping in true is called “test-and-set” or TAS
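The return value of the swap is what tells a caller whether it was the one to flip the flag. A minimal sketch using the real `java.util.concurrent.atomic.AtomicBoolean`; the class name `TestAndSetDemo` and the helper `tryAcquire` are made up for this illustration and do not appear in the slides:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TestAndSetDemo {
    // Hypothetical helper: returns true if the caller "won",
    // i.e. the value before the swap was false.
    static boolean tryAcquire(AtomicBoolean flag) {
        return !flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicBoolean lock = new AtomicBoolean(false);
        System.out.println(tryAcquire(lock)); // true: prior value was false
        System.out.println(tryAcquire(lock)); // false: prior value was already true
        lock.set(false);                      // reset just by writing false
        System.out.println(tryAcquire(lock)); // true again
    }
}
```

A second TAS on a taken flag changes nothing visible (true is swapped for true), which is why a loser can simply try again.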

Page 22: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 22

Test-and-Set Locks

• Locking
  – Lock is free: value is false
  – Lock is taken: value is true
• Acquire lock by calling TAS
  – If result is false, you win
  – If result is true, you lose
• Release lock by writing false

Page 23: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 23

Test-and-set Lock

    class TASlock {
      // Lock state is AtomicBoolean
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        // Keep trying until lock acquired
        while (state.getAndSet(true)) {}
      }

      void unlock() {
        // Release lock by resetting state to false
        state.set(false);
      }
    }
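To see the lock do its job, here is a self-contained sketch that wraps the slide's TAS lock in a small driver. The driver class `TASLockDemo`, its `run` method, and the thread and iteration counts are assumptions added for illustration:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // Spin until our getAndSet is the one that flips false to true.
        while (state.getAndSet(true)) {}
    }

    void unlock() {
        state.set(false);
    }
}

class TASLockDemo {
    static int counter; // only touched while holding the lock

    // Hypothetical driver: `threads` workers each do `perThread` locked increments.
    static int run(int threads, int perThread) {
        counter = 0;
        TASLock lock = new TASLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        // With mutual exclusion no increment is lost: threads * perThread.
        System.out.println(run(4, 10_000));
    }
}
```

The plain `int` counter is safe here only because every access sits between `lock()` and `unlock()`, whose atomic operations also order the memory accesses around them.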

Page 27: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 27

Space Complexity

• TAS spin-lock has small “footprint”

• n-thread spin-lock uses O(1) space

• As opposed to O(n) Peterson/Bakery

• How did we overcome the Ω(n) lower bound?

• We used a RMW operation…

Page 28: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 28

Performance

• Experiment
  – n threads
  – Increment shared counter 1 million times

• How long should it take?

• How long does it take?
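The experiment can be sketched as a small timing harness. Everything here is an assumption for illustration: the class name `CounterExperiment`, the even split of the million increments across threads, and the use of `ReentrantLock` as a stand-in for whichever lock is being measured. No timings are claimed; they depend on the machine.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class CounterExperiment {
    static long counter; // shared counter, only touched while holding the lock

    // Splits `total` increments evenly over n threads (assumes n divides total)
    // and returns the elapsed wall-clock time in nanoseconds.
    static long run(Lock lock, int n, int total) {
        counter = 0;
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < total / n; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread t : workers) t.start();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long nanos = run(new ReentrantLock(), n, 1_000_000);
            System.out.println(n + " threads: " + nanos / 1_000_000
                    + " ms, counter = " + counter);
        }
    }
}
```

Because the total work is fixed and every increment is inside the critical section, adding threads cannot make the run faster; the interesting question is how much slower it gets.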

Page 29: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 29

Graph

[Figure: time vs. number of threads, showing the ideal curve]

No speedup because of sequential bottleneck

Page 30: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 30

Mystery #1

[Figure: time vs. number of threads for the TAS lock and the ideal curve]

What is going on?

Page 31: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 31

Test-and-Test-and-Set Locks

• Lurking stage
  – Wait until lock “looks” free
  – Spin while read returns true (lock taken)
• Pouncing state
  – As soon as lock “looks” available
  – Read returns false (lock free)
  – Call TAS to acquire lock
  – If TAS loses, back to lurking

Page 32: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 32

Test-and-test-and-set Lock

    class TTASlock {
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        while (true) {
          // Wait until lock looks free
          while (state.get()) {}
          // Then try to acquire it
          if (!state.getAndSet(true))
            return;
        }
      }
    }
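A runnable sketch of the lurk-then-pounce loop follows. The slide shows only `lock()`; the `unlock()` method here is assumed to be the same as in the TAS lock, and the driver `TTASLockDemo` is invented for illustration:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (state.get()) {}            // lurk: plain reads while lock is taken
            if (!state.getAndSet(true)) {     // pounce: one TAS when it looks free
                return;
            }
            // TAS lost the race: go back to lurking
        }
    }

    // Assumed to match the TAS lock's unlock; not shown on the slide.
    void unlock() {
        state.set(false);
    }
}

class TTASLockDemo {
    static int counter; // only touched while holding the lock

    // Hypothetical driver: `threads` workers each do `perThread` locked increments.
    static int run(int threads, int perThread) {
        counter = 0;
        TTASLock lock = new TTASLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // threads * perThread
    }
}
```

Functionally this behaves like the TAS lock; the only change is that waiting threads read before they swap.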

Page 35: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 35

Mystery #2

[Figure: time vs. number of threads for the TAS lock, the TTAS lock, and the ideal curve]

Page 36: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 36

Mystery

• Both
  – TAS and TTAS
  – Do the same thing (in our model)
• Except that
  – TTAS performs much better than TAS
  – Neither approaches ideal

Page 37: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 37

Opinion

• Our memory abstraction is broken

• TAS & TTAS methods
  – Are provably the same (in our model)
  – Except they aren’t (in field tests)

• Need a more detailed model …

Page 38: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 38

Bus-Based Architectures

[Figure: three processors, each with its own cache, connected by a bus to a shared memory]

• Random access memory (10s of cycles)
• Shared Bus
  – Broadcast medium
  – One broadcaster at a time
  – Processors and memory all “snoop”
• Per-Processor Caches
  – Small
  – Fast: 1 or 2 cycles
  – Address & state information

Page 42: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 42

Jargon Watch

• Cache hit
  – “I found what I wanted in my cache”
  – Good Thing™
• Cache miss
  – “I had to shlep all the way to memory for that data”
  – Bad Thing™

Page 44: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 44

Cave Canem

• This model is still a simplification
  – But not in any essential way
  – Illustrates basic principles
• Will discuss complexities later

Page 45: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 45

Processor Issues Load Request

[Figure: a processor asks for data over the bus (“Gimme data”)]

Memory Responds

[Figure: memory replies over the bus (“Got your data right here”) and the data lands in the requesting processor's cache]

Page 48: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 48

Processor Issues Load Request

[Figure: a second processor asks for the same data over the bus (“Gimme data”); a cache that already holds it answers “I got data”]

Other Processor Responds

[Figure: the data travels over the bus from the cache that holds it; both caches now have a copy]

Page 53: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 53

Modify Cached Data

[Figure: one processor modifies the data in its cache while memory and another cache still hold copies]

What’s up with the other copies?

Page 57: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 57

Cache Coherence

• We have lots of copies of data
  – Original copy in memory
  – Cached copies at processors
• Some processor modifies its own copy
  – What do we do with the others?
  – How to avoid confusion?

Page 58: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 58

Write-Back Caches

• Accumulate changes in cache

• Write back when needed
  – Need the cache for something else
  – Another processor wants it
• On first modification
  – Invalidate other entries
  – Requires non-trivial protocol …

Page 59: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 59

Write-Back Caches

• Cache entry has three states
  – Invalid: contains raw seething bits
  – Valid: I can read but I can’t write
  – Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache
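The three states can be played with in a toy model of a single cache line shared by several processors. This is a deliberately simplified sketch, not a real coherence protocol: the class `CacheLineModel`, the rule that every miss or invalidation costs exactly one bus transaction, and the omission of evictions and memory write-back are all assumptions made for illustration.

```java
import java.util.Arrays;

class CacheLineModel {
    enum State { INVALID, VALID, DIRTY }

    final State[] caches;      // state of the one modeled line in each processor's cache
    int busTransactions = 0;   // counts misses and invalidation broadcasts

    CacheLineModel(int processors) {
        caches = new State[processors];
        Arrays.fill(caches, State.INVALID);
    }

    // Load by processor p: a hit unless p's copy is invalid.
    void read(int p) {
        if (caches[p] != State.INVALID) return;   // cache hit, no bus use
        busTransactions++;                        // miss: request goes on the bus
        for (int q = 0; q < caches.length; q++) {
            // A dirty owner intercepts the request and keeps a read-only copy.
            if (caches[q] == State.DIRTY) caches[q] = State.VALID;
        }
        caches[p] = State.VALID;
    }

    // Store by processor p: requires the dirty (writable) state.
    void write(int p) {
        if (caches[p] == State.DIRTY) return;     // already writable, no bus use
        busTransactions++;                        // broadcast an invalidation
        Arrays.fill(caches, State.INVALID);       // other caches lose read permission
        caches[p] = State.DIRTY;                  // this cache acquires write permission
    }

    public static void main(String[] args) {
        CacheLineModel m = new CacheLineModel(3);
        m.read(0);   // miss
        m.read(1);   // miss
        m.read(0);   // hit
        m.write(0);  // invalidation: processor 1 loses its copy
        m.read(1);   // miss: the dirty owner responds
        // prints [VALID, VALID, INVALID] bus=4
        System.out.println(Arrays.toString(m.caches) + " bus=" + m.busTransactions);
    }
}
```

In this model a repeated read of a valid line is free, while every write by a different processor costs a bus transaction, which is the distinction the lock comparison later relies on.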

Page 60: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 60

Invalidate

[Figure: a processor about to modify its cached copy broadcasts an invalidation on the bus (“Mine, all mine!”); the other cache holding the data reacts (“Uh, oh”)]

• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Page 66: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 66

Another Processor Asks for Data

[Figure: another processor requests the data over the bus]

Owner Responds

[Figure: the cache that owns the modified data answers over the bus (“Here it is!”)]

Page 68: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 68

End of the Day …

[Figure: both caches hold a copy of the data]

Reading OK, no writing

Page 69: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 69

Mutual Exclusion

• What do we want to optimize?
  – Bus bandwidth used by spinning threads
  – Release/Acquire latency
  – Acquire latency for idle lock

Page 70: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 70

Simple TASLock

• TAS invalidates cache lines

• Spinners
  – Miss in cache
  – Go to bus
• Thread wants to release lock
  – Delayed behind spinners

Page 71: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 71

Test-and-test-and-set

• Wait until lock “looks” free
  – Spin on local cache
  – No bus use while lock busy
• Problem: when lock is released
  – Invalidation storm …

Page 72: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 72

Local Spinning while Lock is Busy

[Figure: memory and every cache hold the lock value “busy”; spinners read their local copies]

Page 73: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 73

On Release

[Figure: the releasing processor writes “free”; the spinners' cached copies become invalid]

• Everyone misses, rereads
• Everyone tries TAS

Page 76: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 76

Problems

• Everyone misses
  – Reads satisfied sequentially
• Everyone does TAS
  – Invalidates others’ caches
• Eventually quiesces after lock acquired
  – How long does this take?

Page 77: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 77

Measuring Quiescence Time

• Acquire lock
• Pause without using bus
• Use bus heavily

[Figure: timelines for processors P1, P2, …, Pn]

If pause > quiescence time, critical section duration is independent of the number of threads

If pause < quiescence time, critical section duration gets slower with more threads

Page 78: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 78

Quiescence Time

Increases linearly with the number of processors for bus architecture

[Figure: quiescence time vs. number of threads]

Page 79: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 79

Mystery Explained

[Figure: time vs. number of threads for the TAS lock, the TTAS lock, and the ideal curve]

TTAS lock: better than TAS but still not as good as ideal

Page 80: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 80

Solution: Introduce Delay

[Figure: timeline of attempts on the spin lock, separated by delays d, r1d, r2d]

• If the lock looks free
• But I fail to get it
• There must be contention
• Better to back off than to collide again

Page 81: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 81

Dynamic Example: Exponential Backoff

[Figure: timeline of attempts on the spin lock, separated by delays d, 2d, 4d]

If I fail to get lock
– wait random duration before retry
– Each subsequent failure doubles expected wait
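The doubling rule can be written out explicitly. The symbols $d_k$ (delay limit after $k$ failed attempts) and $w_k$ (the wait actually drawn) are introduced here for illustration, and the wait is assumed to be drawn uniformly from the integers below the current limit:

```latex
\[
\begin{aligned}
d_0 &= \text{MIN\_DELAY}, \qquad d_{k+1} = 2\,d_k \quad \text{while } d_k < \text{MAX\_DELAY},\\
w_k &\sim \mathrm{Uniform}\{0, 1, \dots, d_k - 1\}
\;\Longrightarrow\;
\mathbb{E}[w_k] = \frac{d_k - 1}{2} \approx \frac{d_k}{2} = 2^{\,k-1} d_0 .
\end{aligned}
\]
```

So until the cap is reached, each additional failure roughly doubles the expected wait, while the randomness keeps threads that collided once from retrying in lockstep.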

Page 82: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 82

Exponential Backoff Lock

    public class Backoff implements Lock {
      AtomicBoolean state = new AtomicBoolean(false);

      public void lock() {
        int delay = MIN_DELAY;               // Fix minimum delay
        while (true) {
          while (state.get()) {}             // Wait until lock looks free
          if (!state.getAndSet(true))        // If we win, return
            return;
          sleep(random() % delay);           // Back off for random duration
          if (delay < MAX_DELAY)             // Double max delay, within reason
            delay = 2 * delay;
        }
      }
    }
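A self-contained sketch of the backoff loop follows. The delay bounds (in nanoseconds), the use of `LockSupport.parkNanos` and `ThreadLocalRandom` in place of the slide's `sleep` and `random`, the `unlock()` method, and the driver `BackoffLockDemo` are all assumptions chosen for illustration:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffLock {
    // Illustrative bounds in nanoseconds; real values need tuning per platform.
    static final long MIN_DELAY = 1_000;
    static final long MAX_DELAY = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        long delay = MIN_DELAY;
        while (true) {
            while (state.get()) {}                    // wait until lock looks free
            if (!state.getAndSet(true)) return;       // we won
            // Lost the race: pause for a random duration in [0, delay).
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(delay));
            if (delay < MAX_DELAY) delay = 2 * delay; // double the limit, within reason
        }
    }

    // Assumed to match the TAS lock's unlock; not shown on the slide.
    void unlock() {
        state.set(false);
    }
}

class BackoffLockDemo {
    static int counter; // only touched while holding the lock

    // Hypothetical driver: `threads` workers each do `perThread` locked increments.
    static int run(int threads, int perThread) {
        counter = 0;
        BackoffLock lock = new BackoffLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // threads * perThread
    }
}
```

`parkNanos` may return early, which is harmless here because the loop rechecks the lock state before every attempt.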

Page 88: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 88

Spin-Waiting Overhead

[Figure: time vs. number of threads for the TTAS lock and the backoff lock]

Page 89: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 89

Backoff: Other Issues

• Good
  – Easy to implement
  – Beats TTAS lock
• Bad
  – Must choose parameters carefully
  – Not portable across platforms

Page 90: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 90

Idea

• Avoid useless invalidations
  – By keeping a queue of threads
• Each thread
  – Notifies next in line
  – Without bothering the others

Page 91: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 91

Anderson Queue Lock

[Figure: an array of flags, initially T F F F F F F F, and a counter next; the lock starts idle]

• An acquiring thread calls getAndIncrement on next to take a slot
• Its flag is T, so the lock is acquired (“Mine!”)
• A second acquiring thread calls getAndIncrement, takes the following slot, and spins on its own flag
• On release, the next flag is set to T (flags shown as T T F F F F F F) and the waiting thread acquires the lock (“Yow!”)

Page 101: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 101

Anderson Queue Lock

    class ALock implements Lock {
      boolean[] flags = {true, false, …, false};    // One flag per thread
      AtomicInteger next = new AtomicInteger(0);    // Next flag to use
      ThreadLocal<Integer> mySlot;                  // Thread-local variable

Page 105: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 105

Anderson Queue Lock

public void lock() {
  // n is the number of slots (the length of flags)
  int slot = next.getAndIncrement() % n;   // take next slot
  mySlot.set(slot);
  while (!flags[slot]) {}                  // spin until told to go
  flags[slot] = false;                     // prepare slot for re-use
}

public void unlock() {
  flags[(mySlot.get() + 1) % n] = true;    // tell next thread to go
}
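The slide code above leaves out a few details that Java needs in order to run. What follows is a minimal sketch, not the book's listing. It assumes an `AtomicIntegerArray` stands in for the boolean flags (so that spinning threads see updates), that `capacity` bounds the number of concurrent threads, and that `Thread.yield()` in the spin loop is acceptable for a demo. The names `ALockSketch`, `capacity`, and `demo` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of an Anderson-style array lock (assumes at most `capacity` threads contend).
class ALockSketch {
    private final int capacity;
    private final AtomicIntegerArray flags;                  // 1 = go, 0 = wait
    private final AtomicInteger next = new AtomicInteger(0); // next slot to hand out
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    ALockSketch(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity);
        flags.set(0, 1);                                     // slot 0 starts as "go"
    }

    void lock() {
        // floorMod keeps the index non-negative even if the counter overflows
        int slot = Math.floorMod(next.getAndIncrement(), capacity);
        mySlot.set(slot);
        while (flags.get(slot) == 0) {
            Thread.yield();                                  // demo-friendly spin
        }
        flags.set(slot, 0);                                  // prepare slot for re-use
    }

    void unlock() {
        flags.set((mySlot.get() + 1) % capacity, 1);         // tell next thread to go
    }

    // Hypothetical demo: `threads` workers each increment a shared counter under the lock.
    static int demo(int threads, int iterations) {
        ALockSketch lock = new ALockSketch(threads);
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }
}
```

If the lock provides mutual exclusion, `ALockSketch.demo(4, 500)` should return exactly the number of increments performed.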

Page 110: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 110

Local Spinning

[Diagram: the flags array (T F F F F F F F) and the next counter; one thread has released the lock and another has acquired it.]

• Spin on my bit
• Unfortunately many bits share cache line

Page 111: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 111

False Sharing

[Diagram: the flags array (T F F F F F F F) spans two cache lines, Line 1 and Line 2; each thread spins on its own bit.]

• Spinning thread gets cache invalidation on account of stores by threads it is not waiting for
• Result: contention

Page 112: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 112

The Solution: Padding

[Diagram: flags = T / / / F / / /, where / marks padding, so that each flag sits on its own cache line (Line 1, Line 2).]

• Spin on my line
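Padding can be sketched by spacing the flags out inside a larger array. The block below is an illustration under stated assumptions, not the book's code: it assumes 64-byte cache lines and 4-byte array elements, hence a stride of 16. The JVM does not promise that an array starts on a cache-line boundary, so this reduces false sharing without guaranteeing its absence. `PaddedALockSketch`, `STRIDE`, and `index` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: Anderson-style lock whose flags are spread one per (assumed) cache line.
class PaddedALockSketch {
    static final int STRIDE = 16;   // assumption: 64-byte line / 4-byte int
    private final int capacity;
    private final AtomicIntegerArray flags;
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    PaddedALockSketch(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity * STRIDE);
        flags.set(index(0), 1);     // slot 0 starts as "go"
    }

    // Maps a logical slot to its padded position in the array.
    static int index(int slot) {
        return slot * STRIDE;
    }

    void lock() {
        int slot = Math.floorMod(next.getAndIncrement(), capacity);
        mySlot.set(slot);
        while (flags.get(index(slot)) == 0) {
            Thread.yield();         // spin on my line
        }
        flags.set(index(slot), 0);
    }

    void unlock() {
        flags.set(index((mySlot.get() + 1) % capacity), 1);
    }
}
```

The only change from the unpadded version is the `index` mapping; the price is the extra space that the following slides call out.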

Page 113: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 113

Performance

• Shorter handover than backoff
• Curve is practically flat
• Scalable performance

[Plot: two curves, labeled "queue" and "TTAS".]

Page 114: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 114

Anderson Queue Lock

• Good
– First truly scalable lock
– Simple, easy to implement
– Back to FIFO order (like Bakery)

Page 115: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 115

Anderson Queue Lock

• Bad
– Space hog…
– One bit per thread becomes one cache line per thread (with padding)
• What if unknown number of threads?
• What if small number of actual contenders?

Page 116: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 116

CLH Lock

• FIFO order

• Small, constant-size overhead per thread

Page 117: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 117

[Animation, slides 117 to 133, summarized from the frame labels:]

• Initially: tail (the queue tail) refers to a node whose flag is false, which means the lock is free. The thread is idle.
• Purple wants the lock: it creates a node whose flag is true and swaps it into tail.
• Purple has the lock.
• Red wants the lock: it creates a node whose flag is true, swaps it into tail, and spins on the flag of Purple's node. The nodes form an implicit linked list. Actually, Red spins on a cached copy.
• Purple releases: it sets its flag to false ("Bingo!"), and Red has acquired the lock.

Page 134: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 134

Space Usage

• Let
– L = number of locks
– N = number of threads
• ALock
– O(LN)
• CLH lock
– O(L+N)

Page 135: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 135

CLH Queue Lock

class Qnode {
  AtomicBoolean locked =
    new AtomicBoolean(true);   // not released yet
}

Page 137: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 137

CLH Queue Lock

class CLHLock implements Lock {
  AtomicReference<Qnode> tail;            // queue tail (initially a node whose locked is false)
  ThreadLocal<Qnode> myNode               // thread-local Qnode
    = ThreadLocal.withInitial(Qnode::new);
  public void lock() {
    Qnode pred
      = tail.getAndSet(myNode.get());     // swap in my node
    while (pred.locked.get()) {}          // spin until predecessor releases lock
  }}

Page 142: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 142

CLH Queue Lock

class CLHLock implements Lock {
  public void unlock() {
    myNode.get().locked.set(false);   // notify successor
    myNode.set(pred);                 // recycle predecessor's node (pred as saved by lock())
  }
}

(We don't actually reuse myNode. Code in book shows how it's done.)
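As the note above says, the slide code does not carry `pred` from `lock()` to `unlock()`. One way to close that gap is a second thread-local that remembers the predecessor. The following is a minimal sketch under that assumption, not the book's listing. `CLHLockSketch`, `myPred`, and `demo` are hypothetical names, nodes start unlocked and are marked locked inside `lock()`, and `Thread.yield()` is used only to keep the demo responsive.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a CLH queue lock with node recycling.
class CLHLockSketch {
    static final class Qnode {
        final AtomicBoolean locked = new AtomicBoolean(false);
    }

    // The initial tail node is unlocked, so the lock starts free.
    private final AtomicReference<Qnode> tail = new AtomicReference<>(new Qnode());
    private final ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);
    private final ThreadLocal<Qnode> myPred = new ThreadLocal<>();

    void lock() {
        Qnode node = myNode.get();
        node.locked.set(true);               // not released yet
        Qnode pred = tail.getAndSet(node);   // swap in my node
        myPred.set(pred);                    // remember predecessor for unlock()
        while (pred.locked.get()) {
            Thread.yield();                  // spin until predecessor releases
        }
    }

    void unlock() {
        myNode.get().locked.set(false);      // notify successor
        myNode.set(myPred.get());            // recycle predecessor's node
    }

    // Hypothetical demo: workers increment a shared counter under the lock.
    static int demo(int threads, int iterations) {
        CLHLockSketch lock = new CLHLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }
}
```

Recycling the predecessor's node is safe because, once a thread has acquired the lock, no other thread still reads that node.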

Page 146: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 146

CLH Lock

• Good
– Lock release affects successor only
– Small, constant-sized space
• Bad
– Doesn't work for uncached NUMA architectures

Page 147: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 147

NUMA Architectures

• Acronym:
– Non-Uniform Memory Architecture
• Illusion:
– Flat shared memory
• Truth:
– No caches (sometimes)
– Some memory regions faster than others

Page 148: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 148

NUMA Machines

Spinning on local memory is fast

Page 149: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 149

NUMA Machines

Spinning on remote memory is slow

Page 150: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 150

CLH Lock

• Each thread spins on predecessor’s memory

• Could be far away …

Page 151: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 151

MCS Lock

• FIFO order

• Spin on local memory only

• Small, constant-size overhead

Page 152: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 152

[Animation, slides 152 to 162, summarized from the frame labels:]

• Initially: the thread is idle.
• Acquiring: the thread allocates a Qnode and swaps it into tail.
• Acquired: with no predecessor, the thread holds the lock.
• Acquiring (second thread): it allocates a Qnode, swaps it into tail, sets its own flag to true, links itself behind the holder's node, and spins on its own flag (callout: "Yes!").

Page 163: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 163

MCS Queue Lock

class Qnode {

volatile boolean locked = false;   // volatile so spinning threads see updates
volatile Qnode next = null;

}

Page 164: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 164

MCS Queue Lock

class MCSLock implements Lock {
  AtomicReference<Qnode> tail;
  public void lock() {
    Qnode qnode = new Qnode();            // make a Qnode (unlock() needs it later)
    Qnode pred = tail.getAndSet(qnode);   // add my node to the tail of queue
    if (pred != null) {                   // fix if queue was non-empty
      qnode.locked = true;
      pred.next = qnode;
      while (qnode.locked) {}             // wait until unlocked
    }}}

Page 169: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 169

MCS Queue Unlock

class MCSLock implements Lock {
  AtomicReference<Qnode> tail;
  public void unlock() {
    if (qnode.next == null) {                // missing successor?
      if (tail.compareAndSet(qnode, null))   // if really no successor, return
        return;
      while (qnode.next == null) {}          // otherwise wait for successor to catch up
    }
    qnode.next.locked = false;               // pass lock to successor
  }}
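The lock and unlock slides both refer to `qnode`, but the slides never show how `unlock()` finds the node that `lock()` created. The sketch below assumes a thread-local holds it. It is an illustration rather than the book's listing. `MCSLockSketch`, `myNode`, and `demo` are hypothetical names, and `Thread.yield()` appears only to keep the demo responsive on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of an MCS queue lock: each thread spins on a flag in its own node.
class MCSLockSketch {
    static final class Qnode {
        volatile boolean locked = false;
        volatile Qnode next = null;
    }

    private final AtomicReference<Qnode> tail = new AtomicReference<>(null);
    private final ThreadLocal<Qnode> myNode = new ThreadLocal<>();

    void lock() {
        Qnode qnode = new Qnode();             // make a Qnode
        myNode.set(qnode);                     // remember it for unlock()
        Qnode pred = tail.getAndSet(qnode);    // add my node to the tail of queue
        if (pred != null) {                    // queue was non-empty
            qnode.locked = true;
            pred.next = qnode;
            while (qnode.locked) {
                Thread.yield();                // spin on my own node
            }
        }
    }

    void unlock() {
        Qnode qnode = myNode.get();
        if (qnode.next == null) {              // missing successor?
            if (tail.compareAndSet(qnode, null)) {
                return;                        // really no successor
            }
            while (qnode.next == null) {
                Thread.yield();                // wait for successor to link itself
            }
        }
        qnode.next.locked = false;             // pass lock to successor
    }

    // Hypothetical demo: workers increment a shared counter under the lock.
    static int demo(int threads, int iterations) {
        MCSLockSketch lock = new MCSLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }
}
```

Allocating a fresh node per acquisition follows the slide code; a version that reuses nodes would also have to reset `next` before the node is enqueued again.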

Page 174: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 174

Purple Release

[Animation, slides 174 to 180, summarized from the frame labels:]

• Purple is releasing while another thread has just swapped its node into tail.
• By looking at the queue, I see another thread is active.
• I have to wait for that thread to finish.
• The other thread prepares to spin (its flag becomes true) and then spins.
• Purple sets that flag to false, and the other thread has acquired the lock.

Page 181: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 181

Abortable Locks

• What if you want to give up waiting for a lock?

• For example
– Timeout
– Database transaction aborted by user

Page 182: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 182

Back-off Lock

• Aborting is trivial
– Just return from lock() call
• Extra benefit:
– No cleaning up
– Wait-free
– Immediate return
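To make "just return from the lock() call" concrete, here is a minimal sketch of a test-and-test-and-set lock with randomized exponential backoff and a timeout. It is an illustration, not code from the slides. `AbortableBackoffLockSketch`, `tryLock`, and the delay bounds are hypothetical, and the lock is not reentrant.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch: back-off lock whose acquire attempt can simply give up.
class AbortableBackoffLockSketch {
    private static final long MIN_DELAY_NANOS = 1_000;       // hypothetical tuning values
    private static final long MAX_DELAY_NANOS = 1_000_000;
    private final AtomicBoolean state = new AtomicBoolean(false);

    // Returns true if the lock was acquired, false if timeoutNanos elapsed first.
    boolean tryLock(long timeoutNanos) {
        long deadline = System.nanoTime() + timeoutNanos;
        long limit = MIN_DELAY_NANOS;
        while (true) {
            // test-and-test-and-set: read first, then try to grab
            if (!state.get() && !state.getAndSet(true)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                                 // abort: nothing to clean up
            }
            long delay = ThreadLocalRandom.current().nextLong(limit) + 1;
            limit = Math.min(MAX_DELAY_NANOS, 2 * limit);     // exponential backoff
            LockSupport.parkNanos(delay);
        }
    }

    void unlock() {
        state.set(false);
    }
}
```

A waiting thread leaves no trace in shared state, so abandoning the attempt needs no coordination with other threads.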

Page 183: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 183

Queue Locks

• Can't just quit
– Thread in line behind will starve

• Need a graceful way out

Page 184: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 184

Queue Locks

[Animation, slides 184 to 192, summarized from the frame labels:]

• Normal hand-off: three threads spin on flags that are true. Each time the holder sets its flag to false, the next thread in line becomes the holder, until the last thread holds the lock.
• One waiting thread simply quits: the flag it leaves behind stays true, so the thread behind it keeps spinning even after the lock is released ("pwned").

Page 193: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 193

Abortable CLH Lock

• When a thread gives up
– Removing node in a wait-free way is hard
• Idea:
– Let successor deal with it.

Page 194: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 194

[Animation, slides 194 to 206, summarized from the frame labels:]

• Initially: the thread is idle. Each node holds a pointer to its predecessor (or null). A distinguished available node (A) means the lock is free.
• Acquiring: the thread creates a node with a null predecessor pointer, which means lock not released or aborted, and swaps it into tail.
• Acquired: a reference to AVAILABLE means the lock is free, so the thread now holds the lock.
• Normal case: waiting threads spin behind the holder. Null means the lock is not free and the request is not aborted.
• One thread aborts: a spinning thread times out.
• Successor notices: non-null means the predecessor aborted.
• Recycle predecessor's node, then spin on the earlier node.
• When the earlier node is released (its pointer refers to A), the lock is now mine.

Page 207: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 207

Time-out Lock

public class TOLock implements Lock {
  static Qnode AVAILABLE
    = new Qnode();                  // AVAILABLE node signifies free lock
  AtomicReference<Qnode> tail;      // tail of the queue
  ThreadLocal<Qnode> myNode;        // remember my node …

Page 211: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 211

Time-out Lock

public boolean lock(long timeout) {
  Qnode qnode = new Qnode();              // create & initialize node
  myNode.set(qnode);
  qnode.prev = null;
  Qnode myPred = tail.getAndSet(qnode);   // swap with tail
  if (myPred == null
      || myPred.prev == AVAILABLE) {      // if predecessor absent or released, we are done
    return true;
  }
  ...

Page 215: Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Art of Multiprocessor Programming 215

Time-out Lock…

  long start = now();
  // Keep trying for a while …
  while (now() - start < timeout) {
    // Spin on predecessor's prev field
    Qnode predPred = myPred.prev;
    if (predPred == AVAILABLE) {
      // …Predecessor released lock
      return true;
    } else if (predPred != null) {
      // Predecessor aborted, advance one
      myPred = predPred;
    }
  }

(Figure: queue of nodes labeled spinning, spinning, locked)
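The three cases of the spin loop (prev is null, prev is AVAILABLE, prev points at an earlier node) can be checked on a hand-built chain. This is a sketch under assumptions: ChainWalkSketch and spin are hypothetical names, and System.nanoTime() stands in for the slides' now().

```java
// Sketch of the spin loop alone, run on a chain built by hand.
// Names are illustrative; System.nanoTime() replaces the slides' now().
class ChainWalkSketch {
    static final class QNode {
        volatile QNode prev; // null: still waiting or holding; AVAILABLE: released; other: aborted
    }

    static final QNode AVAILABLE = new QNode();

    // Spins behind myPred for at most timeoutNanos; returns true if the lock was acquired.
    static boolean spin(QNode myPred, long timeoutNanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < timeoutNanos) {
            QNode predPred = myPred.prev;
            if (predPred == AVAILABLE) {
                return true;          // predecessor released the lock
            } else if (predPred != null) {
                myPred = predPred;    // predecessor aborted: skip over it
            }
            // predPred == null: predecessor is still active, keep spinning
        }
        return false;
    }

    public static void main(String[] args) {
        QNode a = new QNode();
        a.prev = AVAILABLE;           // a held the lock and released it
        QNode b = new QNode();
        b.prev = a;                   // b timed out and pointed its successor at a
        System.out.println(spin(b, 1_000_000_000L)); // walks b -> a, then acquires

        QNode c = new QNode();        // c never releases (prev stays null)
        System.out.println(spin(c, 1_000_000L));     // gives up after about 1 ms
    }
}
```

A waiter only ever moves backward along prev pointers, so an aborted node can be skipped without any cooperation from the thread that abandoned it.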


Time-out Lock…

  // What do I do when I time out?
  // Do I have a successor? If CAS fails, I do: tell it about myPred.
  // If CAS succeeds: no successor, simply return false.
  if (!tail.compareAndSet(qnode, myPred))
    qnode.prev = myPred;
  return false;
  }
}
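Both outcomes of the time-out CAS can be exercised single-threaded by setting tail by hand. A minimal sketch, assuming hypothetical names AbandonSketch and abandon; the boolean return value is added here only to make the two cases visible and is not part of the slides' code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the abandon step taken after a time-out; names are illustrative.
class AbandonSketch {
    static final class QNode {
        volatile QNode prev;
    }

    final AtomicReference<QNode> tail = new AtomicReference<>(null);

    // Called by a thread that gave up waiting behind myPred.
    // Returns true if a successor existed (the CAS failed).
    boolean abandon(QNode qnode, QNode myPred) {
        if (!tail.compareAndSet(qnode, myPred)) {
            // Someone enqueued behind us: redirect them to our predecessor.
            qnode.prev = myPred;
            return true;
        }
        // We were the tail: the queue now ends at myPred, nothing else to do.
        return false;
    }

    public static void main(String[] args) {
        AbandonSketch s = new AbandonSketch();
        QNode pred = new QNode();
        QNode me = new QNode();

        s.tail.set(me);                            // case 1: no successor
        System.out.println(s.abandon(me, pred));   // CAS succeeds
        System.out.println(s.tail.get() == pred);  // tail rolled back to pred

        QNode me2 = new QNode();
        QNode succ = new QNode();
        s.tail.set(succ);                          // case 2: succ enqueued after me2
        System.out.println(s.abandon(me2, pred));  // CAS fails
        System.out.println(me2.prev == pred);      // successor will find pred via me2.prev
    }
}
```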


Time-out Unlock

public void unlock() {
  Qnode qnode = myNode.get();
  // CAS successful: tail is set to null; no cleanup, since no successor is waiting.
  // If CAS failed: a successor exists, notify it that it can enter.
  if (!tail.compareAndSet(qnode, null))
    qnode.prev = AVAILABLE;
}
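Putting the lock and unlock fragments together gives a complete class that can be run. This is a sketch assembled from the slide fragments under stated assumptions: the class name TimeoutLockSketch, the tryLock(long, TimeUnit) signature, the use of System.nanoTime() in place of now(), and the stress helper are all choices made here, not taken from the slides.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a time-out queue lock assembled from the slide fragments.
class TimeoutLockSketch {
    static final class QNode {
        volatile QNode prev; // null: active; AVAILABLE: released; other node: aborted
    }

    static final QNode AVAILABLE = new QNode();

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = new ThreadLocal<>();

    boolean tryLock(long time, TimeUnit unit) {
        long start = System.nanoTime();
        long patience = unit.toNanos(time);
        QNode qnode = new QNode();               // fresh node for every attempt
        myNode.set(qnode);
        qnode.prev = null;
        QNode myPred = tail.getAndSet(qnode);    // swap with tail
        if (myPred == null || myPred.prev == AVAILABLE) {
            return true;                         // predecessor absent or released
        }
        while (System.nanoTime() - start < patience) {
            QNode predPred = myPred.prev;
            if (predPred == AVAILABLE) {
                return true;                     // predecessor released the lock
            } else if (predPred != null) {
                myPred = predPred;               // predecessor aborted, advance one
            }
        }
        if (!tail.compareAndSet(qnode, myPred)) {
            qnode.prev = myPred;                 // successor exists: tell it about myPred
        }
        return false;
    }

    void unlock() {
        QNode qnode = myNode.get();
        if (!tail.compareAndSet(qnode, null)) {
            qnode.prev = AVAILABLE;              // successor exists: let it enter
        }
    }

    // Hypothetical helper: several threads increment a shared counter under the lock.
    static int stress(int threads, int iterations) {
        TimeoutLockSketch lock = new TimeoutLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    while (!lock.tryLock(10, TimeUnit.MILLISECONDS)) {
                        // timed out: retry with a new node
                    }
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 1000)); // equals threads * iterations if no update is lost
    }
}
```

Each attempt allocates a new node, so a timed-out thread never reuses a node that a successor may still be reading.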


One Lock To Rule Them All?

• TTAS+Backoff, CLH, MCS, ToLock…

• Each better than others in some way

• There is no one solution

• Lock we pick really depends on:
  – the application
  – the hardware
  – which properties are important


This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work

• Under the following conditions:
  – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.