CS390C: Principles of Concurrency and Parallelism
Lecture 8: Locks, 2/28/12
Slides adapted from The Art of Multiprocessor Programming, Herlihy and Shavit
Tuesday, February 28, 12

Aug 20, 2020


dariahiddleston

New Focus: Performance

● Models
  − More complicated (not the same as complex!)
  − Still focus on principles (not soon obsolete)

● Protocols
  − Elegant (in their fashion)
  − Important (why else would we pay attention)
  − And realistic (your mileage may vary)


Kinds of Architectures

● SISD (Uniprocessor)
  − Single instruction stream
  − Single data stream

● SIMD (Vector)
  − Single instruction
  − Multiple data

● MIMD (Multiprocessors)
  − Multiple instruction
  − Multiple data


MIMD Architectures

• Memory contention
• Communication contention
• Communication latency

[Figure: shared-bus and distributed MIMD memory organizations]


Revisit Mutual Exclusion

● Think of performance, not just correctness and progress

● Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware

● And get to know a collection of locking algorithms…


Lock Contention

● Keep trying
  − “spin” or “busy-wait”
  − Good if delays are short

● Give up the processor
  − Good if delays are long
  − Always good on uniprocessor
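The two strategies can be combined. The Java sketch below spins with java.util.concurrent.atomic.AtomicBoolean.getAndSet for a bounded number of attempts, then calls Thread.yield() to give up the processor. The class name SpinThenYieldLock, the SPIN_LIMIT constant, and the countWith helper are hypothetical and do not come from the slides. Treat it as an illustration under those assumptions, not a tuned lock. Note that Thread.yield() is only a hint to the scheduler.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical hybrid: keep trying ("spin") for a while, then give up the
// processor with Thread.yield(). SPIN_LIMIT is an assumed tuning constant.
class SpinThenYieldLock {
    private static final int SPIN_LIMIT = 100;
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        int spins = 0;
        // getAndSet(true) returns the prior value: true means someone else holds the lock.
        while (held.getAndSet(true)) {
            if (++spins >= SPIN_LIMIT) {
                Thread.yield();   // the delay looks long, so let another thread run
                spins = 0;
            }
        }
    }

    void unlock() {
        held.set(false);
    }

    // Usage: nThreads threads each increment a shared counter perThread times.
    static int countWith(int nThreads, int perThread) {
        SpinThenYieldLock lock = new SpinThenYieldLock();
        int[] counter = {0};   // guarded by lock
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter[0];
    }
}
```

If the lock provides mutual exclusion, countWith(4, 10000) returns 40000.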



Basic Spin-Lock

[Figure: threads spin on the lock; one at a time enters the critical section (CS) and resets the lock upon exit]

…the lock introduces a sequential bottleneck (no parallelism) …and introduces contention


Test-and-Set

● Boolean value

● Test-and-set (TAS)
  − Swap true with current value
  − Return value tells if prior value was true or false

● Can reset just by writing false

● TAS aka “getAndSet”


Test-and-Set

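public class AtomicBoolean {
  boolean value;
  public AtomicBoolean(boolean initialValue) { value = initialValue; }
  // synchronized makes the read-then-write below one atomic step
  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;
    value = newValue;
    return prior;
  }
}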


Test-and-Set

AtomicBoolean lock = new AtomicBoolean(false);
…
boolean prior = lock.getAndSet(true);

Swapping in true is called “test-and-set” or TAS
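The JDK ships a real java.util.concurrent.atomic.AtomicBoolean. Its getAndSet has the semantics shown above, but it is implemented with hardware atomics rather than synchronized. The small sketch below shows what two successive test-and-set calls return. The class name TasDemo is hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasDemo {
    // Returns the prior values observed by two successive test-and-set calls.
    static boolean[] twoTas() {
        AtomicBoolean lock = new AtomicBoolean(false);
        boolean first = lock.getAndSet(true);   // prior value false: lock was free
        boolean second = lock.getAndSet(true);  // prior value true: lock already taken
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] r = twoTas();
        System.out.println(r[0] + " " + r[1]); // prints "false true"
    }
}
```

The first caller sees false, meaning it won. Every later caller sees true until someone writes false.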


Test-and-Set Locks

● Locking
  − Lock is free: value is false
  − Lock is taken: value is true

● Acquire lock by calling TAS
  − If result is false, you win
  − If result is true, you lose

● Release lock by writing false


Test-and-set Lock

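import java.util.concurrent.atomic.AtomicBoolean;

class TASlock {
  AtomicBoolean state = new AtomicBoolean(false);
  void lock() {
    while (state.getAndSet(true)) {}  // spin until TAS returns false
  }
  void unlock() {
    state.set(false);                 // release by writing false
  }
}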


Space Complexity

● TAS spin-lock has small “footprint”

● n-thread spin-lock uses O(1) space

● As opposed to O(n) Peterson/Bakery

● How did we overcome the Ω(n) lower bound (which holds for locks built only from reads and writes)?

● We used a read-modify-write (RMW) operation…


Performance

● Experiment
  − n threads
  − Increment shared counter 1 million times

● How long should it take?

● How long does it take?
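Below is a minimal Java sketch of this experiment. It assumes the 1 million increments are split evenly across the n threads and that the counter is protected by the TAS lock from the slides. The class and method names are hypothetical. It measures wall-clock time with System.nanoTime() and does no JIT warm-up, so treat its timings as rough. A harness such as JMH would give more trustworthy measurements.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CounterExperiment {
    // TAS lock as on the slides, built on the JDK's AtomicBoolean.
    static final class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        void lock() { while (state.getAndSet(true)) { } }
        void unlock() { state.set(false); }
    }

    static long counter = 0;   // the shared counter, guarded by the lock

    // n threads share `total` increments; returns elapsed wall-clock nanoseconds.
    // Assumes total is divisible by nThreads.
    static long run(int nThreads, int total) {
        TASLock lock = new TASLock();
        counter = 0;
        int share = total / nThreads;
        Thread[] threads = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < share; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long ns = run(n, 1_000_000);
            System.out.printf("%d threads: %d ms, counter = %d%n", n, ns / 1_000_000, counter);
        }
    }
}
```

Timings vary by machine. The only thing the sketch guarantees is the final counter value.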



Graph

[Figure: time vs. threads, ideal curve]

No speedup because of the sequential bottleneck.



Mystery #1

[Figure: time vs. threads for the TAS lock and the ideal]

What is going on?


Test-and-Test-and-Set Locks

● Lurking stage
  − Wait until lock “looks” free
  − Spin while read returns true (lock taken)

● Pouncing stage
  − As soon as lock “looks” available
  − Read returns false (lock free)
  − Call TAS to acquire lock
  − If TAS loses, back to lurking


Test-and-test-and-set Lock

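import java.util.concurrent.atomic.AtomicBoolean;

class TTASlock {
  AtomicBoolean state = new AtomicBoolean(false);
  void lock() {
    while (true) {
      while (state.get()) {}               // lurk: spin while the lock looks taken
      if (!state.getAndSet(true)) return;  // pounce: TAS; if it loses, lurk again
    }
  }
  void unlock() { state.set(false); }
}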


Mystery #2

[Figure: time vs. threads for the TAS lock, the TTAS lock, and the ideal]


Mystery

● Both
  − TAS and TTAS
  − Do the same thing (in our model)

● Except that
  − TTAS performs much better than TAS
  − Neither approaches ideal


Compare and Swap


Hardware Approaches

● Compare and Swap (CAS)
  − Three operands:
    ● a memory location (V)
    ● an expected old value (A)
    ● a new value (B)
  − The processor atomically updates the location to the new value if the value stored there is the expected old value.
  − Using this for synchronization:
    ● read a value A from location V
    ● perform some computation to derive new value B
    ● use CAS to change the value of V from A to B (if V no longer holds A, another thread got there first, so retry)
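The read/compute/CAS recipe above is usually written as a retry loop. Below is a sketch using the JDK's AtomicInteger.compareAndSet. The class CasLoop and its methods are hypothetical names. Since Java 8, AtomicInteger.updateAndGet packages the same pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

class CasLoop {
    // Read A from v, compute B = f(A), and CAS v from A to B.
    // If another thread changed v in between, the CAS fails and we retry.
    static int update(AtomicInteger v, IntUnaryOperator f) {
        while (true) {
            int a = v.get();                 // read A from location V
            int b = f.applyAsInt(a);         // derive new value B
            if (v.compareAndSet(a, b)) {     // succeeds only if V still holds A
                return b;
            }
        }
    }

    // Usage: nThreads threads each add 1 to a shared value perThread times, without a lock.
    static int concurrentIncrements(int nThreads, int perThread) {
        AtomicInteger v = new AtomicInteger(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    update(v, x -> x + 1);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return v.get();
    }
}
```

Because f may run several times under contention, it should be free of side effects.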


Compare and Swap


public class SimulatedCAS {
  private int value;

  public synchronized int getValue() { return value; }

  // Simulates hardware CAS: if value == expectedValue, store newValue.
  // Always returns the value seen before the attempt.
  public synchronized int compareAndSwap(int expectedValue, int newValue) {
    int oldValue = value;
    if (value == expectedValue)
      value = newValue;
    return oldValue;
  }
}

Lock-free counter:

public class CasCounter {
  private final SimulatedCAS value = new SimulatedCAS();

  public int getValue() {
    return value.getValue();
  }

  public int increment() {
    int oldValue = value.getValue();
    // Retry until no other thread changed the value between our read and our CAS.
    while (value.compareAndSwap(oldValue, oldValue + 1) != oldValue)
      oldValue = value.getValue();
    return oldValue + 1;
  }
}


Taxonomy


Lock-free algorithms

● An algorithm is said to be wait-free if every thread makes progress in the face of arbitrary delay (or even failure) of other threads.

● An algorithm is said to be lock-free if some thread always makes progress.
  − permits starvation

● An algorithm is said to be obstruction-free if at any point, a single thread executed in isolation for a bounded number of steps will complete.


Opinion

● Our memory abstraction is broken

● TAS & TTAS methods
  − Are provably the same (in our model)
  − Except they aren’t (in field tests)

● Need a more detailed model …



Bus-Based Architectures

[Figure: three per-processor caches and a memory connected by a shared bus]

Random access memory (10s of cycles)


Shared bus
• Broadcast medium
• One broadcaster at a time
• Processors and memory all “snoop”


Per-processor caches
• Small
• Fast: 1 or 2 cycles
• Address & state information


Jargon Watch

● Cache hit
  − “I found what I wanted in my cache”
  − Good Thing™



Processor Issues Load Request

[Figure: a processor broadcasts a load request on the bus (“Gimme data”)]


Memory Responds

[Figure: memory sends the data over the bus (“Got your data right here”)]



Processor Issues Load Request

[Figure: another processor issues a load request (“Gimme data”) while one cache already holds the data]



Other Processor Responds

[Figure: the cache that already holds the data answers the request (“I got data”) and supplies it over the bus]


Modify Cached Data


[Figure: one processor modifies its cached copy while other caches still hold copies (“What’s up with the other copies?”)]


Cache Coherence

● We have lots of copies of data
  − Original copy in memory
  − Cached copies at processors

● Some processor modifies its own copy
  − What do we do with the others?
  − How to avoid confusion?


Write-Back Caches

● Accumulate changes in cache

● Write back when needed
  − Need the cache for something else
  − Another processor wants it

● On first modification
  − Invalidate other entries
  − Requires non-trivial protocol …


Write-Back Caches

● Cache entry has three states
  − Invalid: contains raw seething bits
  − Valid: I can read but I can’t write
  − Dirty: Data has been modified
    ● Intercept other load requests
    ● Write back to memory before using cache
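To make the three states concrete, here is a deliberately simplified Java model of a single cache line. LineState and CacheLine are hypothetical names. Each method reports whether the operation needed the bus. Write-backs and the memory side are not modeled, and real protocols such as MESI use more states than these three.

```java
// Simplified model of one cache line with the Invalid / Valid / Dirty states above.
enum LineState { INVALID, VALID, DIRTY }

class CacheLine {
    LineState state = LineState.INVALID;

    // Local read: a miss (INVALID) fetches the line over the bus and makes it VALID.
    // Returns true if the read used the bus.
    boolean localRead() {
        if (state == LineState.INVALID) {
            state = LineState.VALID;
            return true;
        }
        return false;   // VALID or DIRTY: cache hit
    }

    // Local write: unless already DIRTY, the cache must first gain write
    // permission by broadcasting an invalidation. Returns true if the bus was used.
    boolean localWrite() {
        boolean usedBus = state != LineState.DIRTY;
        state = LineState.DIRTY;
        return usedBus;
    }

    // Another processor's load request: a DIRTY owner intercepts it and supplies
    // the data; afterwards both copies are readable but not writable.
    // Returns true if this cache responded.
    boolean snoopRead() {
        if (state == LineState.DIRTY) {
            state = LineState.VALID;
            return true;
        }
        return false;
    }

    // Another processor's invalidation: this copy loses read permission.
    void snoopInvalidate() {
        state = LineState.INVALID;
    }
}
```

In this model, a spinner that repeatedly calls localWrite() on a line that others keep invalidating uses the bus on every attempt. That is the cost the TAS discussion examines.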



Invalidate

[Figure: the modifying processor broadcasts an invalidation on the bus (“Mine, all mine!”)]


[Figure: another cache holding a copy sees the invalidation (“Uh, oh”)]



[Figure: other caches lose read permission; this cache acquires write permission]


Memory provides data only if it is not present in any cache, so there is no need to update memory now (that would be expensive).


Another Processor Asks for Data

[Figure: another processor requests the modified data over the bus]


Owner Responds

[Figure: the cache holding the modified copy supplies the data over the bus (“Here it is!”)]


End of the Day …

[Figure: several caches now hold copies of the data (reading OK, no writing)]


Mutual Exclusion

● What do we want to optimize?
  − Bus bandwidth used by spinning threads
  − Release/Acquire latency
  − Acquire latency for idle lock


Simple TASLock

● TAS invalidates cache lines

● Spinners
  − Miss in cache
  − Go to bus

● Thread wants to release lock
  − delayed behind spinners


Test-and-test-and-set

● Wait until lock “looks” free
  − Spin on local cache
  − No bus use while lock busy

● Problem: when lock is released
  − Invalidation storm …


Local Spinning while Lock is Busy

[Figure: each spinner’s cache holds a copy of the lock reading “busy”, so spinning is satisfied locally without using the bus]



On Release

[Figure: the holder writes “free”, invalidating the other cached copies; everyone misses and rereads]


[Figure: everyone who reread the lock as free tries TAS]


Problems

● Everyone misses
  − Reads satisfied sequentially

● Everyone does TAS
  − Invalidates others’ caches

● Eventually quiesces after lock acquired
  − How long does this take?


Measuring Quiescence Time

[Figure: processors P1, P2, …, Pn]

X = time of ops that don’t use the bus
Y = time of ops that cause intensive bus traffic

In the critical section, run ops X then ops Y. As long as the quiescence time is less than X, there is no drop in performance.

By gradually varying X, we can determine the exact time to quiesce.


Quiescence Time

Quiescence time increases linearly with the number of processors for a bus architecture.

[Figure: quiescence time vs. threads]



Mystery Explained

[Figure: time vs. threads for the TAS lock, the TTAS lock, and the ideal]

TTAS is better than TAS but still not as good as ideal.


Solution: Introduce Delay

[Figure: spin lock timeline with delays d, r1d, r2d]

• If the lock looks free
• But I fail to get it
• There must be contention
• Better to back off than to collide again


Dynamic Example: Exponential Backoff

[Figure: spin lock timeline with delays d, 2d, 4d]

● If I fail to get the lock
  − Wait a random duration before retrying
  − Each subsequent failure doubles the expected wait
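The doubling rule can be isolated in a small helper. This sketch draws a random delay below the current limit and then doubles the limit up to a cap, so the expected wait roughly doubles with each consecutive failure. BackoffSchedule, nextDelay, and currentLimit are hypothetical names. The sketch assumes minDelay is at least 1, because ThreadLocalRandom.nextInt(bound) requires a positive bound.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper isolating the exponential-backoff rule.
class BackoffSchedule {
    private final int maxDelay;
    private int limit;   // current upper bound (exclusive) on the random delay

    // Assumes 1 <= minDelay <= maxDelay.
    BackoffSchedule(int minDelay, int maxDelay) {
        this.limit = minDelay;
        this.maxDelay = maxDelay;
    }

    // Returns a delay in [0, limit), then doubles the limit (capped at maxDelay).
    int nextDelay() {
        int delay = ThreadLocalRandom.current().nextInt(limit);
        limit = Math.min(maxDelay, 2 * limit);
        return delay;
    }

    int currentLimit() {
        return limit;
    }
}
```

A lock would create a fresh schedule at the start of each lock() call and call Thread.sleep(schedule.nextDelay()) after every failed TAS.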


Exponential Backoff Lock

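import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;

public class Backoff {
  private static final int MIN_DELAY = 1, MAX_DELAY = 64;  // ms; assumed, platform-dependent
  private final AtomicBoolean state = new AtomicBoolean(false);

  public void lock() throws InterruptedException {
    int delay = MIN_DELAY;
    while (true) {
      while (state.get()) {}                // lurk while the lock looks taken
      if (!state.getAndSet(true)) return;   // pounce
      Thread.sleep(ThreadLocalRandom.current().nextInt(delay));  // random back-off
      if (delay < MAX_DELAY) delay = 2 * delay;                  // double expected wait
    }
  }

  public void unlock() { state.set(false); }
}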


Spin-Waiting Overhead

[Figure: time vs. threads for the TTAS lock and the backoff lock]


Backoff: Other Issues

● Good
− Easy to implement
− Beats TTAS lock

● Bad
− Must choose parameters carefully
− Not portable across platforms


Idea

● Avoid useless invalidations
− By keeping a queue of threads

● Each thread
− Notifies next in line
− Without bothering the others


Anderson Queue Lock

[Figure: flags array T F F F F F F F with a next counter; thread idle]


Anderson Queue Lock

[Figure: flags array T F F F F F F F; an acquiring thread applies getAndIncrement to next]



Anderson Queue Lock

[Figure: flags array T F F F F F F F; the thread has acquired the lock: "Mine!"]


Anderson Queue Lock

[Figure: flags array T F F F F F F F; one thread has acquired the lock, a second is acquiring]


Anderson Queue Lock

[Figure: flags array T F F F F F F F; one thread has acquired the lock, the second applies getAndIncrement to next]



Anderson Queue Lock

[Figure: flags array T F F F F F F F; one thread has acquired the lock, the second is acquiring]



Anderson Queue Lock

[Figure: flags array T T F F F F F F; the first thread has released the lock and the second has acquired it: "Yow!"]
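The animation can be replayed step by step in a single thread. This is an illustration only: `AndersonTrace` and its output format are hypothetical, and plain variables stand in for the atomic counter because nothing runs concurrently. It follows the `lock()`/`unlock()` code given for this lock, which clears a slot as soon as its owner gets in, so the arrays it produces differ slightly from the simplified drawings (A's slot reads F once A is inside).

```java
import java.util.ArrayList;
import java.util.List;

// Sequential replay of the animation: threads A and B share an 8-slot
// flags array and follow the slide's lock()/unlock() steps in order.
class AndersonTrace {
    static final int N = 8;

    // Draws the array the way the slides do, e.g. "T F F F F F F F".
    static String render(boolean[] flags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < flags.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(flags[i] ? 'T' : 'F');
        }
        return sb.toString();
    }

    static List<String> run() {
        boolean[] flags = new boolean[N];
        flags[0] = true;                        // idle: only slot 0 is open
        int next = 0;                           // stands in for the AtomicInteger
        List<String> states = new ArrayList<>();
        states.add("idle " + render(flags));

        int slotA = next++ % N;                 // A's getAndIncrement yields slot 0
        boolean aIn = flags[slotA];             // T, so A gets in ("Mine!")
        flags[slotA] = false;                   // lock() then clears A's slot
        states.add("A in=" + aIn + " " + render(flags));

        int slotB = next++ % N;                 // B's getAndIncrement yields slot 1
        boolean bIn = flags[slotB];             // F, so B keeps spinning
        states.add("B in=" + bIn + " " + render(flags));

        flags[(slotA + 1) % N] = true;          // A's unlock() opens slot 1
        states.add("A out " + render(flags));

        bIn = flags[slotB];                     // B's next check sees T ("Yow!")
        flags[slotB] = false;                   // B's lock() clears its slot
        states.add("B in=" + bIn + " " + render(flags));
        return states;
    }
}
```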


Anderson Queue Lock

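class ALock implements Lock {
  final int n; // capacity: maximum number of threads
  boolean[] flags; // initially {true,false,…,false}
  AtomicInteger next = new AtomicInteger(0);
  ThreadLocal<Integer> mySlot = new ThreadLocal<>();
  ALock(int n) { this.n = n; flags = new boolean[n]; flags[0] = true; }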


Anderson Queue Lock

public void lock() {
  int slot = next.getAndIncrement() % n;
  mySlot.set(slot);
  while (!flags[slot]) {}  // spin on my own flag
  flags[slot] = false;
}

public void unlock() {
  flags[(mySlot.get() + 1) % n] = true;  // wake next in line
}
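A runnable sketch of the same lock, under stated assumptions: at most `capacity` threads contend at once, the ticket counter never overflows, and each flag is an `AtomicBoolean` rather than a plain `boolean`, because Java gives no visibility guarantee for ordinary array elements and a spinning thread might never observe the release. Separately allocated `AtomicBoolean` objects may or may not share cache lines, depending on the JVM. The `ALockDemo` driver is hypothetical and only exercises mutual exclusion.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Anderson array lock (sketch). Assumes at most `capacity` concurrent
// threads; ticket overflow after 2^31 acquisitions is not handled.
class ALock {
    private final int n;
    private final AtomicBoolean[] flags;
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    ALock(int capacity) {
        n = capacity;
        flags = new AtomicBoolean[n];
        for (int i = 0; i < n; i++) {
            flags[i] = new AtomicBoolean(i == 0);  // initially T F F ... F
        }
    }

    void lock() {
        int slot = next.getAndIncrement() % n;     // take a place in line
        mySlot.set(slot);
        while (!flags[slot].get()) {
            Thread.onSpinWait();                   // spin on my own flag only
        }
        flags[slot].set(false);                    // clear my slot for its next user
    }

    void unlock() {
        flags[(mySlot.get() + 1) % n].set(true);   // notify the next in line
    }
}

// Hypothetical driver: nThreads each increment a shared counter iters times.
class ALockDemo {
    static int run(int nThreads, int iters) {
        ALock lock = new ALock(nThreads);          // one slot per thread
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter[0];
    }
}
```

Handing the lock over touches only the successor's flag, so a release wakes exactly one waiter instead of invalidating every spinner's copy of a shared lock word.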



Local Spinning

[Figure: flags array T F F F F F F F with released and acquired threads; each thread spins on its own bit]

Unfortunately, many bits share a cache line



False Sharing

[Figure: flags array T F F F F F F F split across cache lines Line 1 and Line 2; each thread spins on its own bit]

A spinning thread gets cache invalidations on account of stores by threads it is not waiting for

Result: contention
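Which flags end up sharing a line is simple arithmetic. The sketch below assumes 64-byte cache lines, 1-byte flags, and a line-aligned array (all assumptions; a real JVM adds an object header, so actual offsets differ), and `FlagLines` is a hypothetical name. With 1-byte flags all eight slots share one line; with a hypothetical 16-byte spacing they split four and four across two lines, as in the drawing; only at 64-byte spacing does each flag get a line of its own.

```java
// Back-of-the-envelope mapping from flag slot to cache line.
class FlagLines {
    static final int LINE_BYTES = 64;   // assumed cache-line size

    // Line index of flag `slot` when consecutive flags are `stride` bytes apart.
    static int lineOf(int slot, int stride) {
        return slot * stride / LINE_BYTES;
    }

    // Number of distinct lines occupied by `slots` consecutive flags.
    // lineOf is non-decreasing in slot, so counting changes counts lines.
    static int linesTouched(int slots, int stride) {
        int count = 0;
        int last = -1;
        for (int s = 0; s < slots; s++) {
            int line = lineOf(s, stride);
            if (line != last) {
                count++;
                last = line;
            }
        }
        return count;
    }
}
```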


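The Solution: Padding

[Figure: padded flags array T / / / F / / / ("/" = padding) laid out so that each flag sits on its own cache line (Line 1, Line 2), with released and acquired threads; each thread spins on its own line]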
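Padding can be sketched in Java by spacing the logical flags out inside a single `AtomicIntegerArray`, which in common JVMs is backed by one contiguous `int[]`. Assumptions, labeled as such: 64-byte cache lines and 4-byte `int`s, hence the hypothetical `STRIDE` of 16 ints between flags. Because consecutive flags are then 64 bytes apart, no two of them can fall in the same line even if the array itself is not line-aligned. `PaddedALock` and `PaddedALockDemo` are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Padded Anderson lock (sketch): logical flag i lives at index i * STRIDE,
// so flags are 64 bytes apart (assumes 64-byte lines and 4-byte ints).
class PaddedALock {
    private static final int STRIDE = 16;          // 16 ints * 4 bytes = 64 bytes
    private final int n;
    private final AtomicIntegerArray flags;        // 1 = may enter, 0 = wait
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    PaddedALock(int capacity) {
        n = capacity;
        flags = new AtomicIntegerArray(n * STRIDE); // all zero initially
        flags.set(0, 1);                            // slot 0 starts open
    }

    private static int idx(int slot) {
        return slot * STRIDE;
    }

    void lock() {
        int slot = next.getAndIncrement() % n;      // overflow not handled
        mySlot.set(slot);
        while (flags.get(idx(slot)) == 0) {
            Thread.onSpinWait();                    // spin on my own line
        }
        flags.set(idx(slot), 0);                    // clear my slot for reuse
    }

    void unlock() {
        flags.set(idx((mySlot.get() + 1) % n), 1);  // notify the next in line
    }
}

// Hypothetical driver: nThreads each increment a shared counter iters times.
class PaddedALockDemo {
    static int run(int nThreads, int iters) {
        PaddedALock lock = new PaddedALock(nThreads);
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < iters; k++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter[0];
    }
}
```

The cost is visible in the constructor: `n` flags occupy `n * 64` bytes, which is the space overhead listed among the lock's drawbacks.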


Performance

● Shorter handover than backoff
● Curve is practically flat
● Scalable performance

[Figure: performance curves for the queue lock and the TTAS lock]


Anderson Queue Lock

● Good
− First truly scalable lock
− Simple, easy to implement
− Back to FIFO order (like Bakery)


Anderson Queue Lock

● Bad
− Space hog…
− One bit per thread → one cache line per thread
● What if the number of threads is unknown?
● What if the number of actual contenders is small?


This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

• You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work

• Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.

• Any of the above conditions can be waived if you get permission from the copyright holder.

• Nothing in this license impairs or restricts the author's moral rights.
