Introduction to Multiprocessor Synchronization Maurice Herlihy http://cs.brown.edu/courses/cs176/lectures.shtml

Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Jun 25, 2020


Introduction to

Multiprocessor

Synchronization

Maurice Herlihy

http://cs.brown.edu/courses/cs176/lectures.shtml


Art of Multiprocessor Programming 2

Moore's Law

Clock speed flattening sharply

Transistor count still rising

Once roamed the Earth: the Uniprocessor

[Figure labels: memory, cpu]

Endangered: The Shared Memory Multiprocessor (SMP)

[Figure labels: cache, bus, shared memory]

Meet the New Boss: The Multicore Processor (CMP)

[Figure labels: cache, bus, shared memory, all on the same chip; pictured: Oracle Niagara chip]


CDS.IISc.ac.in | Department of Computational and Data Sciences

Turing Cluster

24 Compute Nodes in two 12 node 3U blades. Each node has one 8-core AMD Opteron 3380 processor @ 2.6GHz, 32GB RAM, 2TB HDD, Gigabit Ethernet port

1 Head Node with one 6-core Intel Xeon E5-2620 v3 processor @ 2.40GHz, 48GB RAM, 1+4TB HDD, Gigabit Ethernet ports

One 24 port L2 Gigabit Ethernet switch

Running CentOS, MPI, PBS and Apache Hadoop/Yarn

Mounted on a 24U Rack http://cds.iisc.ac.in/internal-resources/computing-resources/


Turing Cluster: Xeon E5-2620 v3

http://2eof2j3oc7is20vt9q3g7tlo5xe.wpengine.netdna-cdn.com/wp-content/uploads/2014/09/intel-xeon-e5-v3-block-diagram-detailed.jpg

http://www.enterprisetech.com/wp-content/uploads/2014/09/intel-xeon-e5-v3-die-shot.jpg

Traditional Scaling Process

[Figure: speedup of unchanged user code on a traditional uniprocessor over time (Moore's law): 1.8x, 3.6x, 7x]

Ideal Multicore Scaling Process

[Figure: speedup of user code on multicore: 1.8x, 3.6x, 7x]

Unfortunately, not so simple…

Actual Multicore Scaling Process

[Figure: speedup of user code on multicore: 1.8x, 2x, 2.9x]

Parallelization and synchronization require great care…

Sequential Computation

[Figure labels: thread, memory, object, object]

Concurrent Computation

[Figure labels: memory, object, object]


Asynchrony

• Sudden unpredictable delays

– Cache misses (short)

– Page faults (long)

– Scheduling quantum used up (really long)


Model Summary

• Multiple threads

• Single shared memory

• Objects live in memory

• Unpredictable asynchronous delays


Concurrency Jargon

• Hardware

– Processors

• Software

– Threads, processes

• Sometimes OK to confuse them, sometimes not.

Parallel Primality Testing

• Challenge

– Print primes from 1 to 10^10

• Given

– Ten-processor multiprocessor

– One thread per processor

• Goal

– Get ten-fold speedup (or close)

Load Balancing

• Split the work evenly

• Each thread tests range of 10^9

[Number line: 1 … 10^9 … 2·10^9 … 10^10, split into ten equal ranges handled by P0, P1, …, P9]

Procedure for Thread i

void primePrint() {
  int i = ThreadID.get(); // IDs in {0..9}
  long block = 1_000_000_000L; // 10^9 numbers per thread
  for (long j = i*block + 1; j <= (i+1)*block; j++) {
    if (isPrime(j))
      print(j);
  }
}
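A minimal runnable sketch of this static split, scaled down to a small limit so it finishes quickly. The class name StaticPartition, the helper isPrime (plain trial division), and the use of plain Java threads in place of ThreadID are assumptions made for illustration; the sketch counts primes rather than printing them.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StaticPartition {
    // Trial division; adequate for the small limits used in this sketch.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Thread i tests the block (i*block, (i+1)*block]; returns the number of primes found.
    // Assumes `threads` divides `limit` evenly.
    static int countPrimes(long limit, int threads) {
        long block = limit / threads;
        AtomicInteger found = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                for (long j = i * block + 1; j <= (i + 1) * block; j++) {
                    if (isPrime(j)) found.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100, 10)); // prints 25
    }
}
```

Each worker gets the same number of candidates, but not the same amount of work, which is the weakness the next slides point out.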


Issues

• Higher ranges have fewer primes

• Yet larger numbers harder to test

• Thread workloads

– Uneven

– Hard to predict

• Need dynamic load balancing

Shared Counter

[Figure: a shared counter handing out 17, 18, 19, …]

Each thread takes a number

Procedure for Thread i

Counter counter = new Counter(1);   // shared by all threads

void primePrint() {
  long j = 0;
  while (j < 10_000_000_000L) {      // 10^10
    j = counter.getAndIncrement();
    if (isPrime(j))
      print(j);
  }
}
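A scaled-down sketch of the shared-counter idea, assuming a counter that is already thread-safe: java.util.concurrent.atomic.AtomicLong stands in for the slides' Counter class here. The class name SharedCounterPrimes and the helper isPrime are illustrative, and the sketch counts primes instead of printing them.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class SharedCounterPrimes {
    // Trial division; adequate for the small limits used in this sketch.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Every thread repeatedly takes the next untested number from the shared counter.
    static int countPrimes(long limit, int threads) {
        AtomicLong counter = new AtomicLong(1);      // shared: next number to test
        AtomicInteger found = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                while (true) {
                    long j = counter.getAndIncrement(); // take a number
                    if (j > limit) break;               // every value has been taken
                    if (isPrime(j)) found.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100, 4)); // prints 25
    }
}
```

A thread that draws cheap numbers simply comes back for more, so the load balances itself without anyone predicting the cost of each range.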

• Counter counter = new Counter(1): shared counter object

Where Things Reside

[Figure labels: cache, bus, shared memory; the shared counter (value 1) in shared memory; code and local variables with each processor]

Procedure for Thread i

• while (j < 10^10): stop when every value taken

• j = counter.getAndIncrement(): increment & return each new value


Counter Implementation

public class Counter {

private long value;

public long getAndIncrement() {

return value++;

}

}


What It Means

public class Counter {

private long value;

public long getAndIncrement() {

return value++;

}

}

Here return value++ means:

temp = value;

value = temp + 1;

return temp;

Not so good…

[Timeline figure: Value: 1 → 2 → 3 → 2. Two threads read 1; one of them writes 2; another thread reads 2 and writes 3; then the remaining thread that read 1 writes 2, so the value goes backward.]
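The bad schedule can be replayed deterministically in a single thread. In this sketch the thread names A, B and C and the class name LostUpdate are hypothetical; each local variable plays the role of one thread's temp.

```java
import java.util.Arrays;

class LostUpdate {
    // Replays one interleaving of three unsynchronized getAndIncrement() calls.
    // Returns { value returned to A, to B, to C, final counter value }.
    static long[] replay() {
        long value = 1;
        long tempA = value;   // A reads 1
        long tempC = value;   // C reads 1
        value = tempA + 1;    // A writes 2
        long tempB = value;   // B reads 2
        value = tempB + 1;    // B writes 3
        value = tempC + 1;    // C writes 2, overwriting B's 3
        return new long[] { tempA, tempB, tempC, value };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay())); // prints [1, 2, 1, 2]
    }
}
```

Two callers receive the number 1, and after three increments the counter holds 2 instead of 4.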


Challenge

public class Counter {
  private long value;

  public long getAndIncrement() {
    // Make these steps atomic (indivisible)
    long temp = value;
    value = temp + 1;
    return temp;
  }
}

Hardware Solution

Perform the three steps of getAndIncrement() (temp = value; value = temp + 1; return temp;) as a single ReadModifyWrite() instruction.
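In Java, one way to get an atomic read-modify-write without writing any locking code is java.util.concurrent.atomic.AtomicLong, whose getAndIncrement() is atomic (on common platforms the JVM typically maps it to a hardware read-modify-write instruction, though that is an implementation detail). A sketch, with the class name AtomicCounter and the stress helper hammer chosen for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounter {
    private final AtomicLong value;

    AtomicCounter(long initial) {
        value = new AtomicLong(initial);
    }

    // Atomically returns the current value and adds one to it.
    long getAndIncrement() {
        return value.getAndIncrement();
    }

    // Starts `threads` threads that each take `perThread` numbers from a counter
    // starting at 1, then returns the next number the counter would hand out.
    static long hammer(int threads, int perThread) {
        AtomicCounter counter = new AtomicCounter(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.getAndIncrement();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10000)); // prints 40001
    }
}
```

Because no increment is lost, the result is always 1 plus the total number of calls, whatever the interleaving.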


An Aside: Java™

public class Counter {
  private long value;

  public long getAndIncrement() {
    long temp;
    synchronized (this) {   // one thread at a time
      temp = value;
      value = temp + 1;
    }
    return temp;
  }
}

• The synchronized section is a synchronized block

• It provides mutual exclusion

Mutual Exclusion,

or “Alice & Bob share a pond”


Alice has a pet


Bob has a pet


The Problem

The pets don't get along

Formalizing the Problem

• Two types of formal properties in asynchronous computation:

• Safety Properties

– Nothing bad happens ever

• Liveness Properties

– Something good happens eventually


Formalizing our Problem

• Mutual Exclusion

– Both pets never in pond simultaneously

– This is a safety property

• No Deadlock

– if only one wants in, it gets in

– if both want in, one gets in

– This is a liveness property


Simple Protocol

• Idea

– Just look at the pond

• Gotcha

– Not atomic

– Trees obscure the view


Interpretation

• Threads can't “see” what other threads are doing

• Explicit communication required for coordination

Cell Phone Protocol

• Idea

– Bob calls Alice (or vice-versa)

• Gotcha

– Bob takes shower

– Alice recharges battery

– Bob out shopping for pet food …


Interpretation

• Message-passing doesn't work

• Recipient might not be

– Listening

– There at all

• Communication must be

– Persistent (like writing)

– Not transient (like speaking)


Flag Protocol


Alice's Protocol (sort of)


Bob's Protocol (sort of)


Alice's Protocol

• Raise flag

• Wait until Bob's flag is down

• Unleash pet

• Lower flag when pet returns


Bob's Protocol

• Raise flag

• Wait until Alice's flag is down

• Unleash pet

• Lower flag when pet returns


Bob's Protocol (2nd try)

• Raise flag

• While Alice's flag is up

– Lower flag

– Wait for Alice's flag to go down

– Raise flag

• Unleash pet

• Lower flag when pet returns

Bob defers to Alice
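The two protocols (Alice's, and Bob's second try) can be sketched with two volatile boolean flags. Everything named here (FlagProtocol, visitPond, run, the counters) is illustrative and not from the slides; volatile is used so that each thread's flag writes become visible to the other in order, and Thread.yield() inside the waits is only a courtesy to the scheduler.

```java
class FlagProtocol {
    static volatile boolean aliceFlag, bobFlag;
    static volatile boolean violated;
    static int inPond;   // pets currently in the pond; only touched while in the pond
    static int visits;   // completed pond visits

    // The critical section: record a visit and check that no other pet is present.
    static void visitPond() {
        if (++inPond != 1) violated = true;
        visits++;
        inPond--;
    }

    static void alice(int rounds) {
        for (int r = 0; r < rounds; r++) {
            aliceFlag = true;                   // raise flag
            while (bobFlag) Thread.yield();     // wait until Bob's flag is down
            visitPond();                        // unleash pet
            aliceFlag = false;                  // lower flag when pet returns
        }
    }

    static void bob(int rounds) {
        for (int r = 0; r < rounds; r++) {
            bobFlag = true;                     // raise flag
            while (aliceFlag) {                 // while Alice's flag is up
                bobFlag = false;                //   lower flag (defer to Alice)
                while (aliceFlag) Thread.yield(); // wait for Alice's flag to go down
                bobFlag = true;                 //   raise flag
            }
            visitPond();                        // unleash pet
            bobFlag = false;                    // lower flag when pet returns
        }
    }

    // Runs both protocols concurrently; returns total visits, or -1 if the pets ever met.
    static int run(int rounds) {
        aliceFlag = false; bobFlag = false; violated = false; inPond = 0; visits = 0;
        Thread a = new Thread(() -> alice(rounds));
        Thread b = new Thread(() -> bob(rounds));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return violated ? -1 : visits;
    }

    public static void main(String[] args) {
        System.out.println(run(20000));
    }
}
```

The run terminates because Alice performs a finite number of rounds; with an Alice who never stops, Bob could keep deferring forever.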


The Flag Principle

• Raise the flag

• Look at other's flag

• Flag Principle:

– If each raises and looks, then

– Last to look must see both flags up


Remarks

• Protocol is unfair

– Bob's pet might never get in

• Protocol uses waiting

– If Bob is eaten by his pet, Alice's pet might never get in

The Fable Continues

Bob falls ill, cannot tend to the pets

Alice gets the pets

‣ Pets get along fine

But Bob has to feed them

Producer-Consumer Problem

Bob Puts Food in the Pond

Alice releases her pets to feed (mmm…)

Producer/Consumer

Alice and Bob can't meet

‣ Bob’s disease is contagious

‣ So he puts food in the pond

‣ And later, she releases the pets

Avoid

‣ Releasing pets when there's no food

‣ Putting out food if uneaten food remains

Producer/Consumer

• Need a mechanism so that

– Bob lets Alice know when food has been put out

– Alice lets Bob know when to put out more food

“Can” Solution


Bob puts food in Pond


Bob knocks over Can


Alice Releases Pets

Alice Resets Can when Pets are Fed


Pseudocode

Alice's code

while (true) {
  while (can.isUp()) {}     // wait until Bob knocks the can over
  pet.release();
  pet.recapture();
  can.reset();              // stand the can back up
}

Bob's code

while (true) {
  while (can.isDown()) {}   // wait until Alice resets the can
  pond.stockWithFood();
  can.knockOver();
}
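The pseudocode above can be turned into a small runnable sketch by modelling the can as one volatile boolean and running a fixed number of rounds instead of looping forever. The names CanProtocol, canUp, food, meals and run are assumptions made for illustration; Thread.yield() in the waits is optional.

```java
class CanProtocol {
    static volatile boolean canUp;     // true: can is standing, so no food is out
    static volatile boolean violated;
    static int food;                   // portions in the pond; guarded by the can
    static int meals;                  // portions eaten so far

    static void alice(int rounds) {
        for (int r = 0; r < rounds; r++) {
            while (canUp) Thread.yield();     // wait until Bob knocks the can over
            if (food != 1) violated = true;   // pets released with no food waiting
            food--;                           // pets are released and eat
            meals++;                          // pets are recaptured
            canUp = true;                     // reset the can
        }
    }

    static void bob(int rounds) {
        for (int r = 0; r < rounds; r++) {
            while (!canUp) Thread.yield();    // wait until Alice resets the can
            if (food != 0) violated = true;   // uneaten food remains
            food++;                           // stock the pond with food
            canUp = false;                    // knock the can over
        }
    }

    // Runs both sides for `rounds` rounds; returns meals eaten, or -1 on a violation.
    static int run(int rounds) {
        canUp = true; violated = false; food = 0; meals = 0;
        Thread a = new Thread(() -> alice(rounds));
        Thread b = new Thread(() -> bob(rounds));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return violated ? -1 : meals;
    }

    public static void main(String[] args) {
        System.out.println(run(10000)); // prints 10000
    }
}
```

The can is the only shared signal: whoever last changed it hands the pond to the other side, so food and meals never need a lock of their own.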


Correctness

• Mutual Exclusion (safety)

– Pets and Bob never together in pond

• No Starvation (liveness)

– If Bob is always willing to feed, and pets are always famished, then pets eat infinitely often.

• Producer/Consumer (safety)

– The pets never enter pond unless there is food, and Bob never provides food if there is unconsumed food.

Aside: Spin Locks

Pseudocode

while (true) {
  while (can.isUp()){};
  pet.release();
  pet.recapture();
  can.reset();
}

while (true) {
  while (can.isDown()){};
  pond.stockWithFood();
  can.knockOver();
}

Spin Lock!

Has to be protected…

What Should You Do If You Can’t Get a Lock?

• Keep trying (our focus)

– “spin” or “busy-wait”

– Good if delays are short

• Give up the processor

– Good if delays are long

– Always good on uniprocessor

Basic Spin-Lock

[Figure labels: spin lock, critical section (CS), resets lock upon exit]

• …lock introduces sequential bottleneck

• …lock suffers from contention

• Notice: these are distinct phenomena

– Seq Bottleneck: no parallelism

– Contention: ???

Review: Test-and-Set

• Boolean value

• Test-and-set (TAS)

– Swap true with current value

– Return value tells if prior value was true or false

• Can reset just by writing false

• TAS aka “getAndSet”

public class AtomicBoolean {
  boolean value;

  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;
    value = newValue;
    return prior;
  }
}

• AtomicBoolean: package java.util.concurrent.atomic

• getAndSet: swap old and new values

AtomicBoolean lock

= new AtomicBoolean(false)

boolean prior = lock.getAndSet(true)

Swapping in true is called “test-and-set” or TAS
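As a small illustration of the review above, the following sketch uses the real java.util.concurrent.atomic.AtomicBoolean rather than the simplified class on the slide. The class name TestAndSetDemo and the helper testAndSet are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TestAndSetDemo {
    /** One test-and-set: swaps in true and returns the prior value. (Hypothetical helper.) */
    static boolean testAndSet(AtomicBoolean flag) {
        return flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicBoolean lock = new AtomicBoolean(false); // false means "free"
        boolean first = testAndSet(lock);   // prior value was false: this caller wins
        boolean second = testAndSet(lock);  // prior value was true: lock already taken
        System.out.println("first=" + first + ", second=" + second);
        lock.set(false);                    // reset just by writing false
        System.out.println("after reset: " + lock.get());
    }
}
```

The first call observes false and the second observes true, which is exactly the win/lose signal a TAS lock relies on.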

Page 88: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Test-and-Set Locks

• Locking

– Lock is free: value is false

– Lock is taken: value is true

• Acquire lock by calling TAS

– If result is false, you win

– If result is true, you lose

• Release lock by writing false

Art of Multiprocessor Programming 105

Page 89: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Test-and-set Lock

Art of Multiprocessor Programming 106

class TASlock {

AtomicBoolean state =

new AtomicBoolean(false);

void lock() {

while (state.getAndSet(true)) {}

}

void unlock() {

state.set(false);

}}

Test-and-set Lock (notes on the TASlock class above)

• Lock state is an AtomicBoolean

• lock(): keep trying until lock acquired

• unlock(): release lock by resetting state to false

Page 93: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Space Complexity

• TAS spin-lock has small “footprint”

• n-thread spin-lock uses O(1) space

• As opposed to O(n) Peterson/Bakery

• How did we overcome the Ω(n) lower bound?

• We used a RMW operation…

Art of Multiprocessor Programming 110

Page 94: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Performance

• Experiment

– n threads

– Increment shared counter 1 million times

• How long should it take?

• How long does it take?
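The experiment can be sketched roughly as follows. This is a minimal illustration, not the benchmark behind the slides: the class name CounterExperiment, the method run, and the thread counts in main are assumptions, and the timings it prints depend heavily on the machine.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CounterExperiment {
    /** TAS spin lock, written as on the slides. */
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        void lock() { while (state.getAndSet(true)) { } }
        void unlock() { state.set(false); }
    }

    private static long counter = 0; // only touched while holding the lock

    /** nThreads threads share `total` increments; returns the final counter value. */
    static long run(int nThreads, int total) {
        counter = 0;
        TASLock lock = new TASLock();
        int perThread = total / nThreads; // assumes total is divisible by nThreads
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4}) {
            long start = System.nanoTime();
            long result = run(n, 1_000_000);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " threads: count=" + result + ", " + ms + " ms");
        }
    }
}
```

Because every increment is inside the critical section, the total work is sequential; the interesting question is how much extra time the lock itself adds as threads are added.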

Art of Multiprocessor Programming 111

Page 95: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Graph

Art of Multiprocessor Programming 112

[Figure: time (y-axis) vs. threads (x-axis), showing the ideal curve. Annotation: no speedup because of sequential bottleneck.]

Page 96: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Mystery #1

Art of Multiprocessor Programming 113

[Figure: time vs. threads for the TAS lock and the ideal curve.]

What is going on?

Page 97: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Test-and-Test-and-Set Locks

• Lurking stage

– Wait until lock “looks” free

– Spin while read returns true (lock taken)

• Pouncing stage

– As soon as lock “looks” available

– Read returns false (lock free)

– Call TAS to acquire lock

– If TAS loses, back to lurking

Art of Multiprocessor Programming 114

Page 98: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Test-and-test-and-set Lock

Art of Multiprocessor Programming 115

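class TTASlock {

  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {

    while (true) {

      while (state.get()) {}        // lurk: spin on reads while lock is taken

      if (!state.getAndSet(true))   // pounce: lock looks free, try TAS

        return;

    }

  }

  void unlock() {                   // same as TASlock: release by writing false

    state.set(false);

  }

}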

Test-and-test-and-set Lock (notes on the TTASlock class above)

• Inner loop: wait until lock looks free

• Then try to acquire it

Page 101: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Mystery #2

Art of Multiprocessor Programming 118

[Figure: time vs. threads for the TAS lock, the TTAS lock, and the ideal curve.]

Page 102: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Mystery

• Both

– TAS and TTAS

– Do the same thing (in our model)

• Except that

– TTAS performs much better than TAS

– Neither approaches ideal

Art of Multiprocessor Programming 119

Page 103: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Opinion

• Our memory abstraction is broken

• TAS & TTAS methods

– Are provably the same (in our model)

– Except they aren’t (in field tests)

• Need a more detailed model …

Art of Multiprocessor Programming 120

Page 104: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Simple TASLock

• TAS invalidates cache lines

• Spinners

– Miss in cache

– Go to bus

• Thread wants to release lock

– delayed behind spinners

Art of Multiprocessor Programming 121

Page 105: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Test-and-test-and-set

• Wait until lock “looks” free

– Spin on local cache

– No bus use while lock busy

• Problem: when lock is released

– Invalidation storm …

Art of Multiprocessor Programming 122

Page 106: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Local Spinning while Lock is Busy

Art of Multiprocessor Programming 123

[Figure: processor caches connected by a bus to memory; each cache and memory hold the value "busy".]

Page 107: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

On Release

Art of Multiprocessor Programming 124

[Figure: processor caches connected by a bus to memory; one cache and memory hold "free", the other cached copies are "invalid".]

• Everyone misses, rereads

• Everyone tries TAS

Page 110: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Problems

• Everyone misses

– Reads satisfied sequentially

• Everyone does TAS

– Invalidates others’ caches

• Eventually quiesces after lock acquired

– How long does this take?

Art of Multiprocessor Programming 127

Page 111: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Quiescence Time

Art of Multiprocessor Programming 128

[Figure: quiescence time vs. threads.]

Increases linearly with the number of processors for bus architecture

Page 112: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Mystery Explained

Art of Multiprocessor Programming 129

[Figure: time vs. threads for the TAS lock, the TTAS lock, and the ideal curve.]

TTAS: better than TAS but still not as good as ideal

Page 113: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Solution: Introduce Delay

Art of Multiprocessor Programming 130

[Figure: timeline of spin-lock attempts; axis labels: time, d, r1d, r2d.]

• If the lock looks free

• But I fail to get it

• There must be contention

• Better to back off than to collide again

Page 114: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Dynamic Example: Exponential Backoff

If I fail to get lock

– Wait random duration before retry

– Each subsequent failure doubles expected wait

Art of Multiprocessor Programming 131

[Figure: timeline of spin-lock attempts; axis labels: time, d, 2d, 4d.]
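The backoff idea can be sketched as a TTAS lock that pauses after a failed acquisition attempt. This is an illustrative sketch only: the class name BackoffLockSketch, the delay bounds MIN_DELAY_MICROS and MAX_DELAY_MICROS, and the helper demo are assumptions, and sensible delay values depend on the platform.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class BackoffLockSketch {
    static final int MIN_DELAY_MICROS = 1;    // assumed starting bound
    static final int MAX_DELAY_MICROS = 1024; // assumed cap on the bound

    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        int limit = MIN_DELAY_MICROS;
        while (true) {
            while (state.get()) { }              // wait until lock looks free
            if (!state.getAndSet(true)) return;  // got it
            // Lock looked free but TAS failed: there must be contention, so back off.
            long delayMicros = ThreadLocalRandom.current().nextInt(limit) + 1;
            LockSupport.parkNanos(delayMicros * 1_000L);
            limit = Math.min(MAX_DELAY_MICROS, 2 * limit); // double the expected wait
        }
    }

    public void unlock() {
        state.set(false);
    }

    /** Hypothetical demo: nThreads threads each increment a shared counter perThread times. */
    static int demo(int nThreads, int perThread) {
        BackoffLockSketch lock = new BackoffLockSketch();
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("count = " + demo(4, 10_000));
    }
}
```

The random delay keeps threads that collided once from retrying in lockstep, and the doubling bound spreads retries further apart the longer contention persists.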

Page 115: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

CDS.IISc.ac.in | Department of Computational and Data Sciences

Concurrent Data Structures

Art of Multiprocessor Programming

Page 116: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

CDS.IISc.ac.in | Department of Computational and Data Sciences

What if you had multiple producers, consumers?

while (true) {

while (a.isLocked()){};

while (can.isUp()){};

pet.release();

pet.recapture();

can.reset();

}

Alice & Co.

while (true) {

while (b.isLocked()){};

while (can.isDown()){};

pond.stockWithFood();

can.knockOver();

}

Bob & Co.

Page 117: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

CDS.IISc.ac.in | Department of Computational and Data Sciences

Does this improve performance? Sequential bottleneck!

Art of Multiprocessor Programming

Page 118: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Why Do We Care About Sequential Bottlenecks?

• We want as much of the code as possible to execute in parallel

• A larger sequential part implies reduced performance

• Amdahl's law: this relation is not linear…

Art of Multiprocessor Programming

Eugene Amdahl

Page 119: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor Programming 153

Amdahl's Law

\[ \text{Speedup} = \frac{\text{1-thread execution time}}{n\text{-thread execution time}} \]

Amdahl's Law

\[ \text{Speedup} = \frac{1}{(1 - p) + \dfrac{p}{n}} \]

where p is the parallel fraction, 1 − p is the sequential fraction, and n is the number of threads.
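The formula is easy to evaluate directly. The sketch below is a hypothetical helper (the class name Amdahl and the method speedup are not from the slides) that computes the speedup for a parallel fraction p on n threads.

```java
public class Amdahl {
    /** Amdahl's law: speedup for parallel fraction p (0..1) on n threads. */
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Parallel fractions chosen to match the worked examples on the following slides.
        for (double p : new double[] {0.6, 0.8, 0.9, 0.99}) {
            System.out.println("p = " + p + ", n = 10: speedup = " + speedup(p, 10));
        }
    }
}
```

With ten threads, even a 10% sequential part caps the speedup at roughly half of the ideal 10-fold, which is why the relation is described as not linear.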

Page 124: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Amdahl's Law (in practice)

Art of Multiprocessor Programming 158

Page 125: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Example

• Ten processors

• 60% concurrent, 40% sequential

• How close to 10-fold speedup?

\[ \text{Speedup} = \frac{1}{1 - 0.6 + \dfrac{0.6}{10}} = 2.17 \]

Art of Multiprocessor Programming

Page 127: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Example

• Ten processors

• 80% concurrent, 20% sequential

• How close to 10-fold speedup?

\[ \text{Speedup} = \frac{1}{1 - 0.8 + \dfrac{0.8}{10}} = 3.57 \]

Art of Multiprocessor Programming

Page 129: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Example

• Ten processors

• 90% concurrent, 10% sequential

• How close to 10-fold speedup?

\[ \text{Speedup} = \frac{1}{1 - 0.9 + \dfrac{0.9}{10}} = 5.26 \]

Art of Multiprocessor Programming

Page 131: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Example

• Ten processors

• 99% concurrent, 1% sequential

• How close to 10-fold speedup?

\[ \text{Speedup} = \frac{1}{1 - 0.99 + \dfrac{0.99}{10}} = 9.17 \]

Art of Multiprocessor Programming

Page 133: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Back to Real-World Multicore Scaling

Art of Multiprocessor Programming 167

[Figure: user code speedup on multicore: 1.8x, 2x, 2.9x.]

Not reducing sequential % of code

Page 134: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Shared Data Structures

[Figure: coarse-grained vs. fine-grained sharing; in each case the work is 75% unshared and 25% shared. Traffic-jam labels: "Honk! Honk! Honk!"]

• Why only 2.9 speedup

• Why fine-grained parallelism matters

Page 137: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

CDS.IISc.ac.in | Department of Computational and Data Sciences

Need for Concurrent Queues

• Avoid sequential bottleneck by introducing a buffer between the producers and consumers

• Producers add items to queue

• Consumers consume from queue

• Neither waits as long as queue is not full or empty
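A buffer between producers and consumers can be sketched with the standard library's bounded queue. This is only an illustration using java.util.concurrent.ArrayBlockingQueue, not the queue developed later in the slides; the class name ProducerConsumerSketch, the method run, and its parameters are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ProducerConsumerSketch {
    /** Each producer puts 1..itemsPerProducer; consumers sum everything they take. */
    static long run(int producers, int consumers, int itemsPerProducer, int capacity) {
        int total = producers * itemsPerProducer;
        if (total % consumers != 0) {
            throw new IllegalArgumentException("items must divide evenly among consumers");
        }
        int perConsumer = total / consumers;
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicLong sum = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            threads.add(new Thread(() -> {
                try {
                    for (int k = 1; k <= itemsPerProducer; k++) buffer.put(k); // blocks only when full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (int c = 0; c < consumers; c++) {
            threads.add(new Thread(() -> {
                try {
                    for (int k = 0; k < perConsumer; k++) sum.addAndGet(buffer.take()); // blocks only when empty
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println("sum = " + run(2, 2, 1000, 8));
    }
}
```

Producers and consumers only wait on each other when the buffer is full or empty, instead of handing off one item at a time.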

Art of Multiprocessor Programming

Page 138: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Concurrent Objects

Companion slides for

The Art of Multiprocessor Programming

by Maurice Herlihy & Nir Shavit

Page 139: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

173

Concurrent Computation

memory

object object

Page 140: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

174

Objectivism

• What is a concurrent object?

– How do we describe one?

– How do we implement one?

– How do we tell if we’re right?

Page 141: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

175

Objectivism

• What is a concurrent object?

– How do we describe one?

– How do we tell if we’re right?

Page 142: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

176

FIFO Queue: Enqueue Method

q.enq( )

Page 143: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

177

FIFO Queue: Dequeue Method

q.deq()/

Page 144: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

178

Lock-Based Queue

[Figure: circular array with slots 0 to 7 (capacity = 8), head and tail pointers, holding items x and y.]

Fields protected by single shared lock

Page 146: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

A Lock-Based Queue

class LockBasedQueue<T> {

  int head, tail;

  T[] items;

  Lock lock;

  public LockBasedQueue(int capacity) {

    head = 0; tail = 0;

    lock = new ReentrantLock();

    items = (T[]) new Object[capacity];

  }

Fields protected by single shared lock

[Figure: array slots 0, 1, 2, …, capacity-1 with head and tail pointers, holding items y and z.]

Art of Multiprocessor Programming 180

Page 147: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Lock-Based Queue

Initially: head = tail

[Figure: empty circular array with slots 0 to 7 and the head and tail pointers.]

Page 149: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Lock-Based deq()

[Figure: circular array with slots 0 to 7 holding items x and y, with head and tail pointers.]

Acquire Lock

[Figure: the dequeuer takes the lock ("My turn …") while another thread is waiting to enqueue.]

Page 151: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Implementation: deq()

public T deq() throws EmptyException {

  lock.lock();                  // acquire lock at method start

  try {

    if (tail == head)

      throw new EmptyException();

    T x = items[head % items.length];

    head++;

    return x;

  } finally {

    lock.unlock();

  }

}

Art of Multiprocessor Programming 185

Page 152: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Check if Non-Empty

[Figure: the dequeuer compares head and tail ("Not equal?") while another thread is waiting to enqueue.]

Implementation: deq()

• If queue empty (tail == head), throw exception

Page 154: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Modify the Queue

[Figure: the head pointer moves to the next slot while another thread is waiting to enqueue.]

Implementation: deq()

• Queue not empty? Remove item and update head

• Return result

Page 157: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Release the Lock

[Figure: the dequeuer releases the lock; the waiting thread ("Waiting…") then gets its turn ("My turn!").]

Implementation: deq()

• Release lock no matter what! (unlock() sits in the finally block)

Page 160: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Implementation: enq()

Art of Multiprocessor Programming

public void enq(T x) throws FullException {

  lock.lock();

  try {

    if (tail - head == items.length)

      throw new FullException();

    items[tail % items.length] = x;

    tail++;

  } finally {

    lock.unlock();

  }

}
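Putting the constructor, deq(), and enq() fragments together gives a class along the following lines. This is a sketch assembled from the slide fragments, with some assumptions: the class is renamed LockBasedQueueSketch, the two exception types are declared as nested unchecked exceptions so the example is self-contained, and main is a hypothetical usage example.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockBasedQueueSketch<T> {
    // Assumption: unchecked exceptions, declared here to keep the sketch self-contained.
    static class EmptyException extends RuntimeException { }
    static class FullException extends RuntimeException { }

    private int head = 0, tail = 0;           // protected by lock
    private final T[] items;                  // protected by lock
    private final Lock lock = new ReentrantLock();

    @SuppressWarnings("unchecked")
    public LockBasedQueueSketch(int capacity) {
        items = (T[]) new Object[capacity];
    }

    public void enq(T x) throws FullException {
        lock.lock();
        try {
            if (tail - head == items.length) throw new FullException();
            items[tail % items.length] = x;
            tail++;
        } finally {
            lock.unlock();                    // release lock no matter what
        }
    }

    public T deq() throws EmptyException {
        lock.lock();
        try {
            if (tail == head) throw new EmptyException();
            T x = items[head % items.length];
            head++;
            return x;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockBasedQueueSketch<String> q = new LockBasedQueueSketch<>(2);
        q.enq("y");
        q.enq("z");
        System.out.println(q.deq() + " " + q.deq()); // FIFO order: y then z
    }
}
```

Every method body runs entirely under the single shared lock, so calls never overlap and the queue behaves like its sequential counterpart.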

Page 161: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Wait-free Queue?

public class WaitFreeQueue<T> {

  int head = 0, tail = 0;

  T[] items;

  public WaitFreeQueue(int capacity) {

    items = (T[]) new Object[capacity];

  }

  public void enq(T x) throws FullException {

    if (tail - head == items.length)

      throw new FullException();

    items[tail % items.length] = x; tail++;

  }

  public T deq() throws EmptyException {

    if (tail == head)

      throw new EmptyException();

    T item = items[head % items.length]; head++;

    return item;

  }

}

[Figure: array slots 0, 1, 2, …, capacity-1 with head and tail pointers, holding items y and z.]

Art of Multiprocessor Programming 195
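The lock-free version above is only safe under a restriction the slide does not spell out: one thread enqueues and one thread dequeues. The sketch below exercises that case. It assumes head and tail are declared volatile so that each thread sees the other's updates; the class name WaitFreeQueueSketch, the nested unchecked exceptions, and the demo method are hypothetical.

```java
public class WaitFreeQueueSketch<T> {
    // Assumption: unchecked exceptions, declared here to keep the sketch self-contained.
    static class EmptyException extends RuntimeException { }
    static class FullException extends RuntimeException { }

    // Assumption: volatile, so the single enqueuer and single dequeuer see each other's writes.
    private volatile int head = 0, tail = 0;
    private final T[] items;

    @SuppressWarnings("unchecked")
    public WaitFreeQueueSketch(int capacity) {
        items = (T[]) new Object[capacity];
    }

    /** Called by the single enqueuer thread only; tail has one writer. */
    public void enq(T x) {
        if (tail - head == items.length) throw new FullException();
        items[tail % items.length] = x;
        tail++;
    }

    /** Called by the single dequeuer thread only; head has one writer. */
    public T deq() {
        if (tail == head) throw new EmptyException();
        T item = items[head % items.length];
        head++;
        return item;
    }

    /** Hypothetical demo: one producer enqueues 0..n-1, the caller dequeues and checks FIFO order. */
    static boolean demo(int n) {
        WaitFreeQueueSketch<Integer> q = new WaitFreeQueueSketch<>(8);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; ) {
                try { q.enq(i); i++; } catch (FullException e) { Thread.yield(); }
            }
        });
        producer.start();
        boolean inOrder = true;
        for (int expected = 0; expected < n; ) {
            try {
                if (q.deq() != expected) inOrder = false;
                expected++;
            } catch (EmptyException e) {
                Thread.yield(); // queue momentarily empty: retry
            }
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inOrder;
    }

    public static void main(String[] args) {
        System.out.println("FIFO order preserved: " + demo(10_000));
    }
}
```

With more than one enqueuer or dequeuer, the unsynchronized tail++ and head++ updates could race, which is why the lock-based version is needed in the general case.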

Page 162: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Art of Multiprocessor

Programming

196

Linearizability

• Each method should

– “take effect”

– Instantaneously

– Between invocation and response events

• Object is correct if this “sequential” behavior is correct

• Any such concurrent object is

– Linearizable™

• A linearizable object: one all of whose possible executions are linearizable

Page 163: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Wait-free Queue?

Art of Multiprocessor Programming 197

• Linearization order is the order in which the head and tail fields are modified

Page 164: Introduction to Multiprocessor Synchronizationcds.iisc.ac.in/wp-content/uploads/DS286.AUG2016.L27-28...CDS.IISc.ac.in | Department of Computational and Data Sciences Turing Cluster

Reasoning About Linearizability: Locking

Art of Multiprocessor Programming 198

public T deq() throws EmptyException {

  lock.lock();

  try {

    if (tail == head)

      throw new EmptyException();

    T x = items[head % items.length];

    head++;

    return x;

  } finally {

    lock.unlock();

  }

}

Linearization points are when locks are released