Local-Spin Algorithms. Multiprocessor synchronization algorithms (20225241). Lecturer: Danny Hendler. This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

Dec 16, 2015


Herbert Caron
Page 1

Local-Spin Algorithms

Multiprocessor synchronization algorithms (20225241)

Lecturer: Danny Hendler

This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

Page 2

The CC and DSM models

This figure is taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman

Page 3

Remote and local memory accesses

In a DSM system:

An access of v by p is local if v resides in p’s own memory module, and remote otherwise.

In a Cache-coherent system:

An access of v by p is remote if it is the first access or if v has been written by another process since p’s last access of it.
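The two definitions can be turned into a small counter. The following is a minimal sketch, not part of the original slides: the class `RMRCounter`, the helper `spin_rmrs` and the variable name `x` are hypothetical, and the CC model is simplified to "a write invalidates every other cached copy".

```python
from dataclasses import dataclass, field


@dataclass
class RMRCounter:
    """Counts remote memory references (RMRs) per process for a trace of accesses."""
    model: str                                   # "DSM" or "CC"
    home: dict = field(default_factory=dict)     # DSM only: variable -> process whose module holds it
    cached: dict = field(default_factory=dict)   # CC only: variable -> processes holding a valid copy
    rmrs: dict = field(default_factory=dict)     # process -> number of remote accesses so far

    def access(self, p, v, write=False):
        """Record an access of variable v by process p; return True if it is remote."""
        if self.model == "DSM":
            remote = self.home[v] != p           # remote unless v lives in p's own module
        else:
            holders = self.cached.setdefault(v, set())
            remote = p not in holders            # first access, or invalidated by another writer
            if write:
                holders.clear()                  # a write invalidates all other cached copies
            holders.add(p)
        if remote:
            self.rmrs[p] = self.rmrs.get(p, 0) + 1
        return remote


def spin_rmrs(model, spins):
    """Process 0 reads x `spins` times, process 1 writes x once, process 0 reads x again."""
    counter = RMRCounter(model, home={"x": 1})   # assumption: x is placed in process 1's module
    for _ in range(spins):
        counter.access(0, "x")
    counter.access(1, "x", write=True)
    counter.access(0, "x")
    return counter.rmrs[0]


print(spin_rmrs("DSM", 1000))  # 1001: every read of the remote x is an RMR
print(spin_rmrs("CC", 1000))   # 2: the first read, and the read after the write
```

The same read loop costs one RMR per iteration when the variable is remote in the DSM model, but only a constant number of RMRs in the CC model.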

Page 4

Local-spin algorithms

In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses that do not cause interconnect traffic.

The same algorithm may be local-spin on one architecture (DSM/CC) and non-local-spin on the other!

For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).

Page 5

Peterson’s 2-process algorithm

Program for process 0

b[0]:=true
turn:=0
await (b[1]=false or turn=1)
CS
b[0]:=false

Program for process 1

b[1]:=true
turn:=1
await (b[0]=false or turn=0)
CS
b[1]:=false

Is this algorithm local-spin on a DSM machine? No: turn is stored in a single memory module, so at least one of the processes spins on a remote variable.

Is this algorithm local-spin on a CC machine? Yes: each process spins on cached copies of b and turn.

Page 6

Peterson’s 2-process algorithm

Program for process 0

b[0]:=true
turn:=0
await (b[1]=false or turn=1)
CS
b[0]:=false

Program for process 1

b[1]:=true
turn:=1
await (b[0]=false or turn=0)
CS
b[1]:=false

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant
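Peterson’s two programs can be tried out directly. This is a minimal sketch rather than a reference implementation: the function `run_peterson` and the bookkeeping variables `counter`, `inside` and `violations` are hypothetical, and the code assumes that reads and writes of list elements by different threads behave as if sequentially consistent, as they do in practice under CPython’s global interpreter lock.

```python
import threading
import time


def run_peterson(iterations=200):
    b = [False, False]   # b[i] is True while process i wants the critical section
    turn = [0]           # tie-breaker: the last process to write it waits
    counter = [0]        # shared data protected by the lock
    inside = [0]         # number of processes currently in the critical section
    violations = [0]     # times two processes were seen inside together

    def process(i):
        other = 1 - i
        for _ in range(iterations):
            b[i] = True
            turn[0] = i
            # await (b[other] = false or turn = other)
            while b[other] and turn[0] == i:
                time.sleep(0)            # yield so the other thread can make progress
            inside[0] += 1               # critical section starts
            if inside[0] != 1:
                violations[0] += 1
            counter[0] += 1
            inside[0] -= 1               # critical section ends
            b[i] = False

    threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], violations[0]


print(run_peterson())
```

The spin loop reads `b[other]` and `turn`, both of which the other process writes. This is why the loop is local only where such reads are served from a cache.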

Page 7

Kessel’s single-writer algorithm

Program for process 0

b[0]:=true
local[0]:=turn[1]
turn[0]:=local[0]
await (b[1]=false or local[0]≠turn[1])
CS
b[0]:=false

Program for process 1

b[1]:=true
local[1]:=1-turn[0]
turn[1]:=local[1]
await (b[0]=false or local[1]=turn[0])
CS
b[1]:=false

Can Kessel’s algorithm be made local-spin on a DSM machine? Yes, if:

b[1], turn[1] are located at p0’s memory module

b[0], turn[0] are located at p1’s memory module
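The two programs of Kessel’s algorithm can be sketched in the same way. This is an illustrative sketch only: `run_kessel` and its bookkeeping variables are hypothetical, and, as before, it assumes that list reads and writes across threads behave as if sequentially consistent (true in practice under CPython’s global interpreter lock). Note that `turn[i]` and `b[i]` are written only by process i, which is what makes the placement described above possible.

```python
import threading
import time


def run_kessel(iterations=200):
    b = [False, False]   # b[i] is written only by process i
    turn = [0, 0]        # turn[i] is written only by process i
    counter = [0]        # shared data protected by the lock
    inside = [0]         # number of processes currently in the critical section
    violations = [0]     # times two processes were seen inside together

    def process(i):
        for _ in range(iterations):
            b[i] = True
            if i == 0:
                local = turn[1]              # process 0 makes the two turn bits equal
                turn[0] = local
                # await (b[1] = false or local != turn[1])
                while b[1] and local == turn[1]:
                    time.sleep(0)
            else:
                local = 1 - turn[0]          # process 1 makes the two turn bits differ
                turn[1] = local
                # await (b[0] = false or local = turn[0])
                while b[0] and local != turn[0]:
                    time.sleep(0)
            inside[0] += 1                   # critical section starts
            if inside[0] != 1:
                violations[0] += 1
            counter[0] += 1
            inside[0] -= 1                   # critical section ends
            b[i] = False

    threads = [threading.Thread(target=process, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], violations[0]


print(run_kessel())
```

Process 0 spins only on `b[1]` and `turn[1]`, and process 1 only on `b[0]` and `turn[0]`, matching the memory placement listed above.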

Page 8

Local Spinning Mutual Exclusion Using Strong Primitives

Page 9

Anderson’s queue-based algorithm

Shared:
integer ticket – an RMW object, initially 0
bit valid[0..n-1], initially valid[0]=1 and valid[i]=0, for i ∈ {1,..,n-1}

Local:
integer myTicket

Program for process i

myTicket:=fetch-and-inc-modulo-n(ticket) ; take a ticket
await valid[myTicket]=1 ; wait for your turn
CS
valid[myTicket]:=0 ; dequeue
valid[(myTicket+1) mod n]:=1 ; signal successor

[Figure: the valid array, indexed 0..n-1, and the ticket counter]

Page 10

Anderson’s queue-based algorithm (cont’d)

Initial configuration: ticket = 0, valid = 1 0 0 0 0

After entry section of p3: ticket = 1, valid = 1 0 0 0 0, myTicket of p3 = 0

After p1 performs entry section: ticket = 2, valid = 1 0 0 0 0, myTicket of p3 = 0, myTicket of p1 = 1

After p3 exits: ticket = 2, valid = 0 1 0 0 0, myTicket of p1 = 1

Page 11

Anderson’s queue-based algorithm (cont’d)

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant

Program for process i

myTicket:=fetch-and-inc-modulo-n(ticket) ; take a ticket
await valid[myTicket]=1 ; wait for your turn
CS
valid[myTicket]:=0 ; dequeue
valid[(myTicket+1) mod n]:=1 ; signal successor
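The program above can be sketched as a small lock class. This is a rough illustration under stated assumptions: the names `AndersonLock` and `run_anderson` are hypothetical, Python has no hardware fetch-and-increment, so a `threading.Lock` stands in for the atomicity of that single primitive, and at most n threads may use the lock.

```python
import threading
import time


class AndersonLock:
    def __init__(self, n):
        self.n = n
        self.ticket = 0
        self.valid = [1] + [0] * (n - 1)     # valid[0]=1, all other entries 0
        self._atomic = threading.Lock()      # assumption: emulates the atomic RMW primitive

    def _fetch_and_inc_mod_n(self):
        with self._atomic:
            old = self.ticket
            self.ticket = (old + 1) % self.n
            return old

    def acquire(self):
        my_ticket = self._fetch_and_inc_mod_n()   # take a ticket
        while self.valid[my_ticket] != 1:         # wait for your turn
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                     # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1      # signal successor


def run_anderson(n=4, iterations=100):
    lock = AndersonLock(n)
    counter = [0]

    def worker():
        for _ in range(iterations):
            my_ticket = lock.acquire()
            counter[0] += 1                  # critical section
            lock.release(my_ticket)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_anderson())
```

Each waiting process spins on a different entry of `valid`, but which entry it gets depends on its ticket, so the entry cannot be placed in the process’s own memory module in advance.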

Page 12

Graunke and Thakkar’s algorithm

Uses the more common swap primitive:

swap(w, new)
  do atomically
    prev:=w
    w:=new
    return prev

Page 13

Graunke and Thakkar’s algorithm (cont’d)

Shared:
bit slots[0..n-1], initially slots[i]=1, for i ∈ {0,..,n-1}
structure {bit value, bit *slot} tail, initially {0, &slots[0]}

Local:
structure {bit value, bit *slot} myRecord, prev
bit temp

[Figure: tail, holding the value 0 and a pointer to slots[0]; the array slots[0..n-1], all entries initially 1]

Page 14

Graunke and Thakkar’s algorithm (cont’d)

Shared:
bit slots[0..n-1], initially slots[i]=1, for i ∈ {0,..,n-1}
structure {bit value, bit *slot} tail, initially {0, &slots[0]}

Local:
structure {bit value, bit *slot} myRecord, prev
bit temp

Program for process i

myRecord.value:=slots[i] ; prepare to thread yourself to queue
myRecord.slot:=&slots[i]
prev:=swap(&tail, myRecord) ; prev now points to predecessor
await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
CS
temp:=1-slots[i]
slots[i]:=temp ; signal successor

Page 15

Graunke and Thakkar’s algorithm (cont’d)

Page 16

Graunke and Thakkar’s algorithm (cont’d)

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant

Program for process i

myRecord.value:=slots[i] ; prepare to thread yourself to queue
myRecord.slot:=&slots[i]
prev:=swap(&tail, myRecord) ; prev now points to predecessor
await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
CS
temp:=1-slots[i]
slots[i]:=temp ; signal successor
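The program above can be sketched as follows. This is a rough illustration, not the authors’ implementation: `GTLock` and `run_gt` are hypothetical names, a slot index stands in for the pointer `&slots[i]`, and a `threading.Lock` emulates the atomicity of the single swap primitive, which Python does not provide.

```python
import threading
import time


class GTLock:
    def __init__(self, n):
        self.slots = [1] * n                 # slots[i]=1 for every i
        self.tail = (0, 0)                   # (value, slot index), i.e. {0, &slots[0]}
        self._atomic = threading.Lock()      # assumption: emulates the atomic swap primitive

    def _swap_tail(self, new):
        with self._atomic:
            prev = self.tail
            self.tail = new
            return prev

    def acquire(self, i):
        my_record = (self.slots[i], i)                  # prepare to thread yourself to queue
        prev_value, prev_slot = self._swap_tail(my_record)
        while self.slots[prev_slot] == prev_value:      # spin until predecessor's value changes
            time.sleep(0)

    def release(self, i):
        self.slots[i] = 1 - self.slots[i]               # signal successor


def run_gt(n=4, iterations=100):
    lock = GTLock(n)
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            counter[0] += 1                  # critical section
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_gt())
```

A process spins on its predecessor’s slot rather than on its own, which is why the spin is local under the CC model but not under the DSM model.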

Page 17

The MCS queue-based algorithm

Type:
Qnode: structure {bit locked, Qnode *next}

Shared:
Qnode nodes[0..n-1]
Qnode *tail, initially nil

Local:
Qnode *myNode, initially &nodes[i]
Qnode *prev, *successor

Has constant RMR complexity under both the DSM and CC models

Uses swap and CAS

Page 18

The MCS queue-based algorithm (cont’d)

Program for process i

myNode->next := nil ; prepare to be last in queue
prev := swap(&tail, myNode) ; tail now points to myNode, prev to my predecessor (if any)
if (prev ≠ nil) ; I need to wait for a predecessor
  myNode->locked := true ; prepare to wait
  prev->next := myNode ; let my predecessor know it has to unlock me
  await (myNode->locked = false)
CS
if (myNode->next = nil) ; if not sure there is a successor
  if (compare-and-swap(&tail, myNode, nil) = false) ; if there is a successor
    await (myNode->next ≠ nil) ; spin until successor lets me know its identity
    successor := myNode->next ; get a pointer to my successor
    successor->locked := false ; unlock my successor
else ; for sure, I have a successor
  successor := myNode->next ; get a pointer to my successor
  successor->locked := false ; unlock my successor
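The MCS entry and exit code can be sketched as a lock class. This is a minimal illustration under stated assumptions: `QNode`, `MCSLock` and `run_mcs` are hypothetical names, object references stand in for pointers, and a `threading.Lock` emulates the atomicity of the swap and compare-and-swap primitives on tail only.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None
        self._atomic = threading.Lock()      # assumption: emulates atomic swap/CAS on tail

    def _swap_tail(self, new):
        with self._atomic:
            prev, self.tail = self.tail, new
            return prev

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None                     # prepare to be last in queue
        prev = self._swap_tail(node)         # tail now points to node
        if prev is not None:                 # there is a predecessor to wait for
            node.locked = True
            prev.next = node                 # let the predecessor know whom to unlock
            while node.locked:               # spin on a field of my own node
                time.sleep(0)

    def release(self, node):
        if node.next is None:                # not sure there is a successor
            if self._cas_tail(node, None):
                return                       # no successor: the queue is empty again
            while node.next is None:         # a successor is still linking itself in
                time.sleep(0)
        node.next.locked = False             # unlock the successor


def run_mcs(n=4, iterations=100):
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()                       # one node per process, reused across rounds
        for _ in range(iterations):
            lock.acquire(node)
            counter[0] += 1                  # critical section
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_mcs())
```

Both waiting loops read fields of the process’s own node, so placing each node in its owner’s memory module keeps the spinning local in the DSM model as well.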

Page 19

The MCS queue-based algorithm (cont’d)

Page 20

Local Spinning Mutual Exclusion Using reads and writes

Page 21

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)

O(log n) RMR complexity for both DSM and CC systems

This is ‘suspected’ to be optimal!

Uses O(n log n) registers

[Figure: a binary tournament tree for 8 processes, numbered 0..7 at the leaves. Level 0 has nodes 0..3, level 1 has nodes 0..1, and level 2 has the root, node 0.]

Each node is identified by (level, number)
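The (level, number) naming determines the path a process climbs. The sketch below assumes that n is a power of two and that a process with identifier id at some level competes at node ⌊id/2⌋ and then continues with id set to that node number, as in the program on the later slides; the function name `path_to_root` is hypothetical.

```python
def path_to_root(i, n):
    """Return the (level, node) pairs that process i visits, from leaf level to root."""
    levels = n.bit_length() - 1      # log2(n), assuming n is a power of two
    path = []
    ident = i
    for level in range(levels):
        node = ident // 2            # the node where this process competes at this level
        path.append((level, node))
        ident = node                 # the winner carries the node number upwards
    return path


print(path_to_root(5, 8))  # [(0, 2), (1, 1), (2, 0)]
```

Every process visits exactly log n nodes, one per level, which is where the O(log n) bound comes from once each node costs a constant number of RMRs.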

Page 22

A local-spin tournament-tree algorithm (cont’d)

Shared:
- For each node (level, node), there are 3 registers: name[level, 2·node] and name[level, 2·node+1], initially -1, and turn[level, node]
- For each level and each process i, a spin flag: flag[level, i]

Local:
level, node, id, rival

Page 23

A local-spin tournament-tree algorithm (cont’d)

Program for process i

id:=i
for level = 0 to log n - 1 do ; from leaf to root
  node:=⌊id/2⌋ ; the current node
  name[level, 2·node+(id mod 2)]:=i ; identify yourself
  turn[level, node]:=i ; update the tie-breaker
  flag[level, i]:=0 ; initialize the locally-accessible spin flag
  if (even(id))
    rival:=name[level, id+1]
  else
    rival:=name[level, id-1]
  if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
    if (flag[level, rival] = 0)
      flag[level, rival]:=1 ; release the rival from waiting
    await flag[level, i] ≠ 0 ; await until sure the rival updated the tie-breaker
    if (turn[level, node] = i) ; if I lost
      await flag[level, i] = 2 ; wait till rival notifies me it's my turn
  id:=node ; move to the next level
CS
for level = log n - 1 downto 0 do ; begin exit code
  id:=⌊i/2^level⌋, node:=⌊id/2⌋ ; set node and id
  name[level, 2·node+(id mod 2)]:=-1 ; erase name
  rival:=turn[level, node] ; find who rival is (if there is one)
  if rival ≠ i ; if there is a rival
    flag[level, rival]:=2 ; notify rival
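The entry and exit code can be sketched as a lock class that uses only reads and writes. This is an illustrative sketch under several assumptions: `TournamentLock` and `run_tournament` are hypothetical names, n must be a power of two, the tie-breaker stores the process number i (so that it can be compared with i and used to index flag), and list reads and writes across threads are assumed to behave as if sequentially consistent, as they do in practice under CPython’s global interpreter lock.

```python
import threading
import time


class TournamentLock:
    def __init__(self, n):
        self.levels = n.bit_length() - 1                       # log2(n), n a power of two
        self.name = [[-1] * (n >> l) for l in range(self.levels)]
        self.turn = [[-1] * (n >> (l + 1)) for l in range(self.levels)]
        self.flag = [[0] * n for _ in range(self.levels)]

    def acquire(self, i):
        ident = i
        for level in range(self.levels):                       # from leaf to root
            node = ident // 2
            self.name[level][ident] = i                        # identify yourself
            self.turn[level][node] = i                         # update the tie-breaker
            self.flag[level][i] = 0                            # reset my spin flag
            rival = self.name[level][ident ^ 1]                # sibling entry: id+1 or id-1
            if rival != -1 and self.turn[level][node] == i:
                if self.flag[level][rival] == 0:
                    self.flag[level][rival] = 1                # release the rival from waiting
                while self.flag[level][i] == 0:                # until rival updated tie-breaker
                    time.sleep(0)
                if self.turn[level][node] == i:                # I lost
                    while self.flag[level][i] != 2:            # until rival says it is my turn
                        time.sleep(0)
            ident = node                                       # move to the next level

    def release(self, i):
        for level in range(self.levels - 1, -1, -1):           # from root to leaf
            ident = i >> level
            node = ident // 2
            self.name[level][ident] = -1                       # erase name
            rival = self.turn[level][node]
            if rival != i:                                     # there is a waiting rival
                self.flag[level][rival] = 2                    # notify rival


def run_tournament(n=8, iterations=50):
    lock = TournamentLock(n)
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            counter[0] += 1                                    # critical section
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_tournament())
```

Process i waits only on `flag[level][i]`, a flag that no other process spins on, so placing it in process i’s memory module keeps every waiting loop local.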