
Università degli studi di Udine Sistemi operativi – Operating Systems

Synchronization


Synchronization

• Synchronization primitives
  • HW primitives
    • Atomic operations
  • Low-level synchronization primitives
    • Exclusive locks, rwlocks, seq. locks, non-blocking data structures
    • Locking strategies and issues
  • High-level synchronization primitives
• Synchronization patterns
• Classical problems
• Deadlock management


Concurrency

• Multiple applications (multiprogramming)
  • independent applications
    • processes unaware of each other
    • competition for shared resources
  • cooperating applications
    • processes indirectly aware of each other
    • cooperation by sharing resources
    • synchronization
• Parallel applications
  • processes/threads directly aware of each other
  • cooperation by communication (messages or shared variables)
  • synchronization


Concurrency issues

• Race conditions
  • final results depend on execution order
• Starvation
  • some task waits indefinitely
• Deadlock
  • a circular waiting dependency prevents work from proceeding


Race condition

• Results depend on the order of execution

shared vars:  a=1 ; b=2
process A:    a=a+b
process B:    b=a+b

a=? ; b=?
  a=3 ; b=5   (A runs before B)
  a=4 ; b=3   (B runs before A)
  a=3 ; b=3   (both read the initial values before either one writes)


Race condition

• Results depend on the order of execution

shared var:  count=0

process A:              process B:
  local tmpA              local tmpB
  tmpA=count              tmpB=count
  tmpA=tmpA+1             tmpB=tmpB+1
  count=tmpA              count=tmpB

count=?
  count=2  OK  (the two sequences do not interleave)
  count=1  NO  (both processes read count=0 before either one writes it back)
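The lost update can be reproduced deterministically, without real threads, by executing the three steps of each process under an explicit schedule. The sketch below is only an illustration: `state_t`, `step` and `run_schedule` are hypothetical names, not part of the course material.

```c
#include <assert.h>
#include <stddef.h>

/* Shared variable plus the per-process temporaries of the example. */
typedef struct {
    int count;
    int tmp[2];   /* tmp[0] is tmpA, tmp[1] is tmpB */
    int pc[2];    /* next step (0..3) of each process */
} state_t;

/* Execute the next step of process p: read, increment, write back. */
static void step(state_t *s, int p)
{
    assert(p == 0 || p == 1);
    switch (s->pc[p]) {
    case 0: s->tmp[p] = s->count;      break;   /* tmp = count   */
    case 1: s->tmp[p] = s->tmp[p] + 1; break;   /* tmp = tmp + 1 */
    case 2: s->count = s->tmp[p];      break;   /* count = tmp   */
    default: return;                            /* process already finished */
    }
    s->pc[p]++;
}

/* Run a schedule (a sequence of process ids, 0 = A, 1 = B) starting
 * from count = 0 and return the final value of count. */
int run_schedule(const int *schedule, size_t len)
{
    state_t s = { 0, {0, 0}, {0, 0} };
    for (size_t i = 0; i < len; i++)
        step(&s, schedule[i]);
    return s.count;
}
```

With the schedule A A A B B B the result is 2; with A B A B A B both processes read 0 and the result is 1.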


Mutual exclusion

• A group of instructions must be executed atomically

shared var:  count=0

process A:                   process B:
  local tmpA                   local tmpB
  BeginSection / Lock          BeginSection / Lock
  tmpA=count                   tmpB=count           (critical section)
  tmpA=tmpA+1                  tmpB=tmpB+1          (critical section)
  count=tmpA                   count=tmpB           (critical section)
  EndSection / Unlock          EndSection / Unlock

count=2


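Starvation

[Figure: two snapshots of the queue of ready processes and of the running process. First the queue holds D C B A while E executes; then the queue holds D E C B while A executes.]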


Deadlock

[Figure: Task A, Task B, Task C and Task D arranged in a cycle, each one waiting for another.]

Wrong synchronization!
System is blocked!


Synchronization

Synchronization primitives


Synchronization

• HW primitives
  • processor instructions
  • usually not privileged
• Low-level synchronization primitives
  • built on top of HW primitives
  • do not require scheduler intervention
    • can be implemented at user level
• High-level synchronization primitives
  • built on top of low-level primitives
  • interact with the scheduler
    • from user level, imply syscalls


Synchronization

HW primitives



HW primitives

• Atomic Read, atomic Write
  • not practical
  • requires N accesses to synchronize N tasks
  • requires unique IDs for tasks
• Atomic Read-Modify-Write
  • allows implementing simple spin-locks
  • implementation independent of the involved tasks
    • no "unique IDs" requirement
  • clever implementations can reduce contention
    • ticket, array-based, queue-based locks
• Atomic Read-Test-Modify-Write
  • allows wait-free synchronization
• Load-Link and Store-Conditional
  • does not require a double memory access in a single instruction


HW primitives

� Atomic Read-Modify-Write

� minimal feature to implement practical locks

� Test-and-Set

� Read-and-Increment

� x86: lock xadd

� Exchange

� x86: xchg

� ARM:swp

� deprecated since ARMv6

� Others:

� fetch_and_sub, fetch_and_or, ...

pseudo-code (each function body executes atomically)

int Test-and-Set(int *ptr)
{
    int old = *ptr;
    *ptr = 1;
    return old;
}

int Read-and-Increment(int *ptr, int inc)
{
    int old = *ptr;
    *ptr = old + inc;
    return old;
}

int Exchange(int *ptr, int new)
{
    int old = *ptr;
    *ptr = new;
    return old;
}
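In portable C these operations are available through the C11 `<stdatomic.h>` header, and the compiler maps them to the processor instructions listed above where they exist. The wrappers below are a minimal sketch; the function names `test_and_set`, `read_and_increment` and `exchange` are chosen here only to mirror the pseudo-code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically set *ptr to 1 and return the previous value. */
int test_and_set(atomic_int *ptr)
{
    assert(ptr != NULL);
    return atomic_exchange(ptr, 1);
}

/* Atomically add inc to *ptr and return the previous value. */
int read_and_increment(atomic_int *ptr, int inc)
{
    assert(ptr != NULL);
    return atomic_fetch_add(ptr, inc);
}

/* Atomically store newval in *ptr and return the previous value. */
int exchange(atomic_int *ptr, int newval)
{
    assert(ptr != NULL);
    return atomic_exchange(ptr, newval);
}
```

All three return the old value, which is what lets a caller find out whether it was the one that changed the variable.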


HW primitives

� Atomic Read-Test-Modify-Write

� allows wait-free and lock-free synchronization

� Compare-and-Exchange or Compare-and-Swap (CAS)

� x86: lock cmpxchg

pseudo-code (the function body executes atomically)

int Compare-Exchange(int *ptr, int testval, int new)
{
    int old = *ptr;
    if (old == testval)
        *ptr = new;
    return old;
}
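C11 exposes this primitive as `atomic_compare_exchange_strong`, which reports success as a boolean and writes the value it found into its second argument on failure. A small wrapper, sketched below under the hypothetical name `compare_exchange`, recovers the "return the old value" interface of the pseudo-code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* If *ptr equals testval, store newval; in any case return the value
 * that *ptr held before the call. */
int compare_exchange(atomic_int *ptr, int testval, int newval)
{
    assert(ptr != NULL);
    int expected = testval;
    /* On failure the call overwrites expected with the value it found;
     * on success expected still holds testval, which was the old value. */
    atomic_compare_exchange_strong(ptr, &expected, newval);
    return expected;
}
```

The swap took place exactly when the returned value equals `testval`.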


HW primitives

� Load-Link and Store-Conditional

� do not require a double memory access in a single instruction

� MIPS:

� ll, sc

� ARM:

� ldrex, strex

pseudo-code (each function body executes atomically)

int LL(int *ptr)
{
    /* remember this access */
    return *ptr;
}

int SC(int *ptr, int val)
{
    if (this cpu has executed LL on ptr) {
        if (*ptr written since the last LL performed by this cpu)
            return SC_FAILURE;    /* fail */
        else {                    /* *ptr has not changed */
            *ptr = val;
            return SC_SUCCESS;    /* success */
        }
    }
    /* otherwise: unspecified behavior */
}


HW primitives

• Load-Link and Store-Conditional: retry until the sequence is atomic

PROCESSOR A                                       PROCESSOR B
LL x
Modify x                                          Atomic Read-Modify-Write (on x)
SC x   → failure: operations were not atomic, retry
LL x
Modify x
SC x   → success: operations were atomic, go on
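C has no direct LL/SC interface, but `atomic_compare_exchange_weak` follows the same pattern: it may fail (even spuriously, as SC can) and the caller retries. The sketch below builds a fetch-and-add from such a retry loop; `fetch_and_add_retry` is a hypothetical name, and on LL/SC machines a compiler typically lowers this loop to an `ll`/`sc` or `ldrex`/`strex` pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Add inc to *ptr with a read / modify / conditional-store retry loop
 * and return the value that was replaced. */
int fetch_and_add_retry(atomic_int *ptr, int inc)
{
    assert(ptr != NULL);
    int old = atomic_load(ptr);          /* plays the role of LL */
    /* The weak compare-exchange plays the role of SC: when it fails it
     * reloads the current value into old, so the new value is simply
     * recomputed and the store is attempted again. */
    while (!atomic_compare_exchange_weak(ptr, &old, old + inc))
        ;                                /* not atomic this time: retry */
    return old;
}
```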


HW primitives: summary

• Atomic accesses:
  • Read-Modify-Write operations
    • swap, fetch_and_add, fetch_and_sub, fetch_and_or, fetch_and_and, ...
      • perform the operation suggested by the name, and return the old value
    • add_and_fetch, sub_and_fetch, or_and_fetch, and_and_fetch, ...
      • perform the operation suggested by the name, and return the new value
  • Read-Test-Modify-Write operations
    • compare_and_swap

Operation costs

• Typical values:
  • Best-case atomic increment: 50 – 100 cycles
  • Best-case Compare-and-Exchange: 50 – 100 cycles
    • CAS on a variable in cache
  • Memory barrier: 100 – 150 cycles
  • Single cache miss: 200 – 300 cycles
  • Compare-and-Exchange cache miss: 500 – 1000 cycles


Synchronization

Low-level

synchronization primitives



Low-level synchronization primitives

� Exclusive locks

� Reader-Writer locks

� Sequential locks

� Non-blocking data structures



• Exclusive locks
  • allow only one task to proceed, others must wait
  • several tasks compete to acquire a lock
  • only one wins (acquires the lock)
  • others wait until the lock is released
  • e.g.,
    • entering a critical section → lock acquisition
    • exiting the critical section → lock release


Exclusive lock

� Binary variable

� States:

� locked (or acquired, or held),

� unlocked (or free, or available)

� Operations

� lock(lock_var)

if lock_var is unlocked then lock_var becomes locked

else the calling task cannot proceed until lock_var becomes unlocked

� unlock(lock_var)

lock_var becomes unlocked

note: unlock should be called by the task that holds the lock


Exclusive lock implementations

� Classical locking algorithms

� Dekker's algorithm

� Peterson's algorithm

� Lamport's bakery algorithm

� Spinlocks

� Polling on a variable

� Basic implementation

� Ticket spinlock

� Array spinlock


Exclusive lock implementations

• Implementation
  • based on atomic Read and atomic Write
    • Classical locking algorithms
      • Dekker's algorithm
      • Peterson's algorithm
      • Lamport's bakery algorithm
    • Notes:
      • Needs unique task id (no id reuse)
      • Must read N memory locations
      • Requires sequential consistency
      • Limited to 2 tasks
  • based on atomic Read-Modify-Write, or atomic Read-Test-Modify-Write, or Load-Link and Store-Conditional
    • spinlock
    • Note: Requires processor consistency


Classical locking algorithms

• Lock algorithm properties
  • Mutual exclusion (safety property)
    • critical sections of different threads do not overlap
    • the integrity of the computation cannot be guaranteed without this property!
  • No deadlock
    • if someone attempts to acquire the lock, then someone will acquire it
    • does not imply deadlock-free programs
  • No starvation
    • every thread that attempts to acquire the lock eventually succeeds
    • implies no deadlock
    • desirable but not essential
      • practical locks: many permit starvation, if it is unlikely to occur
    • without a real-time guarantee, starvation freedom is weak



� Dekker's algorithm (1964)

� for 2 tasks

� Peterson's algorithm (1981)

� for 2 tasks

� generalizable to N tasks (filter algorithm)

� Lamport's “bakery” algorithm (1974)

� for N tasks



� Use atomic load and store only, no stronger atomic primitives

� Not used in practice

� locks based on stronger atomic primitives are more efficient

� Why study classical lock algorithms?

� understand the principles underlying synchronization

� ubiquitous in parallel programs

� appreciate their subtlety

� motivate the need for hardware


Wrong algorithm - 1

#define N 2    /* number of processes */
int flag[N];   /* initialized to all 0s */

void lock(int process /* 0 or 1 */)
{
    int other = 1 - process;
    flag[process] = 1;            /* I'm interested */
    while (flag[other] == 1)
        ;                         /* wait */
}

void unlock(int process /* 0 or 1 */)
{
    flag[process] = 0;
}


Wrong algorithm - 1

Tasks arriving one after the other: OK

task A (process=0)                          task B (process=1)
int other = 1 - process;
flag[process] = 1;
while (flag[other] == 1) ;  /* passes */
                                            flag[process] = 1;
critical section                            /* waits while flag[0] == 1 */
flag[process] = 0;          /* unlock */
                                            critical section
                                            flag[process] = 0;

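Wrong algorithm - 1

Tasks arriving at the same time: deadlock

task A (process=0)              task B (process=1)
flag[process] = 1;              flag[process] = 1;
locked                          locked

Both flags are 1, so each task spins forever waiting for the other one.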


Wrong algorithm - 2

#define N 2     /* number of processes */
int turn = 0;   /* who has reserved access */

void lock(int process /* 0 or 1 */)
{
    turn = process;               /* other goes first */
    while (turn == process)
        ;                         /* wait */
}

void unlock(int process /* 0 or 1 */)
{
}


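Wrong algorithm - 2

[Figure: timeline of task A (process=0) and task B (process=1). Task A executes turn = process and then waits in while (turn == process); critical sections of the two tasks follow, with unlock marked between them and the outcome marked OK.]

A task can leave the wait loop only after the other task has called lock: if the other task never arrives, the caller waits forever.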


Dekker's algorithm

#define N 2     /* number of processes */
int flag[N];    /* initialized to all 0s */
int turn = 0;   /* who has reserved access */

void lock(int process /* 0 or 1 */)
{
    int other = 1 - process;
    flag[process] = 1;
    while (flag[other]) {
        flag[process] = 0;
        while (turn != process)
            ;                     /* wait */
        flag[process] = 1;
    }
}

void unlock(int process /* 0 or 1 */)
{
    turn = 1 - process;           /* other */
    flag[process] = 0;
}


Peterson's algorithm

#define N 2     /* number of processes */
int flag[N];    /* initialized to all 0s */
int turn = 0;   /* who has reserved access */

void lock(int process /* 0 or 1 */)
{
    int other = 1 - process;
    flag[process] = 1;            /* I'm interested */
    turn = process;               /* other goes first */
    while (turn == process && flag[other] == 1)
        ;                         /* wait */
}

void unlock(int process /* 0 or 1 */)
{
    flag[process] = 0;
}
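Peterson's algorithm is correct only if loads and stores are not reordered, which plain C variables do not guarantee on current compilers and processors. A minimal sketch of a faithful version uses C11 atomics with their default sequentially consistent ordering; the names `peterson_lock` and `peterson_unlock` are chosen here for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int flag[2];   /* zero-initialized: nobody is interested */
static atomic_int turn;      /* the last task that wrote turn is the one that waits */

void peterson_lock(int process /* 0 or 1 */)
{
    assert(process == 0 || process == 1);
    int other = 1 - process;
    atomic_store(&flag[process], 1);    /* I'm interested */
    atomic_store(&turn, process);       /* other goes first */
    /* Sequentially consistent accesses keep the store to flag[process]
     * ordered before the load of flag[other]. */
    while (atomic_load(&turn) == process && atomic_load(&flag[other]) == 1)
        ;                               /* wait */
}

void peterson_unlock(int process /* 0 or 1 */)
{
    assert(process == 0 || process == 1);
    atomic_store(&flag[process], 0);
}
```

Replacing the atomics with ordinary `int` variables, or with relaxed ordering, lets the store and the following load be reordered, and both tasks can then enter the critical section.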



Peterson's algorithm: tasks arriving one after the other (OK)

task A (process=0)                              task B (process=1)
other = 1-process;
flag[process] = 1;
turn = process;
while (turn==process && flag[other]==1);        /* passes: flag[1] is 0 */
critical section                                other = 1-process;
                                                flag[process] = 1;
                                                turn = process;
                                                while (turn==process && flag[other]==1);   /* waits */
flag[process] = 0;   /* unlock */
                                                critical section
                                                flag[process] = 0;

Peterson's algorithm: tasks arriving at the same time (OK)

task A (process=0)                              task B (process=1)
other = 1-process;                              other = 1-process;
flag[process] = 1;                              flag[process] = 1;
turn = process;
                                                turn = process;
while (turn==process && flag[other]==1);        /* passes: turn is 1 */
critical section                                while (turn==process && flag[other]==1);   /* waits */
flag[process] = 0;   /* unlock */
                                                critical section
                                                flag[process] = 0;


Lamport's bakery algorithm

• On arrival, get an (incremental) ticket
• The bakery serves whoever has the smallest ticket

[Figure: arriving tasks take tickets 10, 11, 12, 13; waiting tasks hold tickets 7, 8 and 9; the served task holds ticket 6 ("now serving: 6").]


Lamport's bakery algorithm

int flag[N];     /* initialized to all 0s */
int ticket[N];   /* initialized to all 0s */

void lock(int process)
{
    int j;
    flag[process] = 1;
    ticket[process] = 1 + max(ticket[0], ..., ticket[N-1]);
    flag[process] = 0;
    for (j = 0; j < N; j++) {
        while (flag[j])
            ;    /* wait if task-j is getting its ticket */
        /* Wait for threads with higher priority */
        while (ticket[j] != 0 &&
               ((ticket[j] < ticket[process]) ||
                ((ticket[j] == ticket[process]) && j < process)))
            ;    /* wait */
    }
}

void unlock(int process)
{
    ticket[process] = 0;
}


Observations

• The bakery algorithm is concise, elegant and fair
• Why is it not practical?
  • must read N distinct locations (N could be very large)
  • threads must be assigned unique IDs between 0 and N-1
    • awkward for dynamic threads
  • the value of a ticket is monotonically increasing and unbounded
• Can a cleverer lock exist that uses only atomic load/store and avoids these problems?
  • No. Any deadlock-free algorithm requires reading or writing at least N distinct locations in the worst case.


Synchronization

Spinlocks




Spinlocks

• Repeatedly check the lock variable
  • loop while it is locked
  • at kernel level: disable preemption
    • Note: on uni-processor systems, just disable preemption
• clever implementations can reduce contention
  • ticket locks
  • array-based locks
  • queue-based locks
• whenever possible, the processor is put into a low-power state while waiting
  • e.g., with a wfe or wfi in ARM


Spinlock implementation (example)

Leveraging Test-and-Set

void lock(int *lck)
{
    while (Test-and-Set(lck) == 1) {
        continue;    /* wait! */
    }
    /* memory barrier if needed */
}

void unlock(int *lck)
{
    /* memory barrier if needed */
    *lck = 0;
}

Leveraging Exchange

void lock(int *lck)
{
    while (Exchange(lck, 1) == 1) {
        continue;    /* wait! */
    }
    /* memory barrier if needed */
}



Leveraging Test-and-Set (reducing communication)

void lock(int *lck)
{
    while ((*lck == 1) || (Test-and-Set(lck) == 1)) {
        continue;    /* wait! */
    }
    /* memory barrier if needed */
}

Leveraging Exchange (reducing communication)

void lock(int *lck)
{
    do {
        while (*lck == 1) {
            continue;    /* wait! */
        }
    } while (Exchange(lck, 1) == 1);
    /* memory barrier if needed */
}
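The "test before test-and-set" idea can be written with C11 atomics as in the sketch below, where the acquire and release orderings stand in for the "memory barrier if needed" comments. The type `spinlock_t` and the `spin_*` functions are hypothetical names, and `spin_trylock` is added only to make the single acquisition attempt explicit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_int locked; } spinlock_t;   /* 0 = free, 1 = held */

void spin_init(spinlock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->locked, 0);
}

/* One acquisition attempt: returns true if the lock was obtained. */
bool spin_trylock(spinlock_t *l)
{
    assert(l != NULL);
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* Spin on plain loads first: they are served by the local cache
         * and generate no coherence traffic until the holder writes. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) == 1)
            ;
        if (spin_trylock(l))
            return;
    }
}

void spin_unlock(spinlock_t *l)
{
    assert(l != NULL);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```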



Using LL and SC

void lock(int *lck)
{
    /*
     * write code here
     */

    /* memory barrier if needed */
}


Spinlock implementation (ticket)

� Same principle of Lamport's bakery algorithm

� arriving tasks get a ticket

� atomically

� there is a global indicator: the current turn

� each task waits until current turn is equal to its own ticket

� a leaving task increments the current turn



typedef struct {
    int next;       /* next available ticket */
    int current;    /* current turn */
} lock_t;

void init_lock(lock_t *lck)
{
    lck->next = lck->current = 0;
}

void lock(volatile lock_t *lck)
{
    int myturn;

    myturn = /* get a ticket: must be an atomic operation */

    /* wait until it's my turn */
}

void unlock(lock_t *lck)
{
    /* increment current turn */
    /* atomicity not required: only one task here */
}






void lock(volatile lock_t *lck)
{
    int myturn;

    /* atomically acquire the current value of the field next
       and store a new value for it */
    myturn = fetch_and_add(&lck->next, 1);

    /* loop until the field current becomes equal to my ticket;
       lck must be volatile for this test */
    while (myturn != lck->current)
        continue;    /* wait! */
    /* memory barrier */
}

void unlock(lock_t *lck)
{
    /* memory barrier */
    lck->current++;
    /* memory barrier: for efficiency, the previous write becomes visible
       on all CPUs as soon as possible, so spinning is reduced */
}







With a back-off delay

void lock(volatile lock_t *lck)
{
    int myturn;

    myturn = fetch_and_add(&lck->next, 1);

    /* to reduce contention (delay can be a simple empty loop) */
    while (myturn != lck->current)
        delay(myturn - lck->current);
    /* memory barrier */
}







With sleeping

void lock(volatile lock_t *lck)
{
    int myturn;

    myturn = fetch_and_add(&lck->next, 1);

    /* to reduce contention (if there is architectural support) */
    while (myturn != lck->current)
        wait_for_event();
    /* memory barrier */
}

void unlock(lock_t *lck)
{
    /* memory barrier */
    lck->current++;
    /* memory barrier (needed) */
    send_event();
}






void lock(volatile lock_t *lck)
{
    int myturn;

    myturn = /* implement fetch_and_add using LL and SC */

    while (myturn != lck->current)
        continue;    /* wait! */
    /* memory barrier */
}



• Only 1 atomic instruction executed per lock acquisition
• Fair, locks granted in order of request: no starvation
• Back-off delay proportional to position in queue
  • if time in critical section is constant, the delay can be calculated such that the subsequent test of lck->current will just succeed
• Polling on a single shared location
  • bus traffic with an invalidate cache coherency protocol (e.g., MESI)
  • delay not necessary with a write-update protocol (e.g., Firefly)
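A complete ticket lock along these lines can be sketched with C11 atomics, using acquire and release orderings in place of explicit barriers. The type `ticket_lock_t` and the `ticket_*` functions are hypothetical names; unsigned counters are an assumption made here so that wrap-around is well defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint next;       /* next available ticket */
    atomic_uint current;    /* ticket currently being served */
} ticket_lock_t;

void ticket_init(ticket_lock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->next, 0);
    atomic_init(&l->current, 0);
}

void ticket_lock(ticket_lock_t *l)
{
    assert(l != NULL);
    /* The only atomic read-modify-write per acquisition: take a ticket. */
    unsigned myturn = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
    /* Wait until it's my turn. */
    while (atomic_load_explicit(&l->current, memory_order_acquire) != myturn)
        ;
}

void ticket_unlock(ticket_lock_t *l)
{
    assert(l != NULL);
    /* Only the lock holder writes current, so a load followed by a
     * store is enough: no atomic increment is required. */
    unsigned cur = atomic_load_explicit(&l->current, memory_order_relaxed);
    atomic_store_explicit(&l->current, cur + 1, memory_order_release);
}
```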


Spinlock implementation (array-based)

• Each task must poll a different location
  • arriving tasks get an index
    • atomically
  • each task waits until its own slot becomes free
  • a leaving task unlocks the following one



[Figure: array of flags, one Unlocked and all the others Locked, with idx marking a slot of the array. Task 1: release lock; Task 2: acquired lock; Task 3: waiting.]



typedef struct {
    int flags[N];    /* should be padded: each element in a different cache line */
                     /* N must be a power of 2 */
    int queuelast, winner_idx;
} lock_t;

void init_lock(lock_t *lck)
{
    int i;
    lck->flags[0] = HAS_LOCK;
    for (i = 1; i < N; i++)
        lck->flags[i] = MUST_WAIT;
    lck->queuelast = 0;
}

void lock(volatile lock_t *lck)
{
    int myplace;

    /* get a location to poll (each task obtains a different index) */
    myplace = fetch_and_add(&lck->queuelast, 1);

    while (lck->flags[myplace % N] == MUST_WAIT)
        continue;
    /* only one task here: record the index used (needed in unlock) */
    lck->winner_idx = myplace;
    /* memory barrier */
}

void unlock(lock_t *lck)
{
    /* memory barrier */
    lck->flags[lck->winner_idx % N] = MUST_WAIT;
    /* allows the next waiting task to proceed */
    lck->flags[(lck->winner_idx + 1) % N] = HAS_LOCK;
    /* memory barrier */
}


• Tasks do not poll a single shared location
  • reduced bus traffic for a write-invalidate cache coherency protocol
• Lock is passed from a task to the next
  • through a shared slot in an array
  • this slot is not shared with any other thread
• Only 1 atomic instruction executed per lock acquisition
• Fair, lock is granted in order of request: no starvation
• Need to know max number of threads
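An array-based lock of this kind can be sketched in C11 as follows. The slot count `NSLOTS`, the 64-byte cache line size and all the `array_*` names are assumptions made for this illustration; each flag is aligned so that waiting tasks poll different cache lines.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 8u   /* assumed max number of contending tasks; a power of 2 */
enum { MUST_WAIT = 0, HAS_LOCK = 1 };

typedef struct {
    /* One flag per slot; 64-byte alignment (assumed cache line size)
     * pads every slot to its own cache line. */
    struct { _Alignas(64) atomic_int flag; } slot[NSLOTS];
    atomic_uint queuelast;   /* next index to hand out */
    unsigned winner_idx;     /* written only by the lock holder */
} array_lock_t;

void array_init(array_lock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->slot[0].flag, HAS_LOCK);
    for (unsigned i = 1; i < NSLOTS; i++)
        atomic_init(&l->slot[i].flag, MUST_WAIT);
    atomic_init(&l->queuelast, 0);
    l->winner_idx = 0;
}

void array_lock(array_lock_t *l)
{
    assert(l != NULL);
    /* Get a private location to poll. */
    unsigned myplace = atomic_fetch_add_explicit(&l->queuelast, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->slot[myplace % NSLOTS].flag,
                                memory_order_acquire) == MUST_WAIT)
        ;
    l->winner_idx = myplace;   /* only one task here: remember the slot used */
}

void array_unlock(array_lock_t *l)
{
    assert(l != NULL);
    unsigned me = l->winner_idx;
    /* Re-arm my slot for future use, then hand the lock to the next slot. */
    atomic_store_explicit(&l->slot[me % NSLOTS].flag, MUST_WAIT, memory_order_relaxed);
    atomic_store_explicit(&l->slot[(me + 1) % NSLOTS].flag, HAS_LOCK, memory_order_release);
}
```

Because `NSLOTS` is a power of 2, the index keeps mapping to consecutive slots even when the unsigned counter wraps around.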



Spinlocks

�Applicable to any number of tasks

�Applicable to any number of processors (shared memory)

�Simple

� thus easy to verify

�Support multiple critical sections

� each critical section is identified by its own lock variable


Spinlocks

� Process waits by executing a loop

� Can be implemented at user level

� no syscalls are required by user level code

� CPU time is wasted


Synchronization

Reader-writer locks





• Reader-Writer locks
  • 2 categories of tasks: readers and writers
    • readers can proceed concurrently
    • a writer must have exclusive access
  • increase parallelism
    • readers advance in parallel
    • a new reader can proceed even if other readers are accessing data
  • tasks must be specialized
    • readers do not modify data!
  • writers may starve
    • a writer must wait until there are no more readers, but a new reader can steal the waiting writer's turn
    • give priority to writers → increased complexity (thus overhead)


Reader-Writer lock

• 3-state variable
  • States:
    • unlocked
    • reader_locked
    • writer_locked
• Operations
  • read_lock(lock_var)
      if lock_var is writer_locked the calling task cannot proceed
      else lock_var becomes (or stays) reader_locked
  • write_lock(lock_var)
      if lock_var is unlocked then lock_var becomes writer_locked
      else the calling task cannot proceed
  • unlock(lock_var)
      lock_var is reader_locked and no more readers → lock_var becomes unlocked
      lock_var is writer_locked → lock_var becomes unlocked


RWlock implementation (example)

const int W = 1;
const int R = 2;

typedef int lock_t;

void init_lock(lock_t *lck)
{
    *lck = 0;
}

void read_lock(volatile lock_t *lck)
{
    fetch_and_add(lck, R);
    while (*lck & W)
        continue;
}

void write_lock(lock_t *lck)
{
    while (CAS(lck, 0, W) != 0)
        continue;
}

void read_unlock(lock_t *lck)
{
    fetch_and_add(lck, -R);
}

void write_unlock(lock_t *lck)
{
    fetch_and_add(lck, -W);
}

• Simple
• Not efficient
  • Polling CAS
• Not fair
  • Readers are preferred
  • Writers can starve
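The same reader-preferring lock can be written with C11 atomics as in the sketch below. Bit 0 of the lock word is the writer flag and the remaining bits count readers, as in the example above; `rwlock_t`, `rw_init` and `write_trylock` are hypothetical names, the last one introduced only to expose a single acquisition attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { W = 1, R = 2 };   /* bit 0: writer flag; upper bits: reader count */
typedef atomic_int rwlock_t;

void rw_init(rwlock_t *l)
{
    assert(l != NULL);
    atomic_init(l, 0);
}

void read_lock(rwlock_t *l)
{
    atomic_fetch_add(l, R);           /* announce the reader */
    while (atomic_load(l) & W)
        ;                             /* wait for the writer to leave */
}

void read_unlock(rwlock_t *l)
{
    atomic_fetch_sub(l, R);
}

/* One attempt: succeeds only if there is no reader and no writer. */
bool write_trylock(rwlock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(l, &expected, W);
}

void write_lock(rwlock_t *l)
{
    while (!write_trylock(l))
        ;                             /* polling CAS */
}

void write_unlock(rwlock_t *l)
{
    atomic_fetch_sub(l, W);
}
```

A writer succeeds only when the whole word is 0, so a steady stream of readers keeps it out: this is the writer starvation noted above.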


Reader-Writer lock

• Variants
  • more states
    • VAX/VMS Distributed Lock Manager: 6-state lock
      • states: Unlocked, Concurrent-Read, Concurrent-Write, Protected-Read, Protected-Write, Exclusive
    • DBMS: even more than 30 states!

[Figure: lock-mode compatibility table; each result is marked as allowed or blocked.]



• Sequential locks
  • Similar to reader-writer locks but writers have priority
    • a writer is never blocked by readers
    • writers do not starve
    • a writer is only serialized with respect to other writers
  • readers try to get data
    • the operation is restarted if a conflict with a writer is detected

Example

Reader:

do {
    seq = read_seqbegin(&foo);
    ...
} while (read_seqretry(&foo, seq));

Writer:

write_seqlock(&test_seqlock);
... /* update data */
write_sequnlock(&test_seqlock);
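The mechanism behind such an interface is a sequence counter that is odd while a write is in progress. The user-level sketch below assumes a single writer (multiple writers would additionally serialize on a lock) and protects a pair of integers; `seq_pair_t`, `read_begin`, `read_retry`, `write_pair` and `read_pair` are hypothetical names, not the Linux API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint seq;    /* odd while a write is in progress */
    atomic_int x, y;    /* protected data; atomics avoid a formal data race */
} seq_pair_t;

unsigned read_begin(seq_pair_t *p)
{
    return atomic_load_explicit(&p->seq, memory_order_acquire);
}

/* Nonzero if the read section overlapped a write and must be repeated. */
int read_retry(seq_pair_t *p, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1u) ||
           atomic_load_explicit(&p->seq, memory_order_relaxed) != start;
}

void write_pair(seq_pair_t *p, int x, int y)   /* single writer assumed */
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);   /* now odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->x, x, memory_order_relaxed);
    atomic_store_explicit(&p->y, y, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);   /* even again */
}

void read_pair(seq_pair_t *p, int *x, int *y)
{
    assert(p != NULL && x != NULL && y != NULL);
    unsigned s;
    do {
        s = read_begin(p);
        *x = atomic_load_explicit(&p->x, memory_order_relaxed);
        *y = atomic_load_explicit(&p->y, memory_order_relaxed);
    } while (read_retry(p, s));
}
```

The writer never waits for readers; a reader that observes an odd or changed counter simply discards what it read and tries again.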


Synchronization

Locking

strategies and issues




Locking strategies

• Giant lock
  • the whole code (e.g., a library) is protected with a single lock
  • simplest approach
  • allows porting non-parallel code to parallel architectures
  • available parallelism is lost


Locking strategies

� Coarse-grained locking

� code is split in subsystems

� e.g., for an OS kernel

� filesystems

� memory management

� network stack

� video drivers

� input drivers

� ...

� each subsystem is protected with its own lock

� calls to different subsystem can proceed concurrently

� communication between different subsystems can still require

a global lock


Locking strategies

� Fine-grained locking

� locks protect individual data structures

� scalable

� several locks must be managed

� need to understand which locks are required

� order on locks requests

� management of a hierarchy of locks

� rules!


Locking issues

• Deadlock
  • circular waiting dependency that prevents work from proceeding
  • tasks blocked on a lock held by a task waiting for another lock held...
• Convoying
  • set of tasks repeatedly competing for a lock
  • progression speed is limited by the slowest task
  • fast tasks are forced to slow down
  • similar to a column of cars in a single lane


Locking issues

• Priority Inversion
  • a high priority task (TH) is blocked on a lock held by a low priority task (TL)
  • an independent medium priority task (TM) is ready
    → TM is scheduled to run
    → TH effectively gets a lower priority than TM
  • Workarounds
    • disable preemption while a lock is held
      • requires disabling interrupts
    • priority ceiling
      • give the highest priority to a task that holds a lock
    • priority inheritance
      • when a task is blocked on a lock, its priority passes to the lock owner (if higher)


Locking issues

• Signal-safety
  • signal handlers (and interrupt handlers) cannot share locks with the other code
  • e.g.,
    1. task1 holds lockA
    2. task1 is interrupted by a signal
    3. signal handler requires lockA
    → signal handler blocks, task1 cannot proceed
    → deadlock
  • disable signals (or interrupts) when lockA is acquired
    • not required for locks that are not used in signal handlers


Locking issues

� Kill-tolerant availability

� tasks killed while holding a lock

� Pre-emption tolerance

� tasks pre-empted while holding a lock


Locking issues

• Overall performance
  • overhead of lock primitives
    • global communication
    • memory barriers
  • depends on lock contention
    • a non-contended lock is stored only in a CPU cache
      • still, not for free: memory barrier
    • contended locks bounce from a cache to other caches
      • cache misses
  • look for efficient algorithms
  • use specialized locks
    • e.g., reader-writer locks


Synchronization

Non-blocking data

structures





� Non-blocking data structures

� Lock-free data structures

� e.g. lock-free linked lists

� see Linux llist

� others: buffer, stack, queue, map, snapshot

� Wait-free data structures

� much harder than lock-free

� not always possible


Lock- and Wait-free synchronization

� Lock-free synchronization:

� At least one thread will make progress in finite time

� A data structure is lock-free if and only if some operation

completes after a finite number of steps system-wide have

been executed on the structure

� Wait-free synchronization:

� Every thread will make progress in finite time

� A data structure is wait-free if and only if every operation on

the structure completes after it has executed a finite

number of steps


Lock-free stack (example)

struct NodeType {

Datatype data;

struct NodeType *next;

};

struct NodeType *Head;

void init() {

Head = NULL;

}

void push(struct NodeType *n) {

n->next = Head;

Head = n;

}

struct NodeType *pop() {

struct NodeType *n;

n = Head;

if (n != NULL)

Head = n->next;

return n;

}

[Figure: Head points to the top of the stack; the last node points to NULL.]

Global data is changed in push (Head = n) and in pop (Head = n->next).
If nobody else has changed global data → changes are valid; otherwise, abort and retry.

Not concurrent: a lock is needed to make push and pop atomic


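Lock-free stack (example)

struct NodeType {
    Datatype data;
    struct NodeType *next;
};

struct NodeType *Head;

void init() {
    Head = NULL;
}

void push(struct NodeType *n) {
    do {
        n->next = Head;
    } while (CAS(&Head, n->next, n) != n->next);
}

struct NodeType *pop() {
    struct NodeType *n;
    do {
        n = Head;
    } while (n != NULL && CAS(&Head, n, n->next) != n);
    return n;
}

Lock free

[Figure: Head points to the top of the stack; the last node points to NULL.]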
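With C11 atomics the same stack can be sketched as below. The element type `struct node` with an `int` payload and the global `head` are assumptions made for the illustration; the retry loops rely on `atomic_compare_exchange_weak` reloading the observed head on failure. The sketch deliberately ignores memory reclamation, so it is exposed to the ABA problem if popped nodes are freed or reused while another task is still inside `pop`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int data;
    struct node *next;
};

static _Atomic(struct node *) head;   /* NULL when the stack is empty */

void push(struct node *n)
{
    assert(n != NULL);
    n->next = atomic_load(&head);
    /* If head changed in the meantime, the failed call stores the
     * current head into n->next and the loop tries again. */
    while (!atomic_compare_exchange_weak(&head, &n->next, n))
        ;
}

struct node *pop(void)
{
    struct node *n = atomic_load(&head);
    /* On failure n is refreshed with the current head; stop when the
     * stack is empty or when head has been moved to n->next. */
    while (n != NULL && !atomic_compare_exchange_weak(&head, &n, n->next))
        ;
    return n;
}
```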


Lock-Free issues

• Designing generalized lock-free algorithms is hard
  → Design lock-free data structures instead
    • buffer, list, stack, queue, map, deque, snapshot
• ABA problem
  • typical lock-free operation
    • task1:
      1. atomically acquire a flag (finds the value A)
      2. use data
      3. test the current value of flag
         → if it is A, data has not changed: ok to proceed; else, repeat the operation
  • problem:
    • after task1.1, task2 stores B to flag
    • before task1.3, task2 changes data and stores A to flag
      → task1 is not aware of the changes
    • data inconsistency


Lock-free stack (example): ABA

struct NodeType {
    Datatype data;
    struct NodeType *next;
};

struct NodeType *Head;

void init() {
    Head = NULL;
}

void push(struct NodeType *n) {
    do {
        n->next = Head;
    } while (CAS(&Head, n->next, n) != n->next);
}

struct NodeType *pop() {
    struct NodeType *n, *next;
    do {
        n = Head;
        next = (n != NULL) ? n->next : NULL;
    } while (n != NULL && CAS(&Head, n, next) != n);
    return n;
}

Lock free

[Figure: Head → A → B → C → NULL; A is the top of the stack.]



TASK1:               TASK2:
n = pop();           n1 = pop();
                     n2 = pop();
                     push(n1);

Initial stack: Head → A → B → C → NULL

TASK1                          TASK2
n = Head;                                                       /* n is A */
next = n->next;                                                 /* next is B */
                               n1 = Head;
                               CAS(&Head, n1, n1->next)         /* n1 is A; Head → B → C */
                               n2 = Head;
                               CAS(&Head, n2, n2->next)         /* n2 is B; Head → C */
                               n1->next = Head;
                               CAS(&Head, n1->next, n1)         /* Head → A → C */
CAS(&Head, n, next)                                             /* Head is A again: CAS is successful */

Result: Head now points to B, a node that TASK2 has already popped.



• Do not reuse nodes
  • task2:

      reusing n1:            without reuse:
      n1 = pop();            n1 = pop();
      n2 = pop();            n2 = pop();
      push(n1);              n3 = new node
                             n3.data = n1.data;
                             push(n3);

  • When can n1 be freed?
    • after n1 is released, another task can obtain that memory as a new node
      → ABA can happen


ABA solutions

� Deferred reclamation

� Do not reuse nodes

� Don't recycle the memory “too soon”

� Garbage collector

� Hazard pointers

� Read-Copy-Update

� Use the same CAS for 2 pointers

� needs a double-word CAS

� Tagged pointers

� some bits of a pointer are used as a counter

� beware the wrap-around


Lock-free list (example)

struct NodeType {
    Datatype data;
    struct NodeType *next;
};

struct NodeType *Head, *Tail;

void init() {
    struct NodeType *dummynode;
    dummynode = malloc(sizeof(struct NodeType));
    dummynode->next = NULL;
    Head = Tail = dummynode;
}

void insert(struct NodeType *n) {
    struct NodeType *tmp;
    n->next = NULL;
    tmp = Tail;
    tmp->next = n;
    Tail = n;
}

struct NodeType *remove() {
    struct NodeType *n;
    n = Head->next;        /* first node */
    if (n != NULL) {
        Head = n;          /* discard dummy node; n becomes the new dummy node */
    }
    return n;
}

[Figure: Head points to the dummy node, which is followed by the first node; Tail points to the last node, whose next is NULL.]

Not concurrent



After every step, the list must remain consistent:

- nodes are all linked
- Head points to the dummy node
- Tail is after Head

Concurrent tasks must "cooperate"



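[Figure: Head points to the dummy node, which is followed by the first node; Tail points to the last node, whose next is NULL.]

struct NodeType {
    Datatype data;
    struct NodeType *next;
};

struct NodeType *Head, *Tail;

void init() {
    struct NodeType *dummynode;
    dummynode = malloc(sizeof(struct NodeType));
    dummynode->next = NULL;
    Head = Tail = dummynode;
}

void insert(struct NodeType *n) {
    struct NodeType *tmp, *ntmp;
    n->next = NULL;
    do {
        tmp = Tail;
        ntmp = tmp->next;
        if (Tail != tmp) continue;
        if (ntmp != NULL) {
            CAS(&Tail, tmp, tmp->next);    /* Tail lags behind: help to advance it */
            continue;
        }
    } while (CAS(&tmp->next, NULL, n) != NULL);
    CAS(&Tail, tmp, n);
}

struct NodeType *remove() {
    struct NodeType *n, *h, *t;
    do {
        h = Head;
        t = Tail;
        n = h->next;
        if (Head != h) continue;
        if (n == NULL)
            break;                         /* the list is empty */
        if (h == t) {
            CAS(&Tail, t, n);              /* Tail lags behind: help to advance it */
            continue;
        }
    } while (CAS(&Head, h, n) != h);
    return n;
}

Lock free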


Synchronization

High-level synchronization primitives

• Semaphores
  • Semaphores (Counting semaphores)
  • Binary semaphores
• Mutexes
• Condition variables
• Monitors
• Deferred processing
  • e.g., Read-Copy-Update (RCU)


• Semaphore
  • Integer variable
  • Operations (all atomic)
    • initialize
      • set the initial value
      • an arbitrary non-negative value
    • semWait (also: P)
      • decrement value; if the result is negative, then suspend the calling process
      • if suspended, the process is stored on a list associated with the semaphore
      • used to enter a critical section
    • semSignal (also: V)
      • increment value; if the result is non-positive, then resume a suspended process
      • the process to be resumed is taken from the list associated with the semaphore
      • used to leave a critical section



Semaphores

� Strong semaphore

� task are resumed in FIFO order

� fair implementation

� Weak semaphore

� no order is imposed on task reactivations


Semaphores

• Binary semaphore
  • Semaphore that can only assume values 0 or 1
  • initialize
    • Only 0 or 1 are valid initial values
  • semSignal
    • if value is 0, then resume a waiting process (if any); if no process is waiting, set value to 1
    • the process to be resumed is taken from the list associated with the semaphore
  • semWait
    • if value is 0, then suspend the process, else decrement value
    • if suspended, the process is stored on a list associated with the semaphore


Semaphore implementation (example)

typedef struct semaphore_t {
    int count;
    int lock;             /* spinlock: access to sem must be atomic */
    QUEUE suspended;
} semaphore;

void semWait(semaphore *sem)
{
    lock(&sem->lock);                     /* start of the protected section */
    sem->count--;
    if (sem->count < 0) {
        place this process in sem->suspended
        unlock(&sem->lock);
        suspend this process              /* kernel-level operation */
    } else {
        unlock(&sem->lock);
    }
}

void semSignal(semaphore *sem)
{
    lock(&sem->lock);                     /* start of the protected section */
    sem->count++;
    if (sem->count <= 0) {
        remove a process P from sem->suspended
        place process P on the ready list /* kernel-level operation */
    }
    unlock(&sem->lock);
}



• Mutex
  • Similar to a binary semaphore, but only the task owning the mutex can unlock it
  • Same semantics as low-level locks
    • but the scheduler comes into play
  • Reentrant (or recursive) mutex
    • a task can acquire the mutex multiple times
    • multiple levels of ownership
    • must be released the same number of times



• Monitor
  • abstract data type
  • accessible only through "access procedures" (all atomic and exclusive)
  • Only one task can access the monitor at a time
  • Object oriented approach
    • e.g., in C++ a monitor can be implemented with a class where:
      • there is a reentrant mutex as a field
      • all methods get the mutex on entry
      • all methods release the mutex on exit
      • signaling is realized with explicit condition variables


� Condition variables

� Condition to test

� Operations (all atomic)

� cond_wait

� sleep until another task calls signal or broadcast

� cond_notify (also: signal)

� wake up a waiting task

� cond_notifyAll (also: broadcast)

� wake up all waiting tasks



Condition variables

• Using condition variables
  • on waiting: use a loop
    • the condition set by the signaling task may no longer be true
    • another task could have changed the condition after the signaling
    • MESA semantics
  • Hoare semantics:
    • after the signaling, the waiting thread is woken up
    • nobody else gets control of the condition
    • hard to implement (never used in practice)
  • a lock (or a mutex) is required
    • prevents the race condition:
      • sequence: task A tests (done is 0), task B sets (done becomes 1), task B calls cond_signal, task A calls cond_wait
      • the signal is lost → the waiting thread is never woken up


Condition variables pseudocode (example 1)

typedef struct {
    lock_t lock;
    QUEUE waiting;
} cond_t;

void cond_wait(cond_t *cond_var)
{
    atomically add this task to cond_var->waiting
    unlock(&cond_var->lock);
    suspend this task
    lock(&cond_var->lock);
}

void cond_notify(cond_t *cond_var)
{
    atomically remove a task T from cond_var->waiting
    resume T (place T on the ready list)
}

void cond_notify_all(cond_t *cond_var)
{
    for each task T in cond_var->waiting {
        atomically remove T from cond_var->waiting
        resume T (place T on the ready list)
    }
}


Condition variables (example 1)

Task A

/* do something */
/* need to wait for task B */
lock(&cv->lock);
while (done == 0) {
    cond_wait(cv);
}
unlock(&cv->lock);

Task B

/* do something */
lock(&cv->lock);
done = 1;
cond_notify(cv);
unlock(&cv->lock);
/* now task A can advance */

• done is initially 0
• the test on the condition and the call to cond_wait must be atomic
• the change of the condition and the call to cond_notify must be atomic
• both are protected by the same lock


Condition variables pseudocode (example 2)

typedef struct {
    QUEUE waiting;
} cond_t;

void cond_wait(cond_t *cond_var, lock_t *lck)
{
    atomically add this task to cond_var->waiting
    unlock(lck);
    suspend this task
    lock(lck);
}

void cond_notify(cond_t *cond_var, lock_t *lck)
{
    atomically remove a task T from cond_var->waiting
    resume T (place T on the ready list)
}

void cond_notify_all(cond_t *cond_var, lock_t *lck)
{
    for each task T in cond_var->waiting {
        atomically remove T from cond_var->waiting
        resume T (place T on the ready list)
    }
}


Condition variables (example 2)

Task A

/* do something */
/* need to wait for task B */
lock(&lck);
while (done == 0) {
    cond_wait(&cv, &lck);
}
unlock(&lck);

Task B

/* do something */
lock(&lck);
done = 1;
cond_notify(&cv, &lck);
unlock(&lck);
/* now task A can advance */

• done is initially 0
• the test on the condition and the call to cond_wait must be atomic
• the change of the condition and the call to cond_notify must be atomic
• both are protected by the same lock


Read-Copy Update

• Synchronization for read-mostly data
• Update is split into:
  • removal
  • reclamation
• Publish-Subscribe mechanism
• Simple to apply to data structures
  • lists, arrays


Read-Copy Update

• Update is split into:
  • removal
    • remove the reference to the old data
  • reclamation
    • free the memory
• ⇒ removal does not need to wait for running readers
• ⇒ reclamation must wait until running readers have finished


Read-Copy Update

• Publish-Subscribe mechanism
  • subscribe to data
    • rcu_dereference
  • publish new data
    • rcu_assign_pointer
  • old data is “reclaimed” when it is no longer needed
    • after a “grace” period
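The publish/subscribe idea can be approximated in user space with C11 atomics. This is only a sketch under strong assumptions: it is not the kernel RCU API, it does not track grace periods, and the names config, reader_sum, writer_update and rcu_publish_demo are hypothetical. The acquire load stands in for the role of rcu_dereference and the exchange for rcu_assign_pointer.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config { int a, b; };

/* Shared pointer to the current version of the data. */
static _Atomic(struct config *) dataptr;

/* "Subscribe": take one snapshot of the pointer and use only that snapshot. */
int reader_sum(void)
{
    struct config *p1 = atomic_load_explicit(&dataptr, memory_order_acquire);
    return p1->a + p1->b;
}

/* "Publish": build the new version privately, then swap the pointer in.
   Returns the old version, which may be freed only after a grace period.
   Returns NULL and publishes nothing if allocation fails. */
struct config *writer_update(int a, int b)
{
    struct config *p2 = malloc(sizeof *p2);
    if (p2 == NULL)
        return NULL;
    p2->a = a;
    p2->b = b;
    return atomic_exchange_explicit(&dataptr, p2, memory_order_acq_rel);
}

int rcu_publish_demo(void)
{
    struct config *first = malloc(sizeof *first);
    struct config *old;
    int before, after;

    if (first == NULL)
        return -1;
    first->a = 1;
    first->b = 2;
    atomic_store_explicit(&dataptr, first, memory_order_release);

    before = reader_sum();              /* reads the first version */
    old = writer_update(10, 20);
    after = reader_sum();               /* reads the published version */

    free(old);  /* safe only because no reader is running concurrently here */
    free(atomic_load(&dataptr));
    return before * 100 + after;
}
```

In a real RCU implementation the free of the old version would be deferred until every reader that might still hold the old pointer has left its read-side section.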


Readers-Writers with RCU

[Figure: dataptr, the old data and the new data2, with the reader pointer p1 and the writer pointer p2]

Reader(s)

rcu_read_lock();
p1 = rcu_dereference(dataptr);
... do something with data
rcu_read_unlock();

Writer(s)

lock(&writers_lock);
... prepare new data pointed to by p2
oldp = dataptr;
rcu_assign_pointer(dataptr, p2);
unlock(&writers_lock);
synchronize_rcu();
... free data pointed to by oldp

p1 is only valid between rcu_read_lock and rcu_read_unlock



• Reader
    rcu_read_lock();                    /* Reader signals its arrival */
    p1 = rcu_dereference(dataptr);      /* Reader gets a reference to data */
    ...
    rcu_read_unlock();                  /* Reader signals its leaving */

• Writer
    lock(&writers_lock);                /* Writer synchronizes with other writers */
    ...
    oldp = dataptr;
    rcu_assign_pointer(dataptr, p2);    /* Writer “publishes” new data */
    unlock(&writers_lock);              /* Writer synchronizes with other writers */
    synchronize_rcu();                  /* Writer waits until “active” readers complete */
    ... free data pointed to by oldp    /* Old data is no longer needed (can be freed) */

No blocking operations on the reader side!



• Reader
    rcu_read_lock();
    p1 = rcu_dereference(dataptr);
    ...
    rcu_read_unlock();

• Writer
    lock(&writers_lock);
    ...
    oldp = dataptr;
    rcu_assign_pointer(dataptr, p2);
    unlock(&writers_lock);
    call_rcu(..., reclaim_func);        /* to avoid waiting on the writer side */

• Called asynchronously when readers have completed:
    reclaim_func(...)
    {
        ... free data
    }

No blocking operations here!


Read-Copy Update

• Performance
  • Readers
    • do not acquire locks
    • do not perform atomic instructions
    • do not need memory barriers (except on Alpha)
• Deadlock immunity
• Realtime latency


Read-Copy Update

• Readers and updaters run concurrently
  • Readers can obtain old data
• Low-priority RCU readers can block high-priority reclaimers
• Grace-period latencies can extend for many milliseconds


Synchronization

Synchronization patterns



• Signaling
  • instruction (or block of instructions) A1 must be executed before B1

task A

instruction A1;
semSignal(sem);

task B

semWait(sem);
instruction B1;

sem is initialized with 0
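A minimal sketch of the signaling pattern with POSIX semaphores, where sem_post and sem_wait play the roles of semSignal and semWait. The order array and the function run_signaling are assumptions added only to observe the execution order.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t sem;        /* initialized with 0: B1 cannot run before A1 */
static int order[2];     /* records which instruction ran first */
static int next_slot;

static void *task_a(void *arg)
{
    (void)arg;
    order[next_slot++] = 'A';   /* instruction A1 */
    sem_post(&sem);             /* semSignal(sem) */
    return NULL;
}

static void *task_b(void *arg)
{
    (void)arg;
    sem_wait(&sem);             /* semWait(sem) */
    order[next_slot++] = 'B';   /* instruction B1 */
    return NULL;
}

/* Returns 1 if A1 was executed before B1. */
int run_signaling(void)
{
    pthread_t a, b;

    sem_init(&sem, 0, 0);
    next_slot = 0;
    pthread_create(&b, NULL, task_b, NULL);  /* B is started first on purpose */
    pthread_create(&a, NULL, task_a, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&sem);
    return order[0] == 'A' && order[1] == 'B';
}
```

Even though task B is created first, it blocks in sem_wait until task A has executed A1 and posted the semaphore.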



• Mutual exclusion (Mutex)
  • A1 and B1 cannot overlap

task A

semWait(sem_mutex);
instruction A1;
semSignal(sem_mutex);

task B

semWait(sem_mutex);
instruction B1;
semSignal(sem_mutex);

sem_mutex is initialized with 1



• Multiplex
  • generalized mutex
  • no more than k tasks can be in the critical section at the same time
  • same structure as the mutex (initialize sem_mutex with k)



• Rendezvous
  • both A1 and B1 must be executed before A2 and B2

task A

instruction A1;
semSignal(semBgo);
semWait(semAgo);
instruction A2;

task B

instruction B1;
semSignal(semAgo);
semWait(semBgo);
instruction B2;

semAgo and semBgo are initialized with 0



• Barrier
  • generalized rendezvous (to k tasks)
  • use a barrier object
    • implemented on top of semaphores

task A1

instruction A1_before;
barrier(B,k)
instruction A1_after;

task A2

instruction A2_before;
barrier(B,k)
instruction A2_after;

task Ak

instruction Ak_before;
barrier(B,k)
instruction Ak_after;


Barriers

An implementation

typedef struct barr_t {
    int arrived;
    semaphore mutex, sem;
} barr;

/* b is passed by pointer so that all tasks share the same barrier object */
void barrier(barr *b, int n_proc)
{
    int tmp;

    semWait(&b->mutex);
    tmp = ++b->arrived;
    semSignal(&b->mutex);

    if (tmp != n_proc) {
        semWait(&b->sem);        /* wait for the last task */
        semSignal(&b->sem);      /* let the next waiting task through */
    } else {
        semWait(&b->mutex);
        b->arrived = 0;
        semSignal(&b->mutex);
        semSignal(&b->sem);      /* the last task opens the barrier */
    }
}

• arrived and sem are initialized with 0; mutex is initialized with 1
• not reusable: after all tasks have left, sem == 1
• an additional semWait(sem) is needed


Barriers

[Figure: barrier split into phase 1 and phase 2]

• Reusable barrier
  • where should the final semWait be issued?
    • after all tasks have left the barrier
      • otherwise one waiting task will not resume
    • before tasks leave the barrier
      • otherwise a task can re-enter the barrier before the final semWait


Barriers

An implementation

typedef struct barr_t {
    int arrived;
    semaphore mutex, phase1, phase2;
} barr;

/* b is passed by pointer so that all tasks share the same barrier object */
void barrier(barr *b, int n_proc)
{
    int tmp;

    /* phase 1 */
    semWait(&b->mutex);
    tmp = ++b->arrived;
    semSignal(&b->mutex);

    if (tmp != n_proc) {
        semWait(&b->phase1);
        semSignal(&b->phase1);
    } else {
        semWait(&b->phase2);     /* additional semWait: resets phase2 */
        semSignal(&b->phase1);
    }

    /* phase 2 */
    semWait(&b->mutex);
    tmp = --b->arrived;
    semSignal(&b->mutex);

    if (tmp != 0) {
        semWait(&b->phase2);
        semSignal(&b->phase2);
    } else {
        semWait(&b->phase1);     /* additional semWait: resets phase1 */
        semSignal(&b->phase2);
    }
}

• arrived and phase1 are initialized with 0
• mutex and phase2 are initialized with 1
• phase 2 is used to issue the additional semWait that puts the semaphore phase1 back to its initial value
• the semaphore phase2 needs an additional semWait too (issued in phase 1)
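The two-phase barrier can be tried out with POSIX semaphores. The sketch below follows the slide's structure; the test harness (worker, barrier_demo, the per-round counters) is an assumption added to check that no task passes the barrier early across repeated rounds.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

enum { N_TASKS = 4, ROUNDS = 50 };

typedef struct {
    int   arrived;
    sem_t mutex, phase1, phase2;
} barr_t;

static barr_t bar;
static sem_t  count_lock;
static int    count[ROUNDS];   /* arrivals recorded per round */
static int    violations;      /* tasks that passed before all had arrived */

static void barrier(barr_t *b, int n_proc)
{
    int tmp;

    /* phase 1: wait until all tasks have arrived */
    sem_wait(&b->mutex); tmp = ++b->arrived; sem_post(&b->mutex);
    if (tmp != n_proc) { sem_wait(&b->phase1); sem_post(&b->phase1); }
    else               { sem_wait(&b->phase2); sem_post(&b->phase1); }

    /* phase 2: wait until all tasks have left phase 1 */
    sem_wait(&b->mutex); tmp = --b->arrived; sem_post(&b->mutex);
    if (tmp != 0) { sem_wait(&b->phase2); sem_post(&b->phase2); }
    else          { sem_wait(&b->phase1); sem_post(&b->phase2); }
}

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        sem_wait(&count_lock); count[r]++; sem_post(&count_lock);
        barrier(&bar, N_TASKS);
        sem_wait(&count_lock);
        if (count[r] != N_TASKS) violations++;
        sem_post(&count_lock);
    }
    return NULL;
}

/* Returns the number of observed violations (expected: 0). */
int barrier_demo(void)
{
    pthread_t t[N_TASKS];

    bar.arrived = 0;
    sem_init(&bar.mutex, 0, 1);
    sem_init(&bar.phase1, 0, 0);
    sem_init(&bar.phase2, 0, 1);
    sem_init(&count_lock, 0, 1);
    violations = 0;
    for (int r = 0; r < ROUNDS; r++) count[r] = 0;

    for (int i = 0; i < N_TASKS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < N_TASKS; i++) pthread_join(t[i], NULL);
    return violations;
}
```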


Synchronization

Classical problems


Classical problems

� Use semaphores to solve:

� Producer – Consumer (with a bounded buffer)

� Readers – Writers

� no priority

� no-starve writers

� writers with priority

� Dining philosophers


Producer – Consumer

� Some tasks produce data

� Some tasks consume data

� Data are consumed in the same order they are produced

� The queue size is known and limited (e.g., a circular buffer)

[Figure: Producers, FIFO queue, Consumers]

• Grant exclusive access to the queue
• Producer signals to consumers that new data is ready
• Consumer signals that space is available in the queue

Producer

semWait(space);
semWait(mutex);
queue.insert(data);
semSignal(mutex);
semSignal(inqueue);

Consumer

semWait(inqueue);
semWait(mutex);
data = queue.get();
semSignal(mutex);
semSignal(space);

• mutex is initialized with 1
• inqueue is initialized with 0 (initial number of items in the queue)
• space is initialized with the queue size (initial room in the queue)
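The semaphore solution can be sketched in C with a circular buffer. The buffer layout (queue, head, tail), the constants and the function run_producer_consumer are assumptions for this example; the order of the semaphore operations follows the slide.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

enum { QSIZE = 4, ITEMS = 1000 };

static int   queue[QSIZE];          /* circular buffer */
static int   head, tail;
static sem_t mutex, inqueue, space;

static void put(int data)
{
    sem_wait(&space);               /* wait for room in the queue */
    sem_wait(&mutex);
    queue[tail] = data;
    tail = (tail + 1) % QSIZE;
    sem_post(&mutex);
    sem_post(&inqueue);             /* signal that new data is ready */
}

static int get(void)
{
    int data;

    sem_wait(&inqueue);             /* wait for data */
    sem_wait(&mutex);
    data = queue[head];
    head = (head + 1) % QSIZE;
    sem_post(&mutex);
    sem_post(&space);               /* signal that space is available */
    return data;
}

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++)
        put(i);
    return NULL;
}

/* Consumes ITEMS values and returns how many arrived in production order. */
int run_producer_consumer(void)
{
    pthread_t p;
    int in_order = 0;

    head = tail = 0;
    sem_init(&mutex, 0, 1);
    sem_init(&inqueue, 0, 0);
    sem_init(&space, 0, QSIZE);

    pthread_create(&p, NULL, producer, NULL);
    for (int i = 1; i <= ITEMS; i++)
        if (get() == i)
            in_order++;
    pthread_join(p, NULL);
    return in_order;
}
```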




• swap the semWaits? WRONG
  • the consumer waits inside the critical section (when the queue is empty)
  • the producer cannot enter the critical section
  • the producer cannot send semSignal to the consumer
  • ⇒ DEADLOCK

Consumer (with the two semWait calls swapped)

semWait(mutex);
semWait(inqueue);
data = queue.get();
semSignal(mutex);
semSignal(space);



• swap the semSignals?
  • No deadlock
  • Additional context switches can occur
  • non-optimal implementation

Consumer (with the two semSignal calls swapped)

semWait(inqueue);
semWait(mutex);
data = queue.get();
semSignal(space);
semSignal(mutex);

• inqueue is initialized with 0 (initial number of items in the queue)
• space is initialized with the queue size (initial room in the queue)



• Using condition variables and mutexes
  • Grant exclusive access to the queue
  • Producer signals to consumers that new data is ready
  • Consumer signals that space is available in the queue

Producer

mutex_lock(mutex);
while (count == MAX)
    cond_wait(space, mutex);
queue.insert(data);
count++;
cond_signal(datain, mutex);
mutex_unlock(mutex);

Consumer

mutex_lock(mutex);
while (count == 0)
    cond_wait(datain, mutex);
data = queue.get();
count--;
cond_signal(space, mutex);
mutex_unlock(mutex);

• count is initialized with 0 (initial number of items in the queue)
• datain is used to signal that there is some data in the queue
• space is used to signal that there is free space in the queue


Readers – Writers

[Figure: Writers and Readers accessing the shared data]

• Some tasks write in a shared area
• Some tasks read from the shared area
• No order must be enforced
• Data integrity must be preserved
  • do not read half-written data


Readers – Writers - 1

• Grant exclusive access for writers
• The first arriving reader must signal that data is in use (and wait for a writer, if any)
• The last leaving reader must signal that nobody is using the data

Writer

semWait(noWriters);
write data
semSignal(noWriters);

Reader

semWait(mutex);
if (readers == 0) semWait(noWriters);
readers++;
semSignal(mutex);

read data

semWait(mutex);
readers--;
if (readers == 0) semSignal(noWriters);
semSignal(mutex);

• noWriters is initialized with 1 (initially nobody is accessing the shared data)
• mutex is initialized with 1
• readers is initialized with 0 (initially no readers are reading)
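A runnable sketch of this first readers-writers solution with POSIX semaphores follows. The two-cell shared array and the torn_reads counter are assumptions introduced only to detect a reader observing half-written data.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

enum { N_READERS = 3, LOOPS = 1000 };

static sem_t mutex, noWriters;
static int   readers;
static int   shared[2];     /* invariant: both cells hold the same value */
static int   torn_reads;    /* reads that saw a half-written update */

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= LOOPS; i++) {
        sem_wait(&noWriters);
        shared[0] = i;                  /* write data (two steps) */
        shared[1] = i;
        sem_post(&noWriters);
    }
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOOPS; i++) {
        int a, b;

        sem_wait(&mutex);
        if (readers == 0) sem_wait(&noWriters);  /* first reader locks out writers */
        readers++;
        sem_post(&mutex);

        a = shared[0];                  /* read data */
        b = shared[1];

        sem_wait(&mutex);
        if (a != b) torn_reads++;       /* counter protected by mutex */
        readers--;
        if (readers == 0) sem_post(&noWriters);  /* last reader lets writers in */
        sem_post(&mutex);
    }
    return NULL;
}

/* Returns the number of inconsistent reads (expected: 0). */
int readers_writers_demo(void)
{
    pthread_t w, r[N_READERS];

    sem_init(&mutex, 0, 1);
    sem_init(&noWriters, 0, 1);
    readers = 0;
    torn_reads = 0;

    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < N_READERS; i++) pthread_create(&r[i], NULL, reader, NULL);
    pthread_join(w, NULL);
    for (int i = 0; i < N_READERS; i++) pthread_join(r[i], NULL);
    return torn_reads;
}
```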



• Writers have less chance to access the data
  • possible starvation
• Solution:
  • do not allow incoming readers to access the data until waiting writers have been served

[Figure: readers using the shared data, a writer waiting, and incoming readers held back]



• Block readers when a writer is waiting
• Resume readers when a writer finishes
• Readers must not hold the writer_in semaphore

Writer

semWait(writer_in);
semWait(noWriters);
write data
semSignal(noWriters);
semSignal(writer_in);

Reader

semWait(writer_in);
semSignal(writer_in);
/* entry section as in Readers – Writers - 1 */

read data

/* exit section as in Readers – Writers - 1 */

• noWriters is initialized with 1; mutex is initialized with 1; readers is initialized with 0
• writer_in is initialized with 1 (readers and writers can try to proceed)




� readers and writers wait on writer_in

� one writer or one reader is selected

� how to grant priority to writers?



Writer

semWait(mutexW);
if (writers == 0) semWait(noReaders);
writers++;
semSignal(mutexW);

semWait(noWriters);     /* writers wait here while readers are inside (so new readers are blocked) */
write data
semSignal(noWriters);

semWait(mutexW);
writers--;
if (writers == 0) semSignal(noReaders);
semSignal(mutexW);

Reader

semWait(noReaders);     /* readers wait here while a writer is inside (so incoming writers can still block readers) */
semWait(mutexR);
if (readers == 0) semWait(noWriters);
readers++;
semSignal(mutexR);
semSignal(noReaders);

read data

semWait(mutexR);
readers--;
if (readers == 0) semSignal(noWriters);
semSignal(mutexR);

• noWriters is initialized with 1; mutexR and mutexW are initialized with 1; readers is initialized with 0
• noReaders is initialized with 1; writers is initialized with 0


Readers – Writers

• Readers – Writers – 1
  • Simple
  • Writers can starve
• Readers – Writers – 2
  • No starvation for writers
• Readers – Writers – 3
  • Writers have priority over readers


Dining philosophers

[Figure: round table with five philosophers (0 to 4) and five forks (0 to 4) placed between them]

• Five plates
  • one for each philosopher
• Five forks
• Two forks are needed to eat

philosopher

for (;;) {
    think();            /* non-critical section */
    get_forks();
    eat();              /* critical section */
    release_forks();
}

• Only one philosopher can hold a fork at a time.
• No deadlock.
• No starvation.
• Allow more than one philosopher to eat at the same time.


Dining philosophers

philosopher

for (;;) {
    think();

    /* get_forks(); */
    semWait(fork[(i+1)%5]);
    semWait(fork[i]);

    eat();

    /* release_forks(); */
    semSignal(fork[(i+1)%5]);
    semSignal(fork[i]);
}

fork[5] are initialized with 1


Dining philosophers

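• Deadlock
  • philosopher 0: think(); semWait(fork[1]); semWait(fork[0]); eat(); semSignal(fork[1]); semSignal(fork[0]);
  • philosopher 3: think(); semWait(fork[4]); semWait(fork[3]); eat(); ...
  • if every philosopher acquires the first fork before any of them acquires the second, each one waits forever for a fork held by a neighbour

[Figure: code of philosophers 0 to 4 side by side, showing the circular wait]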


Dining philosophers

• Allow only 4 philosophers to try to acquire forks

[Figure: code of philosophers 0 to 4 side by side]


Dining philosophers

• Symmetric solution

philosopher

for (;;) {
    think();

    /* get_forks(); */
    semWait(mutex4);
    semWait(fork[(i+1)%5]);
    semWait(fork[i]);

    eat();

    /* release_forks(); */
    semSignal(fork[(i+1)%5]);
    semSignal(fork[i]);
    semSignal(mutex4);
}

• fork[5] are initialized with 1
• mutex4 is initialized with 4
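The symmetric solution can be run with POSIX threads and semaphores. In this sketch think() and eat() are reduced to a meal counter, and the names fork_sem, meals and dine are assumptions for the example; termination of dine() shows that the run did not deadlock.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

enum { N = 5, MEALS = 200 };

static sem_t fork_sem[N];   /* one semaphore per fork, initialized with 1 */
static sem_t mutex4;        /* multiplex: at most N-1 philosophers compete */
static int   meals[N];

static void *philosopher(void *arg)
{
    int i = *(int *)arg;

    for (int m = 0; m < MEALS; m++) {
        /* think(); */

        /* get_forks(); */
        sem_wait(&mutex4);
        sem_wait(&fork_sem[(i + 1) % N]);
        sem_wait(&fork_sem[i]);

        meals[i]++;                       /* eat(); */

        /* release_forks(); */
        sem_post(&fork_sem[(i + 1) % N]);
        sem_post(&fork_sem[i]);
        sem_post(&mutex4);
    }
    return NULL;
}

/* Runs the five philosophers and returns the total number of meals eaten. */
int dine(void)
{
    pthread_t t[N];
    int ids[N];
    int total = 0;

    sem_init(&mutex4, 0, N - 1);
    for (int i = 0; i < N; i++) {
        sem_init(&fork_sem[i], 0, 1);
        meals[i] = 0;
    }
    for (int i = 0; i < N; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, philosopher, &ids[i]);
    }
    for (int i = 0; i < N; i++) {
        pthread_join(t[i], NULL);
        total += meals[i];
    }
    return total;
}
```

Since at most four philosophers hold a first fork at any time, at least one of them can always obtain the second fork, so the circular wait cannot close.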


Dining philosophers

• Asymmetric solution

L-philosopher

for (;;) {
    think();

    /* get_forks(); */
    semWait(fork[(i+1)%5]);
    semWait(fork[i]);

    eat();
    ...
}

R-philosopher

for (;;) {
    think();

    /* get_forks(); */
    semWait(fork[i]);
    semWait(fork[(i+1)%5]);

    eat();
    ...
}

• fork[5] are initialized with 1
• there is at least one L-philosopher and one R-philosopher


Dining philosophers

• No more than 4 philosophers can try to acquire forks
  • ph 4 is blocked by ph 3

[Figure: code of L-philosophers 0 to 3 and R-philosopher 4 side by side]


Synchronization

Deadlock management


Deadlock conditions

• Mutual exclusion.
  • A resource can be assigned only to a fixed, finite number of processes at a time. No other process may access a resource unit that has reached its maximum number of assignments.
  • Needed (to enforce synchronization)
• No preemption.
  • No resource can be forcibly removed from a process holding it.
  • Difficult to avoid (a rollback is needed to implement resource preemption)
• Hold and wait.
  • A process may hold allocated resources while awaiting assignment of other resources.
• Circular wait.
  • A closed chain of processes exists, such that each process holds at least one resource needed by the next process in the chain

⇒ Deadlock is possible


Resources

� Swappable space

� Devices

� physical drives

� files

� Main memory blocks

� Internal resources

� I/O interrupts handling

Required asynchronously by independent processes


Resource allocation graph

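[Figure legend: an edge between a task and a resource means either “Task requires Resource” or “Task holds Resource”]

• Example: B requires R1 and A holds R1
  • B has to wait until A releases R1
• Example: tasks A and B and resources R1 and R2 form a cycle
  • circular dependency: deadlock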


Deadlock handling

� Prevention

� make deadlock not possible

� Avoidance

� disallow operations that may lead to a deadlock

� Detection

� periodically check for deadlock and recover


Deadlock prevention

• Disallow hold-and-wait
  • all the needed resources must be requested simultaneously
  • a process is blocked until all the requested resources are available
  • inefficient
    • a process must acquire resources that are needed only for short time intervals, or not actually needed at all
• Allow preemption
  • when a request is refused, a process must release all its resources
  • the OS may request a process to release resources
  • practical only for resources with an easily restored state (e.g., processor)
• Disallow circular waits
  • define an ordering on resources
  • a process that owns a resource R can request a resource Q only if ord(R) < ord(Q)
  • restricts incremental resource requests (they must follow the ordering)
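The resource-ordering rule can be sketched for two mutexes. The ordering used here (by memory address) is an assumption chosen for the example; any fixed total order on the resources works. The names account_t, lock_pair, transfer and transfer_demo are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long            balance;
} account_t;

/* Acquire both locks following one global order (lowest address first),
   so that two tasks can never hold one lock each while waiting for the other. */
static void lock_pair(account_t *x, account_t *y)
{
    if ((uintptr_t)x < (uintptr_t)y) {
        pthread_mutex_lock(&x->lock);
        pthread_mutex_lock(&y->lock);
    } else {
        pthread_mutex_lock(&y->lock);
        pthread_mutex_lock(&x->lock);
    }
}

void transfer(account_t *from, account_t *to, long amount)
{
    if (from == to)
        return;                      /* avoid locking the same mutex twice */
    lock_pair(from, to);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&from->lock);
    pthread_mutex_unlock(&to->lock);
}

static account_t acc_a = { PTHREAD_MUTEX_INITIALIZER, 100 };
static account_t acc_b = { PTHREAD_MUTEX_INITIALIZER, 100 };

static void *a_to_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) transfer(&acc_a, &acc_b, 1);
    return NULL;
}

static void *b_to_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) transfer(&acc_b, &acc_a, 1);
    return NULL;
}

/* Two tasks transfer in opposite directions; returns the total balance. */
long transfer_demo(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acc_a.balance + acc_b.balance;
}
```

If each task locked its "from" account first, the two opposite transfers could each hold one lock and wait for the other, which is exactly the circular wait the ordering rules out.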


Deadlock avoidance

� Evaluate resource requests

� grant a resource request only if a deadlock cannot occur

� OS must know all the future requests

� banker's algorithm (Dijkstra)


Deadlock detection

• Periodically check for deadlock
  • grant resource requests whenever possible
• If a deadlock is detected:
  • kill all deadlocked processes
    • most common approach
  • successively abort deadlocked processes (until the deadlock no longer exists)
    • selection order can be a key factor
  • roll back all deadlocked processes to a previous state
    • a backup and restore mechanism must be implemented
  • force a deadlocked process to release resources
    • preemption
    • roll back the process to a point prior to the resource acquisition


Banker's algorithm

• For a single resource type
  • Process
    • resources used
    • resources needed
  • Available resources
• grant a request only if it leads to a safe state
• safe state:
  • there exists at least one process that still needs no more resources than are available (so it can surely terminate and release what it holds)
• unsafe state:
  • deadlock is possible (absence of deadlock cannot be ensured)


Banker's algorithm

Initial state

            Allocated   Needed
Process A       0         15
Process B       0          7
Process C       0          4
Process D       0         12
Available: 20

Current state

            Allocated   Needed
Process A       8          7
Process B       4          3
Process C       1          3
Process D       6          6
Available: 1

unsafe: with the available resources no process is guaranteed to terminate



Banker's algorithm

Initial state

            Allocated   Needed
Process A       0         15
Process B       0          7
Process C       0          4
Process D       0         12
Available: 20

Current state

            Allocated   Needed
Process A       7          8
Process B       4          3
Process C       2          2
Process D       5          7
Available: 2

safe: with the available resources process C can surely terminate



Banker's algorithm


• safe state:
  • ∃ i : Needed(i) ≤ Available

Needed(i): resources still needed by process i
Available: resources still available on the system
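The safety test for a single resource type can be sketched as a small function. The slide states the condition for one step; the sketch below repeats it, releasing the resources of each process that can terminate, until either all processes have terminated or none can proceed, which is the usual formulation of the banker's safety check. The function name is_safe and the limit MAX_PROC are assumptions.

```c
#include <stdbool.h>

enum { MAX_PROC = 16 };   /* assumption: upper bound used only by this sketch */

/* allocated[i]: resources currently held by process i
   needed[i]:    resources process i still needs to terminate
   available:    resources currently free
   Returns true if the processes can all terminate in some order. */
bool is_safe(const int allocated[], const int needed[], int n, int available)
{
    bool finished[MAX_PROC] = { false };
    int remaining = n;

    if (n > MAX_PROC)
        return false;

    while (remaining > 0) {
        bool progress = false;

        for (int i = 0; i < n; i++) {
            if (!finished[i] && needed[i] <= available) {
                available += allocated[i];  /* i terminates and releases */
                finished[i] = true;
                remaining--;
                progress = true;
            }
        }
        if (!progress)
            return false;                   /* nobody can be guaranteed to finish */
    }
    return true;
}
```

Applied to the two states shown earlier, the state with 1 available unit is unsafe, while the state with 2 available units is safe (C can terminate first, then B, A and D).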


Banker's algorithm

• For several resource types
  • replicate the information for each resource type

            Type-1               Type-2               Type-3
            Allocated  Needed    Allocated  Needed    Allocated  Needed
Process A       0        15          0         1          0         3
Process B       0         7          0         2          0         7
Process C       0         4          0         4          0         4
Process D       0        12          0         1          0         9
Available      20                    4                   10

• safe state:
  • ∃ i ∀ j : Needed(i,j) ≤ Available(j)

Needed(i,j): resources of type j still needed by process i
Available(j): resources of type j still available on the system


Synchronization

User level

(POSIX)



GCC builtins for atomic accesses


• __sync_fetch_and_add(type *ptr, type value);
• __sync_fetch_and_sub(type *ptr, type value);
• __sync_fetch_and_or(type *ptr, type value);
• __sync_fetch_and_and(type *ptr, type value);
• __sync_fetch_and_xor(type *ptr, type value);
• __sync_fetch_and_nand(type *ptr, type value);
  • perform the operation suggested by the name, and return the old value;
  • imply a full memory barrier




• __sync_add_and_fetch(type *ptr, type value);
• __sync_sub_and_fetch(type *ptr, type value);
• __sync_or_and_fetch(type *ptr, type value);
• __sync_and_and_fetch(type *ptr, type value);
• __sync_xor_and_fetch(type *ptr, type value);
• __sync_nand_and_fetch(type *ptr, type value);
  • perform the operation suggested by the name, and return the new value;





• __sync_lock_test_and_set(type *ptr, type value);
  • performs an atomic exchange: writes value into *ptr and returns the previous contents of *ptr;
  • implies an acquire barrier
• Read-Test-Modify-Write operations
  • __sync_val_compare_and_swap(type *ptr, type oldval, type newval);
  • __sync_bool_compare_and_swap(type *ptr, type oldval, type newval);
    • perform an atomic compare-and-swap: if the current value of *ptr is oldval, then write newval into *ptr;
    • __sync_val_compare_and_swap returns the value of *ptr before the operation
    • __sync_bool_compare_and_swap returns true if the comparison succeeds (and newval is written)




� Others:

� __sync_lock_release(type *ptr);

� Writes 0 to *ptr;

� implies a release barrier

� __sync_synchronize();

� Issues a full memory barrier
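As a usage sketch of these builtins, the code below builds a simple spinlock from __sync_lock_test_and_set and __sync_lock_release, and compares it with a counter updated directly by __sync_fetch_and_add. The helper names (spin_lock, spin_unlock, builtins_demo) and the thread and iteration counts are assumptions for this example.

```c
#include <pthread.h>
#include <stddef.h>

enum { N_THREADS = 4, N_INCR = 50000 };

static volatile int lock_word;      /* 0 = free, 1 = held */
static long plain_counter;          /* protected by the spinlock */
static long atomic_counter;         /* updated with an atomic builtin */

static void spin_lock(volatile int *l)
{
    /* atomic exchange: retry until the previous value was 0 (lock was free) */
    while (__sync_lock_test_and_set(l, 1))
        ;                           /* busy wait */
}

static void spin_unlock(volatile int *l)
{
    __sync_lock_release(l);         /* writes 0 with release semantics */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_INCR; i++) {
        spin_lock(&lock_word);
        plain_counter++;            /* critical section */
        spin_unlock(&lock_word);

        __sync_fetch_and_add(&atomic_counter, 1);   /* no lock needed */
    }
    return NULL;
}

/* Returns the sum of both counters after all threads have finished. */
long builtins_demo(void)
{
    pthread_t t[N_THREADS];

    lock_word = 0;
    plain_counter = 0;
    atomic_counter = 0;
    for (int i = 0; i < N_THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < N_THREADS; i++) pthread_join(t[i], NULL);
    return plain_counter + atomic_counter;
}
```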


User level (POSIX)


� Low-level primitives

� spinlocks

� High-level primitives

� semaphores

� mutexes

� reader-writer locks

� condition variables

� barriers


User level (POSIX)


• Low-level primitives
  • spinlocks
    • type
      • pthread_spinlock_t
    • operations:
      • pthread_spin_init        (initialization)
      • pthread_spin_destroy     (deallocation)
      • pthread_spin_lock        (locking)
      • pthread_spin_unlock      (unlocking)
      • pthread_spin_trylock     (tentative locking)


User level (POSIX)



• semaphores
  • type
    • sem_t
  • operations:
    • sem_init           (initialization)
    • sem_destroy        (deallocation)
    • sem_getvalue
    • sem_wait           (waiting)
    • sem_timedwait      (waiting)
    • sem_trywait        (tentative waiting)
    • sem_post           (unlocking)


User level (POSIX)



• mutexes
  • type
    • pthread_mutex_t
  • operations:
    • pthread_mutex_init       (initialization)
    • pthread_mutex_destroy    (deallocation)
    • pthread_mutex_lock       (locking)
    • pthread_mutex_unlock     (unlocking)
    • pthread_mutex_trylock    (tentative locking)


User level (POSIX)



• reader-writer locks
  • type
    • pthread_rwlock_t
  • operations:
    • pthread_rwlock_init         (initialization)
    • pthread_rwlock_destroy      (deallocation)
    • pthread_rwlock_rdlock       (locking)
    • pthread_rwlock_wrlock       (locking)
    • pthread_rwlock_unlock       (unlocking)
    • pthread_rwlock_tryrdlock    (tentative locking)
    • pthread_rwlock_trywrlock    (tentative locking)


User level (POSIX)



• condition variables
  • type
    • pthread_cond_t
  • operations:
    • pthread_cond_init         (initialization)
    • pthread_cond_destroy      (deallocation)
    • pthread_cond_wait         (waiting)
    • pthread_cond_timedwait    (waiting)
    • pthread_cond_signal       (notifying)
    • pthread_cond_broadcast    (notifying)


User level (POSIX)



• barriers
  • type
    • pthread_barrier_t
  • operations:
    • pthread_barrier_init       (initialization)
    • pthread_barrier_destroy    (deallocation)
    • pthread_barrier_wait       (waiting)
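A short usage sketch of the POSIX barrier: each thread records that it has finished its "before" work, waits at the barrier, and then checks that every other thread has arrived too. The names before, saw_all and posix_barrier_demo are assumptions for this example.

```c
#define _XOPEN_SOURCE 700   /* assumption: exposes pthread_barrier_t in strict modes */
#include <pthread.h>
#include <stddef.h>

enum { N_THREADS = 4 };

static pthread_barrier_t bar;
static int before[N_THREADS];    /* set by each thread before the barrier */
static int saw_all[N_THREADS];   /* 1 if the thread saw all flags afterwards */

static void *worker(void *arg)
{
    int id = *(int *)arg;
    int ok = 1;

    before[id] = 1;                  /* work done before the barrier */
    pthread_barrier_wait(&bar);      /* nobody passes until all have arrived */

    for (int i = 0; i < N_THREADS; i++)
        ok = ok && before[i];
    saw_all[id] = ok;
    return NULL;
}

/* Returns how many threads observed all "before" flags after the barrier. */
int posix_barrier_demo(void)
{
    pthread_t t[N_THREADS];
    int ids[N_THREADS];
    int count = 0;

    pthread_barrier_init(&bar, NULL, N_THREADS);
    for (int i = 0; i < N_THREADS; i++) {
        before[i] = 0;
        saw_all[i] = 0;
    }
    for (int i = 0; i < N_THREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < N_THREADS; i++) {
        pthread_join(t[i], NULL);
        count += saw_all[i];
    }
    pthread_barrier_destroy(&bar);
    return count;
}
```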


User level (POSIX)

Low-level (no sleeping)
  Lock                  pthread_spinlock_t
  RW-Lock               NO

High-level (may sleep)
  Mutex                 pthread_mutex_t
  Semaphore             sem_t
  RW-mutex              pthread_rwlock_t
  Condition Variable    pthread_cond_t
  Barrier               pthread_barrier_t
