Introduction to Concurrent Programming: Software & Hardware Basics
Slides by Ofer Givoli
Based on: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008
• Appendix A – Software Basics
• Appendix B – Hardware Basics
Jan 21, 2016
2
Software Basics
3
Threads in Java
• Executes a single, sequential program
• Subclass of: java.lang.Thread
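As a minimal sketch of the two bullets above: the class below subclasses java.lang.Thread and puts its sequential program in run(). The names HelloThread and runAll, and the message text, are hypothetical and chosen only for illustration; they do not come from the slides or the book.

```java
// Hypothetical example: HelloThread, runAll and the message are illustrative names.
class HelloThread extends Thread {
    private final int index;
    private final String[] results;

    HelloThread(int index, String[] results) {
        this.index = index;
        this.results = results;
    }

    @Override
    public void run() {
        // The body of run() is the single, sequential program this thread executes.
        results[index] = "hello from thread " + index;
    }

    // Starts n threads, waits for all of them, and returns what each one wrote.
    static String[] runAll(int n) {
        String[] results = new String[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new HelloThread(i, results);
            threads[i].start(); // start() creates a new thread; calling run() directly would not
        }
        for (Thread t : threads) {
            try {
                t.join(); // after join() returns, the thread's writes are visible here
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : runAll(4)) {
            System.out.println(s);
        }
    }
}
```

Each thread writes to its own array slot, so no locking is needed in this particular sketch.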
4
…
Taken from: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
5
Monitors
• lock + waiting set
• every object is a monitor
• Critical section: using the synchronized keyword
• Waiting: using the wait() method
• Waking up waiting threads, using the methods:
  • notify()
  • notifyAll()
6
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public void push(T obj) { innerStack.push(obj); }
    public T pop() { return innerStack.pop(); }
}

ConcurrentStack<Integer> s = ...
s.push(1);
Thread 1: ... = s.pop();    Thread 2: ... = s.pop();
Solution: mutual exclusion
7
... = s.pop(); ... = s.pop();
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();
    private Object monitor = new Object();

    public void push(T obj) {
        synchronized (monitor) { innerStack.push(obj); }
    }
    public T pop() {
        synchronized (monitor) { return innerStack.pop(); }
    }
}
BLOCKED
9
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public void push(T obj) {
        synchronized (this) { innerStack.push(obj); }
    }
    public T pop() {
        synchronized (this) { return innerStack.pop(); }
    }
}
10
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) { innerStack.push(obj); }
    public synchronized T pop() { return innerStack.pop(); }
}
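The effect of synchronized methods can be checked with a small experiment. The sketch below is an assumption-laden illustration, not the slides' code: it wraps java.util.ArrayDeque rather than java.util.Stack, because Stack inherits synchronized methods from Vector and would hide the effect of the wrapper's own locking. The names SyncStackDemo, SyncStack and pushConcurrently are hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical demo: SyncStackDemo, SyncStack and pushConcurrently are illustrative names.
class SyncStackDemo {
    // Same shape as the stack on the slide, but over ArrayDeque, which is not thread-safe.
    static class SyncStack<T> {
        private final ArrayDeque<T> inner = new ArrayDeque<>();

        public synchronized void push(T obj) { inner.push(obj); }
        public synchronized T pop() { return inner.pop(); }
        public synchronized int size() { return inner.size(); }
    }

    // Several threads push concurrently; returns the final number of elements.
    static int pushConcurrently(int threads, int perThread) {
        SyncStack<Integer> s = new SyncStack<>();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    s.push(j); // only one thread at a time can be inside push()
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return s.size();
    }

    public static void main(String[] args) {
        System.out.println(pushConcurrently(4, 10_000));
    }
}
```

With the synchronized keywords in place, no push is lost, so the final size equals threads × perThread. Removing them makes the outcome unpredictable.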
New feature: waiting for pop()
11
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) { innerStack.push(obj); }
    public synchronized T pop() {
        while (innerStack.empty()) {}
        return innerStack.pop();
    }
}
Problem?
12
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) { innerStack.push(obj); }
    public synchronized T pop() {
        while (innerStack.empty()) {}
        return innerStack.pop();
    }
}

Thread 1: ... = s.pop();    spins in the while loop while still holding the lock
Thread 2: s.push(1);        BLOCKED, cannot acquire the lock
Result: deadlock
14
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) {
        if (innerStack.empty()) notifyAll();
        innerStack.push(obj);
    }
    public synchronized T pop() throws InterruptedException {
        while (innerStack.empty()) { wait(); }
        return innerStack.pop();
    }
}
15
... = s.pop(); s.push(1);
BLOCKED WAITING
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) {
        if (innerStack.empty()) notifyAll();
        innerStack.push(obj);
    }
    public synchronized T pop() throws InterruptedException {
        while (innerStack.empty()) { wait(); }
        return innerStack.pop();
    }
}
BLOCKED
16
... = s.pop();
WAITING
... = s.pop();
WAITING
public class ConcurrentStack<T> {
    private Stack<T> innerStack = new Stack<T>();

    public synchronized void push(T obj) {
        if (innerStack.empty()) notify();
        innerStack.push(obj);
    }
    public synchronized T pop() throws InterruptedException {
        while (innerStack.empty()) { wait(); }
        return innerStack.pop();
    }
}
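s.push(1); s.push(2);
lost wakeup: notify() wakes only one of the two waiting threads, and the second push() finds the stack non-empty, so it never notifies the other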
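One way to avoid the lost wakeup is to wake all waiters, as the earlier notifyAll() version does. The sketch below is a self-contained variant under stated assumptions: it calls notifyAll() on every push (simpler than the conditional call on the slide, not necessarily cheaper), it uses ArrayDeque internally, and the names BlockingStackDemo, BlockingStack and popWithTwoConsumers are hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical demo: class and method names are illustrative, not from the slides.
class BlockingStackDemo {
    static class BlockingStack<T> {
        private final ArrayDeque<T> inner = new ArrayDeque<>();

        public synchronized void push(T obj) {
            inner.push(obj);
            notifyAll(); // wake every waiter; each one re-checks the condition in its loop
        }

        public synchronized T pop() throws InterruptedException {
            while (inner.isEmpty()) {
                wait(); // releases the monitor while waiting, reacquires it before returning
            }
            return inner.pop();
        }
    }

    // Two consumers each pop once while the caller pushes 1 and 2; returns the sum popped.
    static int popWithTwoConsumers() {
        BlockingStack<Integer> s = new BlockingStack<>();
        int[] got = new int[2];
        Thread[] consumers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int slot = i;
            consumers[i] = new Thread(() -> {
                try {
                    got[slot] = s.pop();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[i].start();
        }
        s.push(1);
        s.push(2);
        for (Thread t : consumers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return got[0] + got[1];
    }

    public static void main(String[] args) {
        System.out.println(popWithTwoConsumers());
    }
}
```

Whichever consumer wakes first, both pushed values are eventually popped, so neither consumer is left waiting forever.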
17
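Thread.yield();    // hint to the scheduler that the current thread is willing to give up the processor
Thread.sleep(t);   // suspend the current thread for t milliseconds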
18
Thread-Local Objects
class ThreadLocalID extends ThreadLocal<Integer> {
    protected Integer initialValue() { return …; }
}

ThreadLocalID id = …;

Thread 1: id.set(…); … … = id.get();
Thread 2: id.set(…); … … = id.get();
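A runnable sketch of the same pattern follows. The slide elides the body of initialValue(); the version here fills it with a shared AtomicInteger counter purely as an assumption for illustration, and the names ThreadIdDemo, nextId and observeTwoThreads are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: the counter-based initial value is an assumption, not the slides' code.
class ThreadIdDemo {
    private static final AtomicInteger nextId = new AtomicInteger(0);

    static class ThreadLocalID extends ThreadLocal<Integer> {
        @Override
        protected Integer initialValue() {
            // Called once per thread, on that thread's first get().
            return nextId.getAndIncrement();
        }
    }

    private static final ThreadLocalID id = new ThreadLocalID();

    // Thread 0 overwrites its own copy with -1; thread 1 only reads its initial value.
    static int[] observeTwoThreads() {
        int[] seen = new int[2];
        Thread t0 = new Thread(() -> {
            id.set(-1);           // affects only this thread's copy
            seen[0] = id.get();
        });
        Thread t1 = new Thread(() -> seen[1] = id.get());
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = observeTwoThreads();
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

The set(-1) in one thread is never observed by the other, because each thread reads and writes its own copy of the value.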
19
• Synchronization in C#
• Pthreads
20
Hardware Basics
21
Taken from: https://software.intel.com/en-us/articles/optimizing-applications-for-numa
22
CPU → L1 Cache → L2 Cache → L3 Cache → Memory (DRAM)
Speed: Fastest → Slowest
Size: Smallest → Biggest
Cost: Highest → Lowest
Power: Highest → Lowest
Taken from: Computer Structure 2014 slides, by Lihu Rappoport and Adi Yoaz (modified)
23
Processor 1
L1 cache
Processor 2
L1 cache
L2 cache (shared)
Memory
Taken from: Computer Structure 2014 slides, by Lihu Rappoport and Adi Yoaz
24
SMP (symmetric multiprocessing): not scalable, because the shared bus becomes overloaded as processors are added
NUMA (non-uniform memory access)
Taken from: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
25
Cache Coherence
Cache-line states:
• Modified
• Exclusive
• Shared
• Invalid
Taken from: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
false sharing
26
Spinning
SMP / NUMA
Taken from: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
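To make spinning concrete, here is a minimal sketch of a test-and-test-and-set spin lock, a technique this slide does not itself show. A waiting thread spins on plain reads, which can be served from its own cache on a cache-coherent machine, and attempts the atomic getAndSet only once the lock looks free. The names TtasLockDemo, TtasLock and count are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a test-and-test-and-set lock; names are illustrative.
class TtasLockDemo {
    static class TtasLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            while (true) {
                while (held.get()) {
                    // Spin on reads only; no atomic read-modify-write until the lock looks free.
                }
                if (!held.getAndSet(true)) {
                    return; // previous value was false, so this thread now holds the lock
                }
            }
        }

        void unlock() {
            held.set(false);
        }
    }

    // Several threads increment a plain int under the spin lock; returns the final value.
    static int count(int threads, int perThread) {
        TtasLock lock = new TtasLock();
        int[] counter = new int[1];
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000));
    }
}
```

This sketch checks only correctness. How well spinning performs depends on the architecture and on contention, which the code does not measure.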
27
Multi-Core and Multi-Threaded Architectures
• Execute instructions out-of-order/in parallel/speculatively
• Write buffer
• Reordering of reads and writes by the compiler
• Memory barrier instruction (expensive)
• Reordering of reads and writes in Java
• Volatile variables in Java
Taken from: The art of multiprocessor programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
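The role of volatile can be illustrated with a small publication example. In the sketch below, the writer stores to a plain field and then sets a volatile flag; under the Java memory model, a reader that sees the flag as true is also guaranteed to see the earlier plain write. The names VolatileFlagDemo, data, ready and publishAndRead are hypothetical.

```java
// Hypothetical demo: class, field and method names are illustrative.
class VolatileFlagDemo {
    private int data = 0;                   // plain field
    private volatile boolean ready = false; // volatile accesses are not reordered around it

    // The writer publishes data through the volatile flag; returns what the reader saw.
    static int publishAndRead() {
        VolatileFlagDemo d = new VolatileFlagDemo();
        int[] observed = new int[1];
        Thread reader = new Thread(() -> {
            while (!d.ready) {
                // Volatile read: re-read on every iteration, never hoisted out of the loop.
            }
            observed[0] = d.data; // ordered after the volatile read, so it sees the write
        });
        reader.start();
        d.data = 42;    // plain write ...
        d.ready = true; // ... made visible by the volatile write that follows it
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```

Without the volatile modifier on ready, the reader could spin forever or observe a stale value of data, since nothing would then constrain reordering or visibility.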
28
Hardware Synchronization Instructions
• compare-and-swap/set (CAS)
• load-linked & store-conditional (LL/SC)
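In Java, CAS is exposed through the java.util.concurrent.atomic classes. The sketch below shows the usual retry loop built on AtomicInteger.compareAndSet: read the current value, compute the new one, and retry if another thread changed the value in between. The names CasCounterDemo, casIncrement and count are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: class and method names are illustrative.
class CasCounterDemo {
    // Lock-free increment written as an explicit CAS retry loop.
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            // Succeeds only if the value is still 'current'; otherwise another thread won.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Several threads increment concurrently; returns the final counter value.
    static int count(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    casIncrement(counter);
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000));
    }
}
```

AtomicInteger.incrementAndGet() has the same effect as casIncrement. The loop is spelled out here only to show the CAS pattern.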
29
Thanks!