Lock-Free Programming: Pro Tips

Jean-Philippe BEMPEL (@jpbempel), Performance Architect
http://jpbempel.blogspot.com
© ULLINK 2015

Jul 17, 2015


Page 2: Lock free programming- pro tips

Agenda


• Measuring Contention

• Lock Striping

• Compare-And-Swap

• Introduction to Java Memory Model

• Disruptor & RingBuffer

• Spinning

• Ticketing: OrderedScheduler

Page 3: Lock free programming- pro tips


Immutability

Page 4: Lock free programming- pro tips


Contention

Page 5: Lock free programming- pro tips

Contention

• Two or more threads competing to acquire a lock

• A thread is parked while waiting for a lock

• The number one reason we want to avoid locks

Page 6: Lock free programming- pro tips


Measure, don’t guess! Kirk Pepperdine & Jack Shirazi

Page 7: Lock free programming- pro tips


Measure, don’t premature!

Page 8: Lock free programming- pro tips

Measuring Contention

Synchronized blocks:

• Profilers (YourKit, JProfiler, ZVision)

• JVMTI native agent

• Results can be difficult to interpret

Page 9: Lock free programming- pro tips

Measuring Contention: JProfiler


Page 10: Lock free programming- pro tips

Measuring Contention: ZVision


Page 11: Lock free programming- pro tips

Measuring Contention

java.util.concurrent.locks.Lock:

• The JVM cannot help us here

• These are JDK library classes: regular Java code

• JProfiler can measure them

• Alternative: modify the j.u.c classes and put them on the bootclasspath (jucprofiler)

Page 12: Lock free programming- pro tips

Measuring Contention: JProfiler


Page 13: Lock free programming- pro tips

Measuring Contention

• Insertion of contention counters
  • Identify the places where a lock fails to be acquired
  • Increment a counter

• Identify locks
  • Call stacks at construction
  • Logging of the counter status

• How to measure existing locks in your code
  • Modify JDK classes
  • Reintroduce them in the bootclasspath
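The counter idea above can be sketched without touching the JDK, by wrapping a lock. This is only an illustration: `CountingLock` and its methods are hypothetical names, not part of the JDK or of jucprofiler. A failed `tryLock()` is taken as the sign of contention, and the call stack captured at construction identifies the lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper (not a JDK or jucprofiler class): counts failed acquisitions.
class CountingLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicLong contended = new AtomicLong();
    // Call stack captured at construction, to identify the lock when logging.
    private final Throwable creationSite = new Throwable("lock created here");

    void lock() {
        if (!lock.tryLock()) {          // fast path failed: another thread holds the lock
            contended.incrementAndGet();
            lock.lock();                // fall back to the blocking acquisition
        }
    }

    void unlock() { lock.unlock(); }

    long contendedCount() { return contended.get(); }

    StackTraceElement[] creationStack() { return creationSite.getStackTrace(); }

    // Demo: the calling thread holds the lock while a second thread tries to take it.
    static long demo() {
        CountingLock l = new CountingLock();
        l.lock();                                           // uncontended acquisition
        Thread t = new Thread(() -> { l.lock(); l.unlock(); });
        t.start();
        while (l.contendedCount() == 0) Thread.yield();     // wait for the failed tryLock
        l.unlock();
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return l.contendedCount();
    }
}
```

Logging `contendedCount()` together with `creationStack()` at regular intervals gives a rough per-lock contention report.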

Page 14: Lock free programming- pro tips


Lock striping

Page 15: Lock free programming- pro tips

Lock striping

• Reduces contention by distributing it

• Does not remove locks; adds more of them instead

• Good partitioning is key to effectiveness (as in HashMap)

Page 16: Lock free programming- pro tips

Lock striping


Best example in JDK: ConcurrentHashMap

Page 17: Lock free programming- pro tips

Lock striping

• Relatively easy to implement

• Can be very effective as long as the partitioning is good

• Can be tuned (number of partitions) according to the contention/concurrency level
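As a rough illustration of the points above, here is a minimal striped map, assuming one plain `HashMap` per stripe, each guarded by its own monitor. `StripedMap` is a hypothetical name, and this is not how ConcurrentHashMap is actually implemented; the stripe count passed to the constructor is the tuning knob.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical striped map: one lock and one HashMap per stripe.
class StripedMap<K, V> {
    private final Object[] locks;
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap(int stripeCount) {
        locks = new Object[stripeCount];
        stripes = new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new Object();
            stripes[i] = new HashMap<>();
        }
    }

    private int stripeOf(Object key) {
        // Spread the hash and keep it non-negative before taking the modulo.
        int h = key.hashCode();
        h ^= (h >>> 16);
        return (h & 0x7fffffff) % locks.length;
    }

    V put(K key, V value) {
        int s = stripeOf(key);
        synchronized (locks[s]) { return stripes[s].put(key, value); }
    }

    V get(Object key) {
        int s = stripeOf(key);
        synchronized (locks[s]) { return stripes[s].get(key); }
    }

    // Takes the stripe locks one at a time, so the result is only a snapshot.
    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) { total += stripes[i].size(); }
        }
        return total;
    }
}
```

Two threads contend only when their keys fall into the same stripe, which is why the quality of the hash partitioning matters so much.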

Page 18: Lock free programming- pro tips


Compare-And-Swap

Page 19: Lock free programming- pro tips

Compare-And-Swap

• Basic primitive for any lock-free algorithm

• Used to implement locks and other synchronization primitives

• Handled directly by the CPU (dedicated instructions)

Page 20: Lock free programming- pro tips

Compare-And-Swap

• Atomically updates a memory location with a new value if its current value is the expected one

• Instruction with 3 arguments:
  • memory address (rbx)
  • expected value (rax)
  • new value (rcx)

    movabs rax,0x2a
    movabs rcx,0x2b
    lock cmpxchg QWORD PTR [rbx],rcx

Page 21: Lock free programming- pro tips

Compare-And-Swap


Page 22: Lock free programming- pro tips

Compare-And-Swap

• In Java, for the AtomicXXX classes:
    boolean compareAndSet(long expect, long update)

• The memory address is the internal value field of the class
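A small sketch of the `compareAndSet` semantics, using the same values as the assembly example (0x2a = 42, 0x2b = 43). `CasDemo` is just an illustrative class name.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasDemo {
    // Returns { first CAS result, second CAS result, whether the final value is 43 }.
    static boolean[] demo() {
        AtomicLong value = new AtomicLong(42);
        boolean first = value.compareAndSet(42, 43);    // expected value matches: value becomes 43
        boolean second = value.compareAndSet(42, 44);   // expected value is stale: value stays 43
        return new boolean[] { first, second, value.get() == 43 };
    }
}
```

The boolean result is what lets the caller decide whether to retry, give up, or take another path.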

Page 23: Lock free programming- pro tips

Compare-And-Swap: AtomicLong

• Atomic increment with CAS

[JDK7] getAndIncrement():

    while (true) {
        long current = get();
        long next = current + 1;
        if (compareAndSet(current, next))
            return current;
    }

[JDK8] getAndIncrement():

    return unsafe.getAndAddLong(this, valueOffset, 1L);

intrinsified to:

    movabs rsi,0x1
    lock xadd QWORD PTR [rdx+0x10],rsi
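The same read / compute / CAS / retry loop works for updates that have no dedicated CPU instruction. As an illustrative sketch (the `CasMax.updateMax` helper is hypothetical), here is an atomic "keep the maximum":

```java
import java.util.concurrent.atomic.AtomicLong;

class CasMax {
    // Raises 'max' to 'candidate' if it is larger; returns the resulting maximum.
    static long updateMax(AtomicLong max, long candidate) {
        while (true) {
            long current = max.get();
            if (candidate <= current)
                return current;                 // nothing to update
            if (max.compareAndSet(current, candidate))
                return candidate;               // our update won
            // CAS failed: another thread changed the value, re-read and retry
        }
    }
}
```

Since JDK 8, `AtomicLong.accumulateAndGet(candidate, Math::max)` expresses the same thing.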

Page 24: Lock free programming- pro tips

Compare-And-Swap: Lock implementation

ReentrantLock is implemented with a CAS on:

    volatile int state;

lock():

    compareAndSet(0, 1);

If the CAS fails => the lock is already acquired

unlock():

    setState(0)
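To make the state transitions concrete, here is a deliberately simplified lock built on the same CAS. It is a sketch, not ReentrantLock: `CasLock` is a hypothetical class, it is not reentrant, and it yields in a loop instead of parking the waiting thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, non-reentrant lock: it retries instead of parking.
class CasLock {
    private final AtomicInteger state = new AtomicInteger(0);   // 0 = free, 1 = held

    boolean tryLock() { return state.compareAndSet(0, 1); }

    void lock() {
        while (!tryLock()) {
            Thread.yield();     // CAS failed: lock already acquired, back off and retry
        }
    }

    void unlock() { state.set(0); }     // volatile write releases the lock
}
```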

Page 25: Lock free programming- pro tips

Compare-And-Swap: ConcurrentLinkedQueue

• Simplest lock-free algorithm

• Uses CAS to update the next pointer in a linked list

• If the CAS fails, a concurrent update happened

• Read the new value, go to the next item and retry the CAS
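The insertion steps above can be sketched with an append-only list. This is a simplification and not the ConcurrentLinkedQueue source: `LockFreeList` is a hypothetical class that keeps no tail pointer, so each `add` walks from the head.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical append-only list; ConcurrentLinkedQueue itself is more elaborate.
class LockFreeList<E> {
    private static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>(null);
        Node(E item) { this.item = item; }
    }

    private final Node<E> head = new Node<>(null);   // sentinel node

    void add(E item) {
        Node<E> node = new Node<>(item);
        Node<E> last = head;
        while (true) {
            Node<E> next = last.next.get();
            if (next != null) {
                last = next;                         // not the end yet: go to the next item
            } else if (last.next.compareAndSet(null, node)) {
                return;                              // node linked in
            }
            // CAS failed: a concurrent add won; the loop re-reads next and moves on
        }
    }

    int size() {
        int n = 0;
        for (Node<E> cur = head.next.get(); cur != null; cur = cur.next.get()) n++;
        return n;
    }

    // Demo: several threads append concurrently; no element may be lost.
    static int demo(int threads, int perThread) {
        LockFreeList<Integer> list = new LockFreeList<>();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> { for (int i = 0; i < perThread; i++) list.add(i); });
            pool[t].start();
        }
        for (Thread th : pool) {
            try { th.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return list.size();
    }
}
```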


Page 30: Lock free programming- pro tips

Java Memory Model (introduction)

Page 31: Lock free programming- pro tips

Memory Model

• First language with a well-defined memory model: Java, since JDK 5 (2004) with JSR 133

• C++ got a standard memory model in 2011 (C++11)

• Before that, some constructs could have undefined or platform-dependent behavior (e.g. Double-Checked Locking)

Page 32: Lock free programming- pro tips

Memory ordering

    int a;
    int b;
    boolean enabled;

    As written:                Possible order after the JIT compiler:
    {                          {
        a = 21;                    enabled = true;
        b = a * 2;                 a = 21;
        enabled = true;            b = a * 2;
    }                          }

Page 33: Lock free programming- pro tips

Memory ordering

    int a;
    int b;
    boolean enabled;

    Thread 1                   Thread 2
    {                          {
        a = 21;                    if (enabled)
        b = a * 2;                 {
        enabled = true;                int answer = b;
    }                                  process(answer);
                                   }
                               }

Page 34: Lock free programming- pro tips

Memory ordering

    int a;
    int b;
    volatile boolean enabled;

    Thread 1                   Thread 2
    {                          {
        a = 21;                    if (enabled)
        b = a * 2;                 {
        enabled = true;                int answer = b;
    }                                  process(answer);
                                   }
                               }
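A runnable variant of the volatile version, as a sketch: the class name and the `demo()` helper are illustrative, and the reader waits in a loop instead of testing `enabled` once. Because the write to `enabled` is volatile, a reader that observes `enabled == true` is guaranteed to see the earlier writes to `a` and `b`.

```java
class VolatilePublication {
    static int a;
    static int b;
    static volatile boolean enabled;

    static int demo() {
        final int[] answer = new int[1];
        Thread reader = new Thread(() -> {
            while (!enabled) { Thread.yield(); }   // volatile read
            answer[0] = b;                         // ordered after the volatile read
        });
        reader.start();
        a = 21;
        b = a * 2;
        enabled = true;                            // volatile write publishes a and b
        try { reader.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return answer[0];
    }
}
```

Without `volatile` on `enabled`, the loop could spin forever or the reader could observe a stale `b`.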

Page 35: Lock free programming- pro tips

Memory barriers

• Can be at 2 levels: compiler & hardware

• Depending on the CPU architecture, a hardware barrier is not always required

• On x86: strong memory model, limited reordering

Page 36: Lock free programming- pro tips

Memory barriers


Page 37: Lock free programming- pro tips

Memory barriers: volatile

• A volatile field implies memory barriers

• Compiler barrier: prevents reordering

• Hardware barrier: ensures the memory (store) buffers are drained

• On x86, only the store barrier emits a hardware instruction:

    lock add DWORD PTR [rsp],0x0

Page 38: Lock free programming- pro tips

Memory barriers: CAS

• CAS is also a memory barrier

• Compiler: recognized by the JIT, which prevents reordering

• Hardware: every lock-prefixed instruction is a memory barrier

Page 39: Lock free programming- pro tips

Memory barriers: synchronized


• Synchronized blocks have implicit memory barriers

• Entering block: Load memory barrier

• Exiting block: store memory barrier

Page 40: Lock free programming- pro tips

Memory barriers: synchronized


synchronized (this)

{

enabled = true;

b = 21;

a = b * 2;

}

Page 41: Lock free programming- pro tips

Memory barriers: lazySet

• Method of the AtomicXXX classes

• Compiler-only memory barrier

• Does not emit a hardware store barrier

• Still guarantees no reordering (most important), but the effect is not immediate for other threads
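A minimal sketch of where `lazySet` is typically used: publishing a sequence number after filling a slot. The names `LazySetDemo`, `slot` and `sequence` are illustrative only.

```java
import java.util.concurrent.atomic.AtomicLong;

class LazySetDemo {
    static long demo() {
        AtomicLong sequence = new AtomicLong(0);
        long[] slot = new long[1];
        slot[0] = 42;            // plain write: not reordered after the lazySet below
        sequence.lazySet(1);     // ordered store, without the full hardware barrier of set()
        // The writing thread always sees its own store; other threads may see it slightly later.
        return sequence.get();
    }
}
```

This fits single-writer structures where a short delay in visibility is acceptable in exchange for a cheaper store.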

Page 42: Lock free programming- pro tips


Disruptor & Ring Buffer

Page 43: Lock free programming- pro tips

Disruptor

• LMAX library (authors include Martin Thompson)

• Not a new idea: circular buffers in the Linux kernel, Lamport

• Ported to Java

Page 44: Lock free programming- pro tips

Disruptor

Why not use CLQ, which is lock (wait)-free?

• Unbounded and non-blocking queue
• Allocates a node at each insertion
• Not CPU cache friendly
• Multi-producer and multi-consumer

Array/LinkedBlockingQueue: not lock-free

Page 45: Lock free programming- pro tips

Ring Buffer: 1P 1C


Page 48: Lock free programming- pro tips

Ring Buffer: 1P 1C

    Object[] ringBuffer;
    volatile int head;   // written only by the consumer
    volatile int tail;   // written only by the producer

    public boolean offer(E e) {
        if (tail - head == ringBuffer.length)
            return false;                       // buffer full
        ringBuffer[tail % ringBuffer.length] = e;
        tail++;                                 // volatile write
        return true;
    }

Page 49: Lock free programming- pro tips

Ring Buffer: 1P 1C

    public E poll() {
        if (tail == head)
            return null;                        // buffer empty
        int idx = head % ringBuffer.length;
        E element = (E) ringBuffer[idx];
        ringBuffer[idx] = null;
        head++;                                 // volatile write
        return element;
    }
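Put together, the two methods give a complete single-producer / single-consumer queue. The sketch below makes a few assumptions of its own: the class name `SpscRingBuffer` is hypothetical, the counters are `long` so the modulo never sees a negative value after overflow, and a `demo` helper pushes integers through a small buffer.

```java
// Sketch of a single-producer / single-consumer queue (hypothetical class name).
class SpscRingBuffer<E> {
    private final Object[] ringBuffer;
    private volatile long head;   // written only by the consumer
    private volatile long tail;   // written only by the producer

    SpscRingBuffer(int capacity) { ringBuffer = new Object[capacity]; }

    boolean offer(E e) {
        long t = tail;
        if (t - head == ringBuffer.length) return false;      // full
        ringBuffer[(int) (t % ringBuffer.length)] = e;
        tail = t + 1;             // volatile write publishes the element
        return true;
    }

    @SuppressWarnings("unchecked")
    E poll() {
        long h = head;
        if (tail == h) return null;                            // empty
        int idx = (int) (h % ringBuffer.length);
        E element = (E) ringBuffer[idx];
        ringBuffer[idx] = null;
        head = h + 1;             // volatile write frees the slot
        return element;
    }

    // Demo: transfers 0..n-1 through a small buffer and returns the sum seen by the consumer.
    static long demo(int n) {
        SpscRingBuffer<Integer> q = new SpscRingBuffer<>(8);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) while (!q.offer(i)) Thread.yield();
        });
        producer.start();
        long sum = 0;
        for (int received = 0; received < n; ) {
            Integer v = q.poll();
            if (v == null) { Thread.yield(); continue; }
            sum += v;
            received++;
        }
        try { producer.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return sum;
    }
}
```

No CAS is needed here because each counter has exactly one writer; the volatile writes alone provide the ordering.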

Page 50: Lock free programming- pro tips

Ring Buffer: nP 1C


Page 54: Lock free programming- pro tips

Ring Buffer: nP 1C

    AtomicReferenceArray<E> ringBuffer;
    volatile long head;
    AtomicLong tail;

Page 55: Lock free programming- pro tips

Ring Buffer: nP 1C

    public boolean offer(E e) {
        long curTail;
        do {
            curTail = tail.get();
            if (curTail - head == ringBuffer.length())
                return false;                               // buffer full
        } while (!tail.compareAndSet(curTail, curTail + 1)); // claim the slot
        int idx = (int) (curTail % ringBuffer.length());
        ringBuffer.set(idx, e);                             // volatile write
        return true;
    }

Page 56: Lock free programming- pro tips

Ring Buffer: nP 1C

    public E poll() {
        int index = (int) (head % ringBuffer.length());
        E element = ringBuffer.get(index);
        if (element == null)
            return null;            // empty, or slot claimed but not yet written
        ringBuffer.set(index, null);
        head++;                     // volatile write
        return element;
    }
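Assembled into one class, the multi-producer / single-consumer variant could look like the sketch below. `MpscRingBuffer` and its `demo` helper are hypothetical names; the logic follows the slides, with explicit `long` to `int` casts for the index.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of a multi-producer / single-consumer queue (hypothetical class name).
class MpscRingBuffer<E> {
    private final AtomicReferenceArray<E> ringBuffer;
    private volatile long head;                        // written only by the single consumer
    private final AtomicLong tail = new AtomicLong();  // slots are claimed by producers with CAS

    MpscRingBuffer(int capacity) { ringBuffer = new AtomicReferenceArray<>(capacity); }

    boolean offer(E e) {
        long curTail;
        do {
            curTail = tail.get();
            if (curTail - head == ringBuffer.length()) return false;   // full
        } while (!tail.compareAndSet(curTail, curTail + 1));           // claim the slot
        ringBuffer.set((int) (curTail % ringBuffer.length()), e);      // volatile write
        return true;
    }

    E poll() {
        long h = head;
        int index = (int) (h % ringBuffer.length());
        E element = ringBuffer.get(index);
        if (element == null) return null;   // empty, or slot claimed but not yet written
        ringBuffer.set(index, null);
        head = h + 1;                       // volatile write frees the slot
        return element;
    }

    // Demo: each producer sends 0..perProducer-1; returns the sum seen by the consumer.
    static long demo(int producers, int perProducer) {
        MpscRingBuffer<Integer> q = new MpscRingBuffer<>(16);
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) while (!q.offer(i)) Thread.yield();
            });
            threads[p].start();
        }
        long sum = 0;
        for (int received = 0; received < producers * perProducer; ) {
            Integer v = q.poll();
            if (v == null) { Thread.yield(); continue; }
            sum += v;
            received++;
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return sum;
    }
}
```

The consumer detects availability through the slot content rather than through `tail`, because a producer may have claimed a slot without having written it yet.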

Page 57: Lock free programming- pro tips

Disruptor


• Very flexible for different usages (strategies)

• Very good performance

• Data transfer from one thread to another (Queue)

Page 58: Lock free programming- pro tips


Spinning

Page 59: Lock free programming- pro tips

Spinning

• Active wait

• Very good for consumer reactivity (low latency)

• Burns a CPU core permanently

Page 60: Lock free programming- pro tips

Spinning

• Some locks are implemented with spinning (spin locks)

• Synchronized blocks spin a little on contention

• Uses the pause instruction (x86)

Page 61: Lock free programming- pro tips

Spinning

How to avoid burning a core?

Backoff strategies:
• Thread.yield()
• LockSupport.parkNanos(1)
• Object.wait()/Condition.await()/LockSupport.park()
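The first two strategies can be chained into a progressive backoff. The sketch below is only an illustration: `BackoffWait` is a hypothetical class and the thresholds (100 and 200 attempts) are arbitrary values that would need tuning.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class BackoffWait {
    // Waits until 'flag' becomes true, backing off progressively; returns the number of failed checks.
    static int awaitTrue(AtomicBoolean flag) {
        int attempts = 0;
        while (!flag.get()) {
            attempts++;
            if (attempts < 100) {
                // phase 1: busy spin (lowest latency, burns the core)
            } else if (attempts < 200) {
                Thread.yield();             // phase 2: let other runnable threads run
            } else {
                LockSupport.parkNanos(1);   // phase 3: park for a very short time
            }
        }
        return attempts;
    }

    // Demo: another thread sets the flag about 1 ms later.
    static boolean demo() {
        AtomicBoolean flag = new AtomicBoolean(false);
        Thread setter = new Thread(() -> {
            LockSupport.parkNanos(1_000_000);
            flag.set(true);
        });
        setter.start();
        awaitTrue(flag);
        return flag.get();
    }
}
```

The trade-off is latency against CPU usage: the longer the waiter stays in the spinning phase, the faster it reacts and the more CPU it burns.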

Page 62: Lock free programming- pro tips

Ticketing: OrderedScheduler

Page 63: Lock free programming- pro tips

Ticketing

How to parallelize tasks while keeping ordering?

Example: video stream processing

• Read a frame from the stream
• Process the frame (parallelizable)
• Write to the output (in order)

Page 64: Lock free programming- pro tips

Ticketing


Page 66: Lock free programming- pro tips

Ticketing

Can do this with the Disruptor, but with a consumer thread

OrderedScheduler can do the same but with:
• no inter-thread communication overhead
• no additional thread
• no wait strategy

Take a ticket...

Page 67: Lock free programming- pro tips

OrderedScheduler


Page 74: Lock free programming- pro tips

OrderedScheduler

    public void execute() {
        synchronized (this) {
            FooInput input = read();
            BarOutput output = process(input);
            write(output);
        }
    }

Page 75: Lock free programming- pro tips

OrderedScheduler

    OrderedScheduler scheduler = new OrderedScheduler();

    public void execute() {
        FooInput input;
        long ticket;
        synchronized (this) {
            input = read();
            ticket = scheduler.getNextTicket();
        }
        [...]

Page 76: Lock free programming- pro tips

OrderedScheduler

    public void execute() {
        [...]
        BarOutput output;
        try {
            output = process(input);
        }
        catch (Exception ex) {
            scheduler.trash(ticket);    // release the ticket on failure
            throw new RuntimeException(ex);
        }
        scheduler.run(ticket, () -> write(output));
    }
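To illustrate the ticketing principle only, here is a much-simplified scheduler. It is an assumption-laden sketch and not the OrderedScheduler implementation: `TicketScheduler` is a hypothetical class that merely mirrors the method names used on the slides, and unlike the real library it makes a thread wait (yielding) until its ticket comes up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified ticket scheduler: NOT the library shown on the slides.
class TicketScheduler {
    private final AtomicLong nextTicket = new AtomicLong();   // next ticket to hand out
    private volatile long nowServing;                         // ticket currently allowed to run

    long getNextTicket() { return nextTicket.getAndIncrement(); }

    // Runs the task once every earlier ticket has been run or trashed.
    void run(long ticket, Runnable task) {
        while (nowServing != ticket) Thread.yield();
        try {
            task.run();
        } finally {
            nowServing = ticket + 1;    // only the current ticket holder writes here
        }
    }

    // Gives up a ticket so that later tickets are not blocked forever.
    void trash(long ticket) { run(ticket, () -> { }); }

    // Demo: workers read inputs in order, process in parallel, and write in ticket order.
    static List<Integer> demo(int threads, int items) {
        TicketScheduler scheduler = new TicketScheduler();
        List<Integer> output = new ArrayList<>();
        int[] nextInput = {0};
        Runnable worker = () -> {
            while (true) {
                int input;
                long ticket;
                synchronized (scheduler) {              // read + take a ticket atomically
                    if (nextInput[0] == items) return;
                    input = nextInput[0]++;
                    ticket = scheduler.getNextTicket();
                }
                int result = input * 2;                 // "process": runs in parallel
                scheduler.run(ticket, () -> output.add(result));   // "write": in ticket order
            }
        };
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) { pool[i] = new Thread(worker); pool[i].start(); }
        for (Thread t : pool) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return output;
    }
}
```

The key point carried over from the slides is that the ticket is taken inside the same critical section as the read, so ticket order equals input order.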

Page 77: Lock free programming- pro tips

Ticketing

• Open-sourced on GitHub

• Open to pull requests & discussion on the design

• Used internally

Page 78: Lock free programming- pro tips


Takeaways

Page 79: Lock free programming- pro tips

Takeaways


• Measure

• Distribute

• Atomically update

• Order

• RingBuffer: easy lock-free

• Ordered without consumer thread

Page 80: Lock free programming- pro tips

References

• jucProfiler: http://www.infoq.com/articles/jucprofiler
• Java Memory Model Pragmatics: http://shipilev.net/blog/2014/jmm-pragmatics/
• Memory Barriers and JVM Concurrency: http://www.infoq.com/articles/memory_barriers_jvm_concurrency
• JSR 133 (FAQ): http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
• CPU cache flushing fallacy: http://mechanical-sympathy.blogspot.fr/2013/02/cpu-cache-flushing-fallacy.html
• atomic<> Weapons: http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2

Page 81: Lock free programming- pro tips

References

• Circular buffers: http://lwn.net/Articles/378262/
• Mr T queues: https://github.com/mjpt777/examples/tree/master/src/java/uk/co/real_logic/queues
• Lock-free algorithms by Mr T: http://www.infoq.com/presentations/Lock-free-Algorithms
• Futexes are tricky, U. Drepper: http://www.akkadia.org/drepper/futex.pdf
• JCTools: https://github.com/JCTools/JCTools
• Nitsan Wakart blog: http://psy-lob-saw.blogspot.com

Page 83: Lock free programming- pro tips

ABOUT ULLINK

ULLINK’s electronic financial trading software solutions and services address the challenges of new regulations and increased fragmentation. Emerging markets require fast, flexible and compliant solutions for buy-side and sell-side market participants; providing a competitive advantage for both low touch and high touch trading.

www.ullink.com

FIND OUT MORE

Contact our Sales Team to learn more about the services listed herein as well as our full array of offerings:

+81 3 3664 4160 (Japan)

+852 2521 5400 (Asia Pacific)

+1 212 991 0816 (North America)

+55 11 3171 2409 (Latin America)

+44 20 7488 1655 (UK and EMEA)
