Java Under the hood
Page 1: Java under the hood

Java Under the hood

Page 2: Java under the hood

Javac and JVM optimizations

Page 4: Java under the hood

Agenda
● Javac and JVM optimizations
○ JIT (Just In Time Compilation)
■ Profiling, Method Binding, Safepoints
○ Method Inlining
○ Loop Unrolling
○ Lock Coarsening
○ Lock Eliding
○ Branch Prediction
○ Escape Analysis
○ OSR (On Stack Replacement)
○ TLAB (Thread Local Allocation Buffers)

Page 5: Java under the hood

Java program lifetime

Page 6: Java under the hood

JIT Compilation

Page 7: Java under the hood

Method Inlining
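A minimal sketch, assuming a hypothetical `Point` class: tiny, frequently called methods such as trivial getters are the classic inlining candidates. Once `sumCoordinates` is hot and JIT-compiled, the getter calls are typically replaced by direct field loads. On HotSpot, inlining decisions can be printed with the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.

```java
// Hypothetical example; class and method names are illustrative only.
class InliningDemo {
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        // Tiny accessors like these are prime inlining candidates:
        // after inlining, p.getX() becomes a plain field load.
        int getX() { return x; }
        int getY() { return y; }
    }

    // Hot method: once JIT-compiled, the getter calls are typically inlined,
    // removing call overhead and enabling further optimizations.
    static long sumCoordinates(Point[] points) {
        long sum = 0;
        for (Point p : points) {
            sum += p.getX() + p.getY();
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = new Point[1_000];
        for (int i = 0; i < points.length; i++) {
            points[i] = new Point(i, 2 * i);
        }
        long total = 0;
        // Call the method many times so the JIT considers it hot.
        for (int round = 0; round < 10_000; round++) {
            total += sumCoordinates(points);
        }
        System.out.println(total);
    }
}
```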

Page 8: Java under the hood

Loop Unrolling

Page 9: Java under the hood

Loop Unrolling
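Conceptually, unrolling turns one step of work per iteration into several, so the loop condition and the back-jump are paid once per group. The hand-unrolled method below (hypothetical names) only illustrates the shape of the transformation; the JIT performs it on compiled code by itself, so writing it by hand in Java is rarely worthwhile.

```java
class LoopUnrollingDemo {
    // Plain counted loop, as the developer writes it.
    static long sum(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Roughly what unrolling by a factor of 4 does: one loop check per
    // 4 elements, plus a tail loop for the 0..3 leftover elements.
    static long sumUnrolled(int[] a) {
        long s = 0;
        int i = 0;
        int limit = a.length - 3;
        for (; i < limit; i += 4) {
            s += a[i];
            s += a[i + 1];
            s += a[i + 2];
            s += a[i + 3];
        }
        for (; i < a.length; i++) { // tail
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data) + " " + sumUnrolled(data)); // prints "55 55"
    }
}
```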

Page 10: Java under the hood

Lock Coarsening
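A sketch of the idea with hypothetical names: two back-to-back `synchronized` blocks on the same lock may be merged by the JIT into one, so the lock is acquired and released once instead of twice. The second method only shows the conceptual result; the JIT does the merge itself, and only when it judges it safe.

```java
class LockCoarseningDemo {
    private final Object lock = new Object();
    private int a;
    private int b;

    // As written: two adjacent critical sections on the same lock.
    void updateSeparately() {
        synchronized (lock) { a++; }
        synchronized (lock) { b++; }
    }

    // What the JIT may effectively turn it into: one acquire/release pair.
    void updateCoarsened() {
        synchronized (lock) {
            a++;
            b++;
        }
    }

    int sum() {
        synchronized (lock) {
            return a + b;
        }
    }

    public static void main(String[] args) {
        LockCoarseningDemo d = new LockCoarseningDemo();
        d.updateSeparately();
        d.updateCoarsened();
        System.out.println(d.sum()); // prints 4
    }
}
```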

Page 11: Java under the hood

Lock Eliding
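A classic illustration, assuming escape analysis is enabled (the HotSpot default): `StringBuffer` methods are `synchronized`, but a buffer that never leaves the method can never be locked by another thread, so the JIT may remove those lock operations entirely. The class name is hypothetical.

```java
class LockElisionDemo {
    // 'sb' never escapes this method, so no other thread can ever lock it;
    // the JIT may elide the locking inside the synchronized append() calls.
    static String join(String a, String b, String c) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append(',').append(b).append(',').append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("x", "y", "z")); // prints "x,y,z"
    }
}
```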

Page 12: Java under the hood

Branch Prediction

Page 13: Java under the hood

Branch Prediction

Page 14: Java under the hood

Branch Prediction

● Performance of an if-statement depends on whether its condition has a predictable pattern.

● A “bad” true-false pattern can make an if-statement up to six times slower than a “good” pattern!
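The classic demonstration, as a sketch with hypothetical names: sum only the elements >= 128. With random data the `if` is unpredictable; after sorting, it becomes one long run of false followed by one long run of true. Both arrays give the same sum; only the timing differs, and by how much depends on the CPU and on whether the JIT compiles the branch into a branch-free conditional move. The timing code is deliberately naive; use a harness such as JMH for real measurements.

```java
import java.util.Arrays;
import java.util.Random;

class BranchPredictionDemo {
    // Sums elements >= 128; this if-statement is the branch being predicted.
    static long sumLarge(int[] data) {
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] random = new Random(42).ints(1 << 20, 0, 256).toArray();
        int[] sorted = random.clone();
        Arrays.sort(sorted); // same values, but a predictable true/false pattern

        for (int[] data : new int[][] {random, sorted}) {
            long start = System.nanoTime();
            long result = 0;
            for (int rep = 0; rep < 20; rep++) {
                result = sumLarge(data);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println((data == sorted ? "sorted:   " : "unsorted: ") + result + " in " + ms + " ms");
        }
    }
}
```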

Page 15: Java under the hood

Concatenating strings within a single expression is picked up by javac and compiled into an equivalent StringBuilder chain (since Java 9, javac emits an invokedynamic call to StringConcatFactory instead).

String concatenation example
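Roughly what this looks like written out by hand (hypothetical method names). The `StringBuilder` chain approximates what javac produced up to Java 8; from Java 9 on, javac emits an `invokedynamic` call to `StringConcatFactory` (JEP 280), with the same resulting string. Either way, `+=` inside a loop still builds a new string on every iteration, so an explicit `StringBuilder` remains the right tool there.

```java
class StringConcatDemo {
    // Source form.
    static String greet(String name, int count) {
        return "Hello, " + name + "! You have " + count + " messages.";
    }

    // Approximately what javac (Java 8) compiles the expression above into.
    static String greetDesugared(String name, int count) {
        return new StringBuilder()
                .append("Hello, ")
                .append(name)
                .append("! You have ")
                .append(count)
                .append(" messages.")
                .toString();
    }

    // In a loop, reuse one explicit builder instead of s += i per iteration.
    static String joinDigits(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("Ann", 3));
        System.out.println(greetDesugared("Ann", 3));
        System.out.println(joinDigits(5)); // prints "01234"
    }
}
```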

Page 16: Java under the hood

String concatenation example

Page 17: Java under the hood

Intrinsics

Intrinsics are methods KNOWN to the JIT. Their bytecode is ignored, and the most performant native version for the target platform is used instead...

● System::arraycopy
● String::equals
● Math::*
● Object::hashCode
● Object::getClass
● Unsafe::*
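In practice this favors calling the JDK method over re-implementing it: the hand-written loop below is compiled from its bytecode like any other code, while `System.arraycopy` is typically replaced by an optimized, platform-specific copy routine. HotSpot can report the intrinsics it uses via the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics` (flag availability may vary by JVM version). Class and method names are hypothetical.

```java
import java.util.Arrays;

class IntrinsicsDemo {
    // Hand-written copy: the JIT compiles this loop from its bytecode.
    static int[] copyManually(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // System.arraycopy is an intrinsic: the JIT substitutes its own
    // optimized copy routine for the target CPU.
    static int[] copyWithIntrinsic(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.equals(copyManually(src), copyWithIntrinsic(src))); // prints true
    }
}
```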

Page 18: Java under the hood

Escape Analysis

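Any object that does not escape its creation scope MAY be optimized away: instead of allocating it on the heap, HotSpot can apply scalar replacement and keep the object's fields in registers or on the stack.

Typical candidates: lambdas, anonymous classes, date/time objects, StringBuilders, Optionals, etc...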
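A minimal sketch with a hypothetical `Vec` class. The object created in `lengthSquared` never leaves the method, so HotSpot may scalar-replace it (keep `x` and `y` as locals) and skip the heap allocation; the one stored in the static field `last` escapes and must live on the heap. Running with `-XX:-DoEscapeAnalysis` turns the optimization off, which is one way to compare allocation rates in a GC log or profiler.

```java
class EscapeAnalysisDemo {
    // Small immutable value holder (hypothetical example type).
    static final class Vec {
        final double x;
        final double y;

        Vec(double x, double y) { this.x = x; this.y = y; }
    }

    // 'v' is created and used only here: it does not escape, so the JIT
    // may replace it with two local doubles and never allocate it.
    static double lengthSquared(double x, double y) {
        Vec v = new Vec(x, y);
        return v.x * v.x + v.y * v.y;
    }

    // Escapes: stored in a static field reachable by other code,
    // so this object must be heap-allocated.
    static Vec last;

    static void remember(double x, double y) {
        last = new Vec(x, y);
    }

    public static void main(String[] args) {
        double total = 0;
        for (int i = 0; i < 10_000_000; i++) {
            total += lengthSquared(i, 1);
        }
        remember(1, 2);
        System.out.println(total + " " + last.x);
    }
}
```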

Page 19: Java under the hood

Escape analysis

Page 20: Java under the hood

TLAB (Thread Local Allocation Buffers)

Page 21: Java under the hood

How to “see” JIT activity? - JITWatch

Page 22: Java under the hood

Conclusion

Before attempting to “optimize” something at a low level, make sure you understand what the environment is already optimizing for you…

Don't try to predict the performance (especially the low-level behavior) of your program by looking at the bytecode. By the time the JIT compiler is done with it, there will be little similarity left.

Page 23: Java under the hood

Questions?

Page 24: Java under the hood

Concurrency : Level 0

Page 25: Java under the hood

Agenda
● Concurrency : Hardware level
○ CPU architecture evolution
○ Cache Coherency Protocols
○ Memory Barriers
○ Store Buffers
○ Cachelines
○ volatiles, monitors (locks, synchronization), atomics

Page 26: Java under the hood

CPU structure

Page 27: Java under the hood

Cache access latencies

CPUs are getting faster not through higher clock frequency but through lower latency between cache levels (L1/L2/L3), better cache coherency protocols and smart optimizations.

Page 28: Java under the hood

Why is Concurrency HARD?
Problem 1 : VISIBILITY!

● Any processor can temporarily keep values in its caches (or store buffers) instead of writing them to main memory, so other processors might not see the changes made by the first one…

● Also, if a processor works with its caches for some time, it might not see changes made by other processors right away...
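A common way to run into this in Java, sketched with hypothetical names: a worker spinning on a plain `boolean` flag. Without `volatile`, nothing obliges the worker to ever re-read the updated value; in practice the JIT may even hoist the read out of the loop so that it spins forever. Declaring the flag `volatile`, as below, guarantees that the write becomes visible.

```java
class VisibilityDemo {
    // volatile: every read sees the latest write made by any thread.
    // Without it, the loop below may never terminate once JIT-compiled.
    static volatile boolean running = true;

    static long spinUntilStopped() {
        long iterations = 0;
        while (running) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() ->
                System.out.println("stopped after " + spinUntilStopped() + " iterations"));
        worker.start();
        Thread.sleep(100);
        running = false; // becomes visible to the worker thanks to volatile
        worker.join(1_000);
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```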

Page 29: Java under the hood

Why is Concurrency HARD?
Problem 2 : Reordering

Page 30: Java under the hood

Example : Non thread safe

Page 31: Java under the hood

JMM (Java Memory Model)
The Java Memory Model is a set of rules and guidelines that allows Java programs to behave deterministically across multiple memory architectures, CPUs and operating systems.

Page 32: Java under the hood

Thread safe version (visibility + reordering both solved)
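A minimal sketch of the usual shape of this fix, assuming the example publishes a value through a flag (field names are hypothetical): the writer sets a plain field and then a `volatile` flag, and the reader waits for the flag before reading the field. The volatile write happens-before any read that sees `true`, so the earlier plain write can be neither reordered after it nor hidden from the reader.

```java
class SafePublicationDemo {
    private int value;              // plain field, written before the flag
    private volatile boolean ready; // volatile flag that publishes 'value'

    void writer() {
        value = 42;   // (1) ordinary write
        ready = true; // (2) volatile write: (1) cannot be moved after it
    }

    int reader() {
        while (!ready) {
            // busy-wait on the volatile read
        }
        return value; // guaranteed to see 42 once ready is true
    }

    public static void main(String[] args) throws InterruptedException {
        SafePublicationDemo demo = new SafePublicationDemo();
        Thread r = new Thread(() -> System.out.println(demo.reader())); // prints 42
        r.start();
        demo.writer();
        r.join();
    }
}
```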

Page 33: Java under the hood

Thread safe version

Page 34: Java under the hood

cpu/x86/vm/c1_LIRGenerator_x86.cpp

Page 35: Java under the hood

Example : Thread safe

Page 36: Java under the hood

Happens Before

Page 37: Java under the hood

Understanding volatile

Page 38: Java under the hood

Conclusions on Volatile

● Volatile guarantees that changes made by one thread are visible to other threads.

● Guarantees that reads/writes of a volatile field are never reordered (ordinary instructions before and after them can still be reordered among themselves).

● Volatile alone, without additional synchronization, is enough if there is only one writer to the volatile field; if there is more than one writer, you need to synchronize...
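Why the number of writers matters: `count++` on a volatile field is a separate read and write, so two writers can interleave and lose updates even though each write is visible. With several writers, use a lock or an atomic read-modify-write such as `AtomicInteger.incrementAndGet()`. A sketch with hypothetical names:

```java
import java.util.concurrent.atomic.AtomicInteger;

class VolatileWritersDemo {
    static volatile int volatileCount = 0;                        // visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger(); // atomic read-modify-write

    static void incrementBoth(int times) {
        for (int i = 0; i < times; i++) {
            volatileCount++;               // read + write: concurrent updates can be lost
            atomicCount.incrementAndGet(); // one atomic operation
        }
    }

    static void runWriters(int threads, int timesEach) throws InterruptedException {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> incrementBoth(timesEach));
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        runWriters(4, 100_000);
        // atomicCount always ends at 400000; volatileCount is often lower.
        System.out.println("volatile: " + volatileCount + ", atomic: " + atomicCount.get());
    }
}
```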

Page 39: Java under the hood

Volatile Write/Read performance

Page 40: Java under the hood

Lazy Singleton (not thread safe)

Page 41: Java under the hood

Lazy Singleton (dumb thread safety)

Page 42: Java under the hood

Lazy Singleton (not thread safe)

Page 43: Java under the hood

Lazy Singleton (still not thread safe)

Page 44: Java under the hood

Lazy Singleton (thread safe yay!)
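One widely used thread-safe lazy singleton, assumed here to correspond to this slide, is double-checked locking with a `volatile` instance field (class name hypothetical). Without `volatile`, another thread could see a non-null reference to an object whose constructor writes are not yet visible, because publishing the reference can be reordered before them.

```java
class LazySingleton {
    // volatile prevents publishing the reference before construction finishes.
    private static volatile LazySingleton instance;

    private LazySingleton() { }

    static LazySingleton getInstance() {
        LazySingleton local = instance;   // single volatile read on the fast path
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;         // re-check while holding the lock
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```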

Page 45: Java under the hood

Happens Before

Page 46: Java under the hood

Lazy Singleton (CL trick)
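The class-loading trick is commonly known as the initialization-on-demand holder idiom (assumed to be what this slide refers to): the nested `Holder` class is initialized only on first access, and the JVM guarantees that class initialization runs once and is thread safe, so no explicit lock or `volatile` is needed. Class names are hypothetical.

```java
class HolderSingleton {
    private HolderSingleton() {
        System.out.println("constructed");
    }

    // Not initialized until getInstance() first reads Holder.INSTANCE;
    // the JVM runs class initialization exactly once, under its own lock.
    private static final class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("before first access");
        HolderSingleton a = getInstance(); // triggers Holder init: prints "constructed"
        HolderSingleton b = getInstance();
        System.out.println(a == b);        // prints true
    }
}
```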

Page 47: Java under the hood

False sharing (hidden contention)

Page 48: Java under the hood

False Sharing

Page 49: Java under the hood

False Sharing
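A sketch of the usual mitigation, with hypothetical class names: give each independently written hot field its own cache line (commonly 64 bytes on x86) by padding it. Padding through a class hierarchy is used because HotSpot lays out superclass fields before subclass fields, but the JVM still controls field layout, so this is best-effort. The JDK's `@Contended` annotation (`sun.misc` in Java 8, `jdk.internal.vm.annotation` later, needing `-XX:-RestrictContended` outside the JDK) is the supported alternative. Timings vary widely by machine.

```java
class FalseSharingDemo {
    // Unpadded: two instances allocated together may share a cache line.
    static final class PlainCounter {
        volatile long value;
    }

    // Padding via the class hierarchy: 7 longs (56 bytes) on each side of 'value'.
    static class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }
    static class PaddedValue extends LeftPad { volatile long value; }
    static final class PaddedCounter extends PaddedValue { long q1, q2, q3, q4, q5, q6, q7; }

    // Runs both tasks on their own threads and returns the wall time in ms.
    static long timeMillis(Runnable a, Runnable b) throws InterruptedException {
        Thread t1 = new Thread(a);
        Thread t2 = new Thread(b);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 10_000_000;
        PlainCounter plainA = new PlainCounter(), plainB = new PlainCounter();
        PaddedCounter paddedA = new PaddedCounter(), paddedB = new PaddedCounter();

        // Each counter has exactly one writer thread, so no updates are lost.
        long plain = timeMillis(() -> { for (int i = 0; i < n; i++) plainA.value++; },
                                () -> { for (int i = 0; i < n; i++) plainB.value++; });
        long padded = timeMillis(() -> { for (int i = 0; i < n; i++) paddedA.value++; },
                                 () -> { for (int i = 0; i < n; i++) paddedB.value++; });
        System.out.println("plain: " + plain + " ms, padded: " + padded + " ms");
    }
}
```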

Page 50: Java under the hood

Monitors

Monitor Operations :
● monitorenter
● monitorexit
● wait
● notify/notifyAll

Monitor States :
● init
● biased
● thin
● fat (inflated)
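A sketch tying the four operations together in a hypothetical one-slot `Mailbox`: a `synchronized` block compiles to `monitorenter`/`monitorexit`, and `wait`/`notifyAll` must be called while holding that same monitor. `wait()` sits in a loop because wakeups can be spurious and the condition must be re-checked.

```java
class Mailbox {
    private final Object lock = new Object();
    private String message; // null means the slot is empty

    void put(String m) throws InterruptedException {
        synchronized (lock) {            // monitorenter
            while (message != null) {
                lock.wait();             // releases the monitor while waiting
            }
            message = m;
            lock.notifyAll();            // wake threads blocked in take()
        }                                // monitorexit
    }

    String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) {
                lock.wait();
            }
            String m = message;
            message = null;
            lock.notifyAll();            // wake threads blocked in put()
            return m;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    box.put("msg-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        for (int i = 0; i < 3; i++) {
            System.out.println(box.take()); // msg-0, msg-1, msg-2
        }
        producer.join();
    }
}
```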

Page 51: Java under the hood

Cost of Contention

Page 52: Java under the hood

Conclusion

● Volatile reads are not that bad
● Avoid sharing state
● Avoid writing to shared state
● Avoid Contention

Page 54: Java under the hood

JMH example

Page 55: Java under the hood

JMH example

Page 56: Java under the hood

Jcstress example

Page 57: Java under the hood

Jcstress sample output

Page 58: Java under the hood

IMPORTANT!

Sometimes horizontal scaling is cheaper. Hardware-friendly code is hard to develop and hard to test, and it breaks easily when a new developer does not understand the existing code base or when a new JVM version does optimizations you never expected (this happens a lot). If your product needs higher throughput, you either make it more efficient or you scale it. When the cost of scaling is too high, it makes perfect sense to make the system more efficient (assuming the system is not fundamentally inefficient).

If you’re scaling your product while a single node under peak load uses only a low percentage of its resources (CPU, memory, etc.), then you have an inefficient system.

Developing hardware-friendly code is all about efficiency. On most systems you might NEVER need to go low level, but knowing the low-level semantics of your environment will let you write more efficient code by default.

And most importantly: NEVER EVER optimize without BENCHMARKING!!!

Page 59: Java under the hood

Disruptor by LMAX

Page 60: Java under the hood

Example of Disruptor usage : Log4j2

In the test with 64 threads, asynchronous loggers are 12 times faster than asynchronous appenders, and 68 times faster than synchronous loggers.

Page 61: Java under the hood

Why?

● Generally, any traditional queue is in one of two states: either it’s filling up or it’s draining.

● Most queues are unbounded, and any unbounded queue is a potential source of OOM errors.

● Queues write to memory on both put and poll… and writes are expensive. During a write, the queue is locked (or partially locked).

● Queues are the best way to create CONTENTION! And contention is often the bottleneck of the system.

Page 62: Java under the hood

Queue typical state

Page 63: Java under the hood

What is Disruptor all about?

● Non-blocking. A write does not lock consumers; consumers work in parallel, with controlled access to the data in the queue, and without CONTENTION!

● GC free: Disruptor does not create any objects while running; instead, it pre-allocates all the memory configured for it up front.

● Disruptor is bounded.

● Cache friendly. (Mechanical sympathy)

● It’s hardware friendly. Disruptor uses the low-level semantics of the JMM to achieve maximum performance and minimum latency.

● One thread per consumer.

Page 64: Java under the hood

Theory : understanding disruptor

Page 71: Java under the hood

Writing to Ring Buffer

Page 81: Java under the hood

Reading from Ring Buffer

Page 83: Java under the hood

Disruptor can coordinate consumers

Page 84: Java under the hood

Lmax architecture

Page 85: Java under the hood

Disruptor (Pros)
● Performance of course
● Holy BATCHING!!!
● Mechanical Sympathy
● Optionally GC Free
● Prevents False Sharing
● Easy to compose dependent consumers (concurrency)
● Synchronization free code in consumers
● Data Structure (not a frickin framework!!!)
● Fits very well with CQRS and ES

Page 86: Java under the hood

Disruptor (Pros)

● Thread affinity (for more performance/throughput)
● Different strategies for Consumers (busy spin, sleep)
● Single/Multiple producer strategy

Page 87: Java under the hood

Avoid useless processing (Disruptor can batch)

Page 88: Java under the hood

Disruptor (Cons)

● Not as trivial as ABQ (ArrayBlockingQueue) or other queues
● Reasonable limit for busy threads (consumers)
● Not a drop-in replacement; it is a different approach to queues

Page 89: Java under the hood

Disruptor Implementation (simplified : single writer)

Page 90: Java under the hood

No locks at all ( Atomic.lazySet )
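A much-simplified single-producer/single-consumer ring buffer in the spirit of these slides; it is a hypothetical sketch, not the Disruptor's code (which adds its own `Sequence` type, padding against false sharing and pluggable wait strategies). Slots are pre-allocated, each side owns one `AtomicLong` sequence, and progress is published with `AtomicLong.lazySet`, an ordered (release) store that is cheaper than a full volatile write. The consumer drains everything available in one batch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical, simplified SPSC ring buffer; not the LMAX Disruptor implementation.
class SpscRingBuffer {
    private final long[] slots; // pre-allocated entries, reused forever
    private final int mask;     // capacity - 1, valid because capacity is a power of 2
    private final AtomicLong published = new AtomicLong(-1); // last sequence written
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Single producer thread only: no CAS, no lock.
    void publish(long value) {
        long next = published.get() + 1;
        while (next - consumed.get() > slots.length) {
            Thread.yield(); // buffer full: wait for the consumer to catch up
        }
        slots[(int) (next & mask)] = value;
        published.lazySet(next); // ordered store: the slot write cannot move after it
    }

    // Single consumer thread only. Handles every entry available right now
    // as one batch and returns how many entries it handled.
    int drain(LongConsumer handler) {
        long first = consumed.get() + 1;
        long last = published.get();
        if (last < first) {
            return 0;
        }
        for (long seq = first; seq <= last; seq++) {
            handler.accept(slots[(int) (seq & mask)]);
        }
        consumed.lazySet(last); // hand the slots back to the producer
        return (int) (last - first + 1);
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRingBuffer ring = new SpscRingBuffer(1024);
        final int n = 100_000;
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= n; i++) {
                ring.publish(i);
            }
        });
        producer.start();
        long[] sum = {0};
        long received = 0;
        while (received < n) {
            received += ring.drain(v -> sum[0] += v);
        }
        producer.join();
        System.out.println(sum[0]); // prints 5000050000
    }
}
```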

Page 91: Java under the hood

Why power of 2?
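The usual reason: with a power-of-two size, turning an ever-growing sequence number into a slot index is a single bitwise AND with `size - 1` instead of a modulo, which needs a much slower division. A small check with hypothetical names:

```java
class PowerOfTwoIndex {
    static int indexByModulo(long sequence, int size) {
        return (int) (sequence % size);
    }

    // Valid only when size is a power of 2: size - 1 is then a mask of the low bits.
    static int indexByMask(long sequence, int size) {
        return (int) (sequence & (size - 1));
    }

    public static void main(String[] args) {
        int size = 8; // 0b1000, so the mask is 0b0111
        for (long seq : new long[] {0, 7, 8, 13, 1_000_000_007L}) {
            System.out.println(seq + " -> " + indexByModulo(seq, size) + " / " + indexByMask(seq, size));
        }
    }
}
```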

Page 92: Java under the hood

Ring Buffer customizations

● Producer strategies
○ Single producer
○ Multiple producer

● Wait Strategies
○ Sleeping Wait
○ Yielding Wait
○ Busy Spin
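A sketch of the trade-off between these wait strategies, using a hypothetical `WaitStrategy` interface (the real Disruptor has its own interface with a different signature; its corresponding classes include `BusySpinWaitStrategy`, `YieldingWaitStrategy` and `SleepingWaitStrategy`). The sleeping variant here simply parks for about 0.1 ms per check, which is a simplification of what the Disruptor does.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

class WaitStrategyDemo {
    // Hypothetical interface for illustration; not the Disruptor's WaitStrategy API.
    interface WaitStrategy {
        // Returns once cursor >= sequence, with the cursor value it saw.
        long waitFor(long sequence, AtomicLong cursor);
    }

    // Lowest latency, but keeps a whole core at 100% while waiting.
    static final class BusySpin implements WaitStrategy {
        public long waitFor(long sequence, AtomicLong cursor) {
            long available;
            while ((available = cursor.get()) < sequence) {
                // spin
            }
            return available;
        }
    }

    // Still polls, but lets other runnable threads use the core between checks.
    static final class Yielding implements WaitStrategy {
        public long waitFor(long sequence, AtomicLong cursor) {
            long available;
            while ((available = cursor.get()) < sequence) {
                Thread.yield();
            }
            return available;
        }
    }

    // Least CPU usage; each pause adds wake-up latency.
    static final class Sleeping implements WaitStrategy {
        public long waitFor(long sequence, AtomicLong cursor) {
            long available;
            while ((available = cursor.get()) < sequence) {
                LockSupport.parkNanos(100_000L); // ~0.1 ms
            }
            return available;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (WaitStrategy strategy : new WaitStrategy[] {new BusySpin(), new Yielding(), new Sleeping()}) {
            AtomicLong cursor = new AtomicLong(-1);
            Thread producer = new Thread(() -> cursor.set(0)); // publishes sequence 0
            producer.start();
            long seen = strategy.waitFor(0, cursor);
            producer.join();
            System.out.println(strategy.getClass().getSimpleName() + " saw " + seen);
        }
    }
}
```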

Page 96: Java under the hood

And some stuff about high performance Java code

● https://www.youtube.com/watch?v=NEG8tMn36VQ
● https://www.youtube.com/watch?v=t49bfPLp0B0
● http://www.slideshare.net/PeterLawrey/writing-and-testing-high-frequency-trading-engines-in-java
● https://www.youtube.com/watch?v=ih-IZHpxFkY

Page 98: Java under the hood

Coming next

Concurrency : Level 1
Concurrency primitives provided by the language SDK: everything that provides manual control over concurrency.

- package java.util.concurrent.*
- Future
- CompletableFuture
- Phaser
- ForkJoinPool (in Java 8), ForkJoinTask, CountedCompleters

Concurrency : Level 2
A high-level approach to concurrency, where a library or framework handles concurrent execution of the code... (will cover only RxJava, although there is a bunch of other good stuff)

- Functional Programming approach (higher-order functions)
- Optional
- Streams
- Reactive Programming (RxJava)