Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM. C. Scott Andreas, Pizza, Beer, and Tech Talks, November 17, 2011

Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Jul 02, 2015



boundary_slides

This presentation from the November 17, 2011 Boundary Meetup takes us through the architecture of Boundary's stream processing infrastructure and how the architecture is pushing the bounds of JVM throughput.
Transcript
What’s ESP / CEP?

• Event Stream Processing: selecting events on dimensions among a stream of moving data, maintaining them for a brief period, and emitting aggregations.

• Complex Event Processing: identifying correlations between events, predicting trends, and programmatically reacting to emergent trends.

ESP and Network Analytics

• Packet flows are event streams with many dimensions.

• Blast them into the engine, select over the stream, emit aggregations based on queries.

• IPFIX data flows in; JSON comes out.
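As a rough illustration of "select over the stream, emit aggregations," here is a minimal Java sketch of a tumbling-window aggregation. The `FlowRecord` type, its fields, the window length, and the JSON shape are hypothetical stand-ins for decoded IPFIX records and query output, not Boundary's actual schema or engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, heavily simplified stand-in for a decoded IPFIX flow record.
record FlowRecord(long timestampMillis, int dstPort, long bytes) {}

class WindowedFlowAggregator {
    private final long windowMillis;
    private long windowStart = -1;
    // Bytes per destination port in the current window (TreeMap gives stable output order).
    private final Map<Integer, Long> bytesByPort = new TreeMap<>();
    private final List<String> emitted = new ArrayList<>();

    WindowedFlowAggregator(long windowMillis) { this.windowMillis = windowMillis; }

    // Assumes non-negative timestamps arriving in order; late events are not handled.
    void onFlow(FlowRecord flow) {
        long bucket = flow.timestampMillis() - (flow.timestampMillis() % windowMillis);
        if (windowStart >= 0 && bucket != windowStart) flush(); // window closed: emit it
        windowStart = bucket;
        // Select on one dimension (destination port) and aggregate bytes.
        bytesByPort.merge(flow.dstPort(), flow.bytes(), Long::sum);
    }

    // Emits the current window's aggregation as a small JSON object, then resets state.
    void flush() {
        if (windowStart < 0 || bytesByPort.isEmpty()) return;
        StringBuilder json = new StringBuilder("{\"window\":").append(windowStart).append(",\"bytesByPort\":{");
        boolean first = true;
        for (Map.Entry<Integer, Long> e : bytesByPort.entrySet()) {
            if (!first) json.append(',');
            json.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        emitted.add(json.append("}}").toString());
        bytesByPort.clear();
    }

    List<String> emitted() { return emitted; }
}
```

Every event in this sketch touches a map entry and boxes a `Long`, which is exactly the kind of per-event allocation that adds up at hundreds of megabits per second.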

Back of the Envelope

• 500 Mbps of data comes into the JVM; xxx Mbps of data goes out of the JVM.

• This memory must be allocated, retained for processing, freed, and collected.

• Actual allocation rates are far higher than data in / out (memory is also used for deserialization, aggregations, etc.).
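To put these figures in concrete units: 500 Mbps is 62.5 MB of raw input per second. The sketch below converts a link rate into bytes per second and estimates how quickly a young generation would fill. The amplification factor (bytes allocated per byte of input, for deserialization, aggregation, etc.) and the 5 GB young-generation size are hypothetical parameters, not measurements from the talk.

```java
class BackOfEnvelope {
    // Converts a link rate in megabits per second to bytes per second (decimal units).
    static double bytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond * 1_000_000 / 8;
    }

    // Seconds until a young generation of the given size fills, assuming every input
    // byte causes `amplification` bytes of heap allocation (hypothetical factor).
    static double secondsToFillYoungGen(double megabitsPerSecond, double amplification, double youngGenBytes) {
        return youngGenBytes / (bytesPerSecond(megabitsPerSecond) * amplification);
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond(500));                 // 6.25E7 bytes/s
        System.out.println(secondsToFillYoungGen(500, 1, 5e9));  // 80.0 s with no amplification
        System.out.println(secondsToFillYoungGen(500, 10, 5e9)); // 8.0 s at a hypothetical 10x
    }
}
```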

Moment of Pause

• Don’t touch the knobs unless you need to

• Server defaults are a decent place to start for local development

• The defaults shipped with Cassandra are decent for bimodal GC profiles.

• Basic rule of thumb: if you’re aggressively tuning garbage collection, you can trade hours of frustration for a ~10% gain.
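Before touching any knobs, it helps to know which collectors are actually running and how much time they consume. A minimal sketch using the standard `java.lang.management` API (one option among several, such as GC logs; not something the talk prescribes):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcSnapshot {
    // One line per collector: its name, collection count, and cumulative collection time.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it.
            lines.add(String.format("%s: %d collections, %d ms total",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Taking two snapshots a minute apart and diffing the totals gives a rough fraction of time spent in GC, a reasonable baseline before deciding whether tuning is worth the frustration.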

Generational Garbage Collection

• Modern JVMs divide heap space up into multiple “generations.”

• Most applications have a lot of objects which live for a very short time, and a lot which live (nearly) forever.

• Generational collection enables the JVM to collect unused memory more efficiently by avoiding unnecessarily scanning heap / object graphs for references or free regions.
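A toy model shows why this pays off: a minor collection only has to copy the young objects that are still alive, so when most objects die young, it touches very little. The simulation below is hypothetical and greatly simplified (no object graph, no card tables, no real memory). Each object has a known lifetime, survivors age by one per minor collection, and objects that reach a tenuring threshold are promoted to the old generation instead of being copied to a survivor space.

```java
import java.util.ArrayList;
import java.util.List;

class ToyGenerationalHeap {
    // A simulated object: when it becomes garbage, and how many minor GCs it has survived.
    static final class Obj {
        final long diesAt;
        int age;
        Obj(long diesAt) { this.diesAt = diesAt; }
    }

    final int tenuringThreshold;
    final List<Obj> young = new ArrayList<>();
    long copiedToSurvivor; // survivor-space copies performed by minor GCs
    long promoted;         // objects moved to the (simulated) old generation

    ToyGenerationalHeap(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    void allocate(long now, long lifetime) { young.add(new Obj(now + lifetime)); }

    // Minor GC: dead objects are simply dropped (no work per dead object);
    // live ones are aged, then either promoted or copied to a survivor space.
    void minorGc(long now) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (o.diesAt <= now) continue; // garbage: never copied
            o.age++;
            if (o.age >= tenuringThreshold) {
                promoted++;
            } else {
                copiedToSurvivor++;
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}
```

With 1,000 allocations of which 990 die before the collection, the minor GC only handles the 10 survivors. A low threshold promotes medium-lived objects early, where they later become old-generation garbage; a higher one gives them more chances to die young.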

first attempt: “deploy the g1”

G1 Collector

• Hundreds of tiny 1ms collections / second rather than ParNew’s larger ~100 - 200ms collections.

• Capable of meeting ambitious pause targets.

• Powered by a gang of threads working in parallel

• ...cooperating to chew through CPU like it’s free.

second attempt: you’re gonna laugh

the unsafe

Unsafe

• An OpenJDK/HotSpot class (sun.misc.Unsafe) exposing direct access to the underlying VM, OS, and memory.

• This includes the ability to allocate, manage, and free memory.

• Perhaps we can outsmart the JVM and do a better job than it!
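To see what "allocate, manage, and free memory" means with `sun.misc.Unsafe`, here is a minimal sketch. It obtains the instance reflectively (it is not a public API, and newer JDKs have deprecated its memory-access methods for removal), allocates a native block, writes and reads `long` values, and frees the block explicitly. Nothing tracks this memory on your behalf: a missing `freeMemory` is a leak, and a bad address can crash the VM.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class OffHeapLongs {
    // Unsafe has no public constructor; the singleton lives in a private static field.
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Stores 0..count-1 in native memory, sums them back, then frees the block.
    static long sumOffHeap(int count) {
        Unsafe u = unsafe();
        long base = u.allocateMemory((long) count * Long.BYTES); // outside the Java heap
        try {
            for (int i = 0; i < count; i++) u.putLong(base + (long) i * Long.BYTES, i);
            long sum = 0;
            for (int i = 0; i < count; i++) sum += u.getLong(base + (long) i * Long.BYTES);
            return sum;
        } finally {
            u.freeMemory(base); // manual free: the GC never reclaims this memory
        }
    }
}
```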

Learned While Astray

• Finalization occurs in a single thread.

• Jumping from native finalization back into Java is expensive.

• Attempting to outsmart the garbage collector by creating hundreds of thousands of tiny ByteBuffers is...a thing.

• Java’s collectors are very good at collecting garbage. Your home-grown in-app GC go-kart is probably not.

returning to earth ::attempt 3[a]

Lessons from Science

• Your rate of “freeing” must be equal to or exceed your rate of object allocation on the heap.

• High rates of allocation speed up heap fragmentation, which compounds the problem.

• Creating less garbage reduces your rate of allocation (and freeing).

• This means less work for the garbage collector.
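The first point can be made concrete with simple arithmetic: if memory is reclaimed more slowly than it is allocated, heap occupancy grows at the difference of the two rates until the free space runs out. A minimal sketch; the free-heap size and both rates are hypothetical inputs, and the model treats the rates as fixed and independent, which real collectors are not.

```java
class AllocationBudget {
    // Seconds until free heap is exhausted when allocation outpaces reclamation.
    static double secondsUntilExhausted(double freeHeapBytes, double allocBytesPerSec, double freedBytesPerSec) {
        double netGrowth = allocBytesPerSec - freedBytesPerSec;
        // If reclamation keeps up, occupancy never grows in this model.
        return netGrowth <= 0 ? Double.POSITIVE_INFINITY : freeHeapBytes / netGrowth;
    }

    public static void main(String[] args) {
        System.out.println(secondsUntilExhausted(2e9, 600e6, 500e6)); // 20.0
        System.out.println(secondsUntilExhausted(2e9, 300e6, 500e6)); // Infinity
    }
}
```

Halving the allocation rate in this example does not merely double the time to exhaustion; it moves the system to the side of the inequality where the collector keeps up.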

best way to help out the gc ::

PRODUCE LESS GARBAGE
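One common source of avoidable garbage on hot paths in aggregation code is boxing. A minimal sketch, assuming hypothetical per-port byte counters: the `HashMap<Integer, Long>` version boxes each updated total into a `Long` (values outside the small -128..127 cache are new objects), may box the port into an `Integer`, and allocates map nodes for new keys. The primitive version updates a preallocated `long[]` in place and allocates nothing per event. Indexing by port is an illustrative choice that works only because ports fit in 0..65535.

```java
import java.util.HashMap;
import java.util.Map;

class PortCounters {
    // Boxed version: each update can allocate an Integer key and a Long value.
    static Map<Integer, Long> countBoxed(int[] ports, long[] bytes) {
        Map<Integer, Long> totals = new HashMap<>();
        for (int i = 0; i < ports.length; i++) totals.merge(ports[i], bytes[i], Long::sum);
        return totals;
    }

    // Primitive version: `totals` is allocated once (65536 slots, one per port)
    // and reused; reset it between windows with java.util.Arrays.fill(totals, 0).
    static void addPrimitive(long[] totals, int[] ports, long[] bytes) {
        for (int i = 0; i < ports.length; i++) totals[ports[i]] += bytes[i];
    }
}
```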

Breaking out YourKit
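YourKit is a full profiler. As a lightweight, code-level complement (an alternative technique, not what the talk describes), HotSpot-based JDKs expose per-thread allocation counters through `com.sun.management.ThreadMXBean`. A minimal sketch that measures how many bytes a piece of code allocates on the current thread; it assumes a HotSpot-based JVM with allocation tracking enabled, and the result is approximate.

```java
import java.lang.management.ManagementFactory;

class AllocationMeter {
    // HotSpot's ThreadMXBean implements the com.sun.management extension interface.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Approximate bytes allocated on the current thread while running `work`.
    static long bytesAllocatedBy(Runnable work) {
        if (!THREADS.isThreadAllocatedMemorySupported() || !THREADS.isThreadAllocatedMemoryEnabled()) {
            throw new UnsupportedOperationException("per-thread allocation tracking unavailable");
        }
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        work.run();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }
}
```

Wrapping a hot-path method in `bytesAllocatedBy` and dividing by the number of events processed gives a rough bytes-per-event figure to track as garbage is removed.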

attempt 3[b]: responsible tuning of the old hat

Optimizing for Infant Mortality

• Java 6 AMD64 (server) defaults to allocating 1/3 of the heap to the new gen and 2/3 to the old gen.

• ESP/CEP workloads place tremendous pressure on the new gen. The vast majority of objects survive less than five seconds.

• Experiment: Allocate 80% of heap to the new gen, set a higher tenuring threshold, and lean hard on the ParNew collector.

[Figure: default new gen ratios in Java 6]
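The arithmetic behind this experiment, as a sketch. With the Java 6 server default (`-XX:NewRatio=2`, i.e. old:new = 2:1), a hypothetical 10 GB heap gives the new gen about 3.3 GB; an 80% new gen is about 8 GB. Because `NewRatio` only accepts integer ratios, a new gen larger than the old gen is usually set as an absolute size with `-Xmn`. The heap size, survivor ratio, and tenuring threshold below are illustrative assumptions, not values from the talk, and the ParNew/CMS flags are Java 6-era options that have since been removed from current OpenJDK releases.

```java
class YoungGenSizing {
    // Returns {young, old, eden, eachSurvivor} in megabytes.
    // SurvivorRatio semantics: eden = ratio * survivor, with two survivor spaces,
    // so each survivor is young / (ratio + 2).
    static long[] sizesMb(long heapMb, double youngFraction, int survivorRatio) {
        long young = Math.round(heapMb * youngFraction);
        long old = heapMb - young;
        long survivor = young / (survivorRatio + 2);
        long eden = young - 2 * survivor;
        return new long[] {young, old, eden, survivor};
    }

    // HotSpot flags for that layout (all values illustrative).
    static String flags(long heapMb, double youngFraction, int survivorRatio, int tenuringThreshold) {
        long young = sizesMb(heapMb, youngFraction, survivorRatio)[0];
        return String.format("-Xms%dm -Xmx%dm -Xmn%dm -XX:SurvivorRatio=%d -XX:MaxTenuringThreshold=%d"
                + " -XX:+UseParNewGC -XX:+UseConcMarkSweepGC",
                heapMb, heapMb, young, survivorRatio, tenuringThreshold);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sizesMb(10240, 0.8, 8))); // [8192, 2048, 6554, 819]
        System.out.println(flags(10240, 0.8, 8, 15));
    }
}
```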

CMS Collector

• Guardian of the tenured generation, favorite workhorse for years.

• Mostly concurrent (runs alongside application threads) and easier on the CPU than G1.

• ...But it contains a significant stop-the-world pause phase and is less suited to meeting low pause targets.

ParNew Collector

• Designed for the small, but works great in the large: excellent throughput, parallel collection.

• Can collect ~5GB in ~200ms on a quad-core Xeon w/HT.

• A 200ms pause every several seconds compares favorably to less frequent multi-second pauses and promotion failures.
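The trade-off in the last point is easy to quantify. A sketch comparing the fraction of wall-clock time spent paused with the worst single pause; the intervals are hypothetical, since the slide only says "every several seconds" and "less frequent."

```java
class PauseBudget {
    // Fraction of wall-clock time spent in stop-the-world pauses, assuming
    // one pause of `pauseMillis` every `intervalMillis` of wall-clock time.
    static double pausedFraction(double pauseMillis, double intervalMillis) {
        return pauseMillis / intervalMillis;
    }

    public static void main(String[] args) {
        // Hypothetical profiles: frequent short young-gen pauses vs. rare long pauses.
        System.out.printf("200 ms every 5 s:   %.1f%% paused, worst pause 200 ms%n", 100 * pausedFraction(200, 5_000));
        System.out.printf("3 s every 60 s:     %.1f%% paused, worst pause 3000 ms%n", 100 * pausedFraction(3_000, 60_000));
    }
}
```

In this example the total overhead is similar (4.0% vs. 5.0%), but the worst-case pause differs by more than an order of magnitude, and for a system emitting aggregations on a live stream the worst case is usually what users notice.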

Explosions in the Barrel

“real-time” and the jvm

Real Time and the JVM

• Real Time: the ability to meet specific targets with low variance is critical to the bare minimum functionality of the product (e.g., air bags).

• “Soft” Real Time: the ability to meet targets is important but not critical. The value of the system’s functionality is diminished, but not eliminated, by delay.

Real Time and “The Pause”

• To what extent can a system which can endure pauses of unpredictable duration be considered “real-time”?

• Is it sufficient to mitigate the frequency and duration of pauses for a system to still deliver value as “soft real-time”?

• Is the alternative worth the cost?
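One way to answer these questions for a specific deployment is to measure the stalls the application actually experiences. A minimal sketch of a "hiccup meter" (the idea behind tools such as jHiccup; not something from the talk): a thread repeatedly sleeps for a fixed interval and records how much longer than requested each sleep took. The overshoot includes GC pauses, but also OS scheduling and timer jitter, so it captures stalls from any cause, not only the collector.

```java
import java.util.Arrays;

class HiccupMeter {
    // Sleeps `samples` times for `intervalMillis` and returns each sample's overshoot
    // (actual elapsed time minus requested time) in microseconds, sorted ascending.
    static long[] measure(int samples, long intervalMillis) {
        long[] overshootMicros = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for callers
                throw new IllegalStateException("interrupted while measuring", e);
            }
            long elapsedNanos = System.nanoTime() - start;
            overshootMicros[i] = Math.max(0, (elapsedNanos - intervalMillis * 1_000_000) / 1_000);
        }
        Arrays.sort(overshootMicros);
        return overshootMicros;
    }

    // Nearest-rank percentile of a non-empty, ascending-sorted array; p in [0, 100].
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] s = measure(1_000, 1);
        System.out.printf("p50=%dus p99=%dus max=%dus%n", percentile(s, 50), percentile(s, 99), s[s.length - 1]);
    }
}
```

Run alongside the real workload, the tail percentiles give a concrete number to hold against a "soft real-time" target instead of an open question.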

what does your app sound like?
