Living With Garbage

Jun 19, 2015

Technology

Gregg Donovan

"Living With Garbage" talk by Gregg Donovan at the NYC Search and Discovery Meetup on 12/12/2013.
Transcript
Page 1: Living With Garbage

Senior Software Engineer

LIVING WITH GARBAGE!

Gregg Donovan

etsy.com

Page 2: Living With Garbage

4 years Solr & Lucene at etsy.com

3 years Solr & Lucene at TheLadders.com

Page 3: Living With Garbage
Page 4: Living With Garbage

10+ million members

Page 5: Living With Garbage

24+ million items

Page 6: Living With Garbage

1+ million active sellers

Page 7: Living With Garbage

10+ billion pageviews per month

Page 8: Living With Garbage
Page 9: Living With Garbage
Page 10: Living With Garbage
Page 11: Living With Garbage
Page 12: Living With Garbage
Page 13: Living With Garbage
Page 14: Living With Garbage
Page 15: Living With Garbage

CodeAsCraft.etsy.com

Page 16: Living With Garbage

Understanding GC

Monitoring GC

Debugging Memory Leaks

Design for Partial Availability

Page 17: Living With Garbage
Page 18: Living With Garbage

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;

        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}

Page 19: Living With Garbage

New():
    ref <- allocate()
    if ref = null                /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null            /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)

From Garbage Collection Handbook

Page 20: Living With Garbage

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)          /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)

From Garbage Collection Handbook
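
The two pseudocode slides above can be tried out on a simulated heap. What follows is a minimal sketch, not how HotSpot or any production collector is built: the class ToyMarkSweep, its Node type, and the explicit roots list are all hypothetical names introduced for illustration, and "freeing" an object just means dropping it from a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep over a simulated heap (illustrative only). */
class ToyMarkSweep {
    /** A simulated heap object: a mark bit plus outgoing pointers. */
    static final class Node {
        final String name;
        final List<Node> pointers = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> heap = new ArrayList<>();   // every allocated object
    final List<Node> roots = new ArrayList<>();  // stand-in for stacks and statics

    Node allocate(String name) {
        Node n = new Node(name);
        heap.add(n);
        return n;
    }

    /** markFromRoots() followed by a sweep; returns how many objects were freed. */
    int collect() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;
                worklist.add(ref);
                mark(worklist);
            }
        }
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;   // reset the mark bit for the next cycle
            } else {
                it.remove();        // unmarked means unreachable: reclaim it
                freed++;
            }
        }
        return freed;
    }

    /** Drains the worklist, marking everything reachable from it. */
    private static void mark(Deque<Node> worklist) {
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove();
            for (Node child : ref.pointers) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Node a = gc.allocate("a");
        Node b = gc.allocate("b");
        Node c = gc.allocate("c");
        Node d = gc.allocate("d");
        a.pointers.add(b);          // a -> b, reachable from the root
        c.pointers.add(d);          // c <-> d form a cycle nobody points to
        d.pointers.add(c);
        gc.roots.add(a);
        System.out.println("freed " + gc.collect() + ", live " + gc.heap.size());
    }
}
```

The unreachable cycle between c and d is reclaimed because marking starts from the roots rather than counting references.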

Page 21: Living With Garbage

Trivia: Who invented the first GC and Mark-and-Sweep?

Page 22: Living With Garbage

Weak Generational Hypothesis

Page 23: Living With Garbage

Where do objects in common Solr application live?

AtomicReaderContext?

SolrIndexSearcher?

SolrRequest?

Page 24: Living With Garbage

GC Terminology: Concurrent vs Parallel

Page 25: Living With Garbage

JVM Collectors

Page 26: Living With Garbage

Serial

Page 27: Living With Garbage

Trivia: How does System.identityHashCode() work?

Page 28: Living With Garbage

Throughput

Page 29: Living With Garbage

CMS

Page 30: Living With Garbage

Garbage First (G1)

Page 31: Living With Garbage

Continuously Concurrent Compacting Collector (C4)

Page 32: Living With Garbage

IBM, Dalvik, etc.?

Page 33: Living With Garbage

Why Throughput?

Page 34: Living With Garbage

Questions so far?

Page 35: Living With Garbage

Monitoring

Page 36: Living With Garbage

GC time per Solr request

Page 37: Living With Garbage

...
import java.lang.management.*;
...

public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}

Available via JMX
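
One way to turn the cumulative counter above into "GC time per request" is to sample it before and after handling a request and record the difference. This is a sketch under assumptions: the class GcTimePerRequest, the Timed holder, and the measure() helper are hypothetical names, and the slides do not show how the number was actually attached to Solr requests. The counter is JVM-wide, so the delta also includes collections triggered by other requests running at the same time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Samples JVM-wide GC time around a unit of work (illustrative only). */
class GcTimePerRequest {
    /** Result of a request plus the GC time that elapsed while it ran. */
    static final class Timed<T> {
        final T result;
        final long gcMillis;
        Timed(T result, long gcMillis) {
            this.result = result;
            this.gcMillis = gcMillis;
        }
    }

    /** Sum of accumulated collection time across all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime();  // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the request and reports how much GC time accumulated meanwhile. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long before = totalGcMillis();
        T result = request.get();
        return new Timed<>(result, totalGcMillis() - before);
    }

    public static void main(String[] args) {
        Timed<Integer> timed = measure(() -> {
            int total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += new int[16].length;    // churn some short-lived garbage
            }
            return total;
        });
        System.out.println("result=" + timed.result + " gcMillis=" + timed.gcMillis);
    }
}
```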

Page 38: Living With Garbage

Visual GC

Page 39: Living With Garbage
Page 40: Living With Garbage

export GC_DEBUG="-verbose:gc \
  -XX:+PrintGCDateStamps \
  -XX:+PrintHeapAtGC \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintGCApplicationConcurrentTime \
  -XX:+PrintAdaptiveSizePolicy \
  -XX:AdaptiveSizePolicyOutputInterval=1 \
  -XX:+PrintTenuringDistribution \
  -XX:+PrintGCDetails \
  -XX:+PrintCommandLineFlags \
  -XX:+PrintSafepointStatistics \
  -Xloggc:/var/log/search/gc.log"

Page 41: Living With Garbage

2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
 [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}

Page 42: Living With Garbage

GC Log Analyzers?

GCHisto

GCViewer

garbagecat

Page 43: Living With Garbage

Graphing with Logster

github.com/etsy/logster

Page 44: Living With Garbage
Page 45: Living With Garbage

GC Dashboard

github.com/etsy/dashboard

Page 46: Living With Garbage
Page 47: Living With Garbage

YourKit.com

Page 48: Living With Garbage

Designing for Partial Availability

Page 49: Living With Garbage

JVMTI GC Hook?

Page 50: Living With Garbage

How can a client ignore GC-ing hosts?

Page 51: Living With Garbage

Server lies to clients about availability

TCP socket receive buffer

TCP write buffer

Page 52: Living With Garbage

“Banner” protocol

1. Connect via TCP

2. Wait ~1-10ms

3. Either receive magic four byte header or try another host

4. Only send query after receiving header from server

Page 53: Living With Garbage

0xC0DEA5CF
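
The four steps of the banner protocol can be sketched from the client side as below. This is an illustration under assumptions, not Etsy's client: the class BannerClient and its method names are hypothetical, the header is assumed to arrive as a big-endian 32-bit integer, and the loopback demo in main() uses a more generous banner timeout than the ~1-10 ms on the slide to avoid flakiness on a cold JVM. The "stalled" server in the demo never calls accept(), yet the TCP connect still succeeds because the OS completes the handshake, which is the "server lies to clients about availability" problem the banner works around.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

/** Client side of a banner handshake (illustrative only). */
class BannerClient {
    static final int MAGIC = 0xC0DEA5CF;  // four-byte header from the slide

    /** Returns a socket to the first host that sends the banner in time, or null. */
    static Socket connectToResponsiveHost(List<InetSocketAddress> hosts,
                                          int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, connectTimeoutMs);        // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);          // 2. wait briefly
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == MAGIC) {                         // 3. banner received
                    socket.setSoTimeout(0);                    // caller picks its own query timeout
                    return socket;                             // 4. safe to send the query now
                }
            } catch (IOException e) {
                // Timed out or failed: the host may be paused in GC, so try the next one.
            }
            try { socket.close(); } catch (IOException ignored) { }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket stalled = new ServerSocket(0, 1, loopback);    // never accepts
             ServerSocket healthy = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> {
                try (Socket s = healthy.accept()) {
                    new DataOutputStream(s.getOutputStream()).writeInt(MAGIC);
                    s.getInputStream().read();  // hold the connection until the client closes
                } catch (IOException ignored) { }
            });
            server.setDaemon(true);
            server.start();

            List<InetSocketAddress> hosts = Arrays.asList(
                    new InetSocketAddress(loopback, stalled.getLocalPort()),
                    new InetSocketAddress(loopback, healthy.getLocalPort()));
            try (Socket chosen = connectToResponsiveHost(hosts, 500, 200)) {
                boolean pickedHealthy = chosen != null && chosen.getPort() == healthy.getLocalPort();
                System.out.println("picked healthy host: " + pickedHealthy);
            }
        }
    }
}
```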

Page 54: Living With Garbage

What if GC happens mid-request?

Page 55: Living With Garbage

Backup requests

Page 56: Living With Garbage

Jeff Dean: Achieving Rapid Response Time in Large Online Services
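
A backup request can be sketched as: send the query to one replica, and if no answer arrives within a short delay, send the same query to a second replica and take whichever answers first. The code below is a minimal sketch of that idea, not the implementation used at Etsy or Google; the class BackupRequests, the withBackup() signature, and the after() helper that simulates a slow replica are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hedged ("backup") requests across two replicas (illustrative only). */
class BackupRequests {
    /**
     * Starts the primary call immediately and the backup call only if the
     * primary has not answered within backupDelayMs. The first success wins;
     * the result fails only if both calls fail.
     */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary, Supplier<T> backup,
                                               long backupDelayMs,
                                               ScheduledExecutorService timer,
                                               Executor workers) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        BiConsumer<T, Throwable> settle = (value, error) -> {
            if (error == null) {
                result.complete(value);               // no-op if the other call already won
            } else if (failures.incrementAndGet() == 2) {
                result.completeExceptionally(error);  // both replicas failed
            }
        };

        CompletableFuture.supplyAsync(primary, workers).whenComplete(settle);
        ScheduledFuture<?> pendingBackup = timer.schedule(() -> {
            if (!result.isDone()) {
                CompletableFuture.supplyAsync(backup, workers).whenComplete(settle);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);
        result.whenComplete((value, error) -> pendingBackup.cancel(false));
        return result;
    }

    /** Simulates a replica that takes delayMs to answer, e.g. one paused in GC. */
    static <T> T after(long delayMs, T value) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        try {
            String winner = withBackup(() -> after(500, "primary"), () -> "backup",
                                       20, timer, workers).join();
            System.out.println("answered by: " + winner);
        } finally {
            timer.shutdownNow();
            workers.shutdownNow();
        }
    }
}
```

The delay is the tuning knob: a short delay hides more pauses but sends more duplicate work to the backup replica.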

Page 57: Living With Garbage

Solr sharding?

Right now, only as fast as the slowest shard.

Page 58: Living With Garbage

“Make a reliable whole out of unreliable parts.”

Page 59: Living With Garbage

Memory Leaks

Page 60: Living With Garbage

Solr API hooks for custom code

QParserPlugin

SearchComponent

SolrRequestHandler

SolrEventListener

SolrCache

ValueSourceParser

FieldType

etc.

Page 61: Living With Garbage

PSA: Are you sure you need custom code?

Page 62: Living With Garbage

CoreContainer#getCore()

RefCounted<SolrIndexSearcher>
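
Reference-counted handles like these are a common source of leaks in custom Solr code: a SolrCore obtained from CoreContainer#getCore() has to be closed, and a RefCounted<SolrIndexSearcher> has to be released with decref(), or the old searcher and its caches stay reachable. The sketch below shows the pattern with stand-in types so that it runs without Solr on the classpath. ToyRefCounted and ToySearcher are hypothetical and only loosely modeled on Solr's classes; they are not Solr's API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Leaky versus balanced use of a reference-counted resource (illustrative only). */
class RefCountDemo {
    /** Hypothetical stand-in for a reference-counted holder. */
    abstract static class ToyRefCounted<T> {
        private final T resource;
        private final AtomicInteger count = new AtomicInteger();

        ToyRefCounted(T resource) { this.resource = resource; }

        ToyRefCounted<T> incref() { count.incrementAndGet(); return this; }
        T get() { return resource; }
        int refCount() { return count.get(); }

        /** Releases one reference; the resource is closed when the count hits zero. */
        void decref() {
            if (count.decrementAndGet() == 0) {
                close();
            }
        }

        protected abstract void close();
    }

    /** Hypothetical stand-in for an expensive searcher. */
    static final class ToySearcher {
        boolean closed;
        int search(String query) { return query.length(); }
    }

    static ToyRefCounted<ToySearcher> newHolder() {
        return new ToyRefCounted<ToySearcher>(new ToySearcher()) {
            @Override protected void close() { get().closed = true; }
        };
    }

    /** Leaky: takes a reference and never gives it back. */
    static int leakySearch(ToyRefCounted<ToySearcher> holder, String query) {
        return holder.incref().get().search(query);
    }

    /** Balanced: the reference is released even if the search throws. */
    static int safeSearch(ToyRefCounted<ToySearcher> holder, String query) {
        holder.incref();
        try {
            return holder.get().search(query);
        } finally {
            holder.decref();
        }
    }

    public static void main(String[] args) {
        ToyRefCounted<ToySearcher> holder = newHolder().incref();  // the owner's reference
        safeSearch(holder, "solr");
        leakySearch(holder, "lucene");
        holder.decref();                                           // owner lets go
        System.out.println("refCount=" + holder.refCount()
                + " closed=" + holder.get().closed);
    }
}
```

After the owner releases its reference, the count is still above zero because of the leaky call, so the searcher is never closed.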

Page 63: Living With Garbage

SolrIndexSearcher generation marking with YourKit triggers

Page 64: Living With Garbage
Page 65: Living With Garbage

Questions so far?

Page 66: Living With Garbage

Miscellaneous Topics

Page 67: Living With Garbage

System.gc()?

Page 68: Living With Garbage

-XX:+UseCompressedOops

Page 69: Living With Garbage

-XX:+UseNUMA

Page 70: Living With Garbage

Paging

Page 71: Living With Garbage

#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults

epoch_s=$(date +%s)

echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003

Page 72: Living With Garbage

Solution 1: Buy more RAM

Ideally enough RAM to:

Keep the index in OS file buffers

AND ensure no paging of VM memory

AND handle whatever else happens on the box

~$5-10/GB

Page 73: Living With Garbage

echo "0" > /proc/sys/vm/swappiness

Page 74: Living With Garbage

mlock()/mlockall()

github.com/LucidWorks/mlockall-agent

Page 75: Living With Garbage

echo "-17" > /proc/$PID/oom_adj

Mercy from the OOM Killer

Page 76: Living With Garbage

Huge Pages

Page 77: Living With Garbage

-XX:+AlwaysPreTouch

Page 78: Living With Garbage

Possible Future Directions

Page 79: Living With Garbage

Many small VMs instead of one large VM

microsharding

Page 80: Living With Garbage

In-memory Lucene codecs

I.e. custom DirectPostingsFormat

Page 81: Living With Garbage

Off-heap memory with sun.misc.Unsafe?

Page 82: Living With Garbage

Try G1 again

Page 83: Living With Garbage

Try C4 again

Page 84: Living With Garbage

Resources

Page 85: Living With Garbage

gchandbook.org

Page 86: Living With Garbage
Page 87: Living With Garbage

bit.ly/mmgcb

Mark Miller’s GC Bootcamp

Page 88: Living With Garbage

bit.ly/giltene

Gil Tene: Understanding Java Garbage Collection

Page 89: Living With Garbage

bit.ly/cpumemory

Ulrich Drepper: What Every Programmer Should Know About Memory

Page 90: Living With Garbage

github.com/pingtimeout/jvm-options

Page 91: Living With Garbage

Read the JVM Source

(Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7

Page 92: Living With Garbage

Mechanical Sympathy Google Group

bit.ly/mechsym

Page 93: Living With Garbage

Questions?

Thanks for coming!

[email protected]