Designing for Garbage Collection Gregg Donovan Senior Software Engineer Etsy.com Wednesday, July 31, 13
Transcript
Page 1: Designing for garbage collection

Designing for Garbage Collection

Gregg Donovan

Senior Software Engineer

Etsy.com

Wednesday, July 31, 13

Page 2: Designing for garbage collection

3.5 Years Search Engineering at Etsy.com

5 years Search & Web Engineering at TheLadders.com

Page 3: Designing for garbage collection

Page 4: Designing for garbage collection

25+ million members

Page 5: Designing for garbage collection

20+ million items

Page 6: Designing for garbage collection

900k+ active sellers

Page 7: Designing for garbage collection

60+ million monthly unique visitors

Page 8: Designing for garbage collection

Page 9: Designing for garbage collection

Page 10: Designing for garbage collection

Page 11: Designing for garbage collection

Page 12: Designing for garbage collection

Page 13: Designing for garbage collection

Page 14: Designing for garbage collection

Page 15: Designing for garbage collection

CodeAsCraft.etsy.com

Page 16: Designing for garbage collection

Page 17: Designing for garbage collection

Understanding GC

Page 18: Designing for garbage collection

Understanding GC

Monitoring GC

Page 19: Designing for garbage collection

Understanding GC

Monitoring GC

Debugging Memory Leaks

Page 20: Designing for garbage collection

Understanding GC

Monitoring GC

Debugging Memory Leaks

Design for Partial Availability

Page 21: Designing for garbage collection

Page 22: Designing for garbage collection

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;

        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}

Page 23: Designing for garbage collection

New():
    ref <- allocate()
    if ref = null                  /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null              /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)

From Garbage Collection Handbook

Page 24: Designing for garbage collection

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)    /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)

From Garbage Collection Handbook
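The mark-and-sweep pseudocode above can be tried out on a toy object graph. The following is only an illustrative sketch, not how any real JVM collector is implemented: `ToyHeap`, `Node`, `cells`, and `roots` are hypothetical names, and the worklist is drained once after all roots are pushed rather than after each root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy mark-and-sweep over an explicit object graph (all names are illustrative). */
class ToyHeap {
    /** A heap cell with outgoing pointers and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    final List<Node> cells = new ArrayList<>();   // every allocated cell
    final List<Node> roots = new ArrayList<>();   // stand-in for stacks and globals

    Node allocate(String name) {
        Node n = new Node(name);
        cells.add(n);
        return n;
    }

    /** Marks everything reachable from the roots, then sweeps; returns cells freed. */
    int collect() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node root : roots) {
            if (root != null && !root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove();          // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
        // Sweep: drop unmarked cells and clear the mark bit on survivors.
        int before = cells.size();
        cells.removeIf(n -> !n.marked);
        for (Node n : cells) {
            n.marked = false;
        }
        return before - cells.size();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        heap.allocate("c");                        // never referenced: garbage
        a.fields.add(b);
        b.fields.add(a);                           // cycles are fine for a tracing collector
        heap.roots.add(a);
        System.out.println("freed " + heap.collect() + " cell(s)");
    }
}
```

Because the collector traces from the roots, the `a`/`b` cycle survives while the unreferenced cell is reclaimed.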

Page 25: Designing for garbage collection

Trivia: Who invented the first GC and Mark-and-Sweep?

Page 26: Designing for garbage collection

Weak Generational Hypothesis

Page 27: Designing for garbage collection

Where do objects in your application live?

Page 28: Designing for garbage collection

GC Terminology: Concurrent vs Parallel

Page 29: Designing for garbage collection

JVM Collectors

Page 30: Designing for garbage collection

Serial

Page 31: Designing for garbage collection

Throughput

Page 32: Designing for garbage collection

CMS

Page 33: Designing for garbage collection

Garbage First (G1)

Page 34: Designing for garbage collection

Continuously Concurrent Compacting Collector (C4)

Page 35: Designing for garbage collection

IBM, Dalvik, etc.?

Page 36: Designing for garbage collection

Why Throughput?

Page 37: Designing for garbage collection

Questions so far?

Page 38: Designing for garbage collection

Monitoring

Page 39: Designing for garbage collection

GC time per request

Page 40: Designing for garbage collection

...import java.lang.management.*;...

public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}

Available via JMX
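One way to turn the cumulative counter into "GC time per request" is to sample it before and after handling a request. This is a minimal sketch under assumptions: `GcTimePerRequest`, `Timed`, and `measure` are hypothetical names, and because the counter is JVM-wide, the delta also includes GC work triggered by other threads running at the same time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

class GcTimePerRequest {
    /** A request's result plus the JVM-wide GC time (ms) that accrued while it ran. */
    static final class Timed<T> {
        final T result;
        final long gcMillis;

        Timed(T result, long gcMillis) {
            this.result = result;
            this.gcMillis = gcMillis;
        }
    }

    /** Cumulative GC time in milliseconds, summed over all collectors. */
    static long getCollectionTime() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when a collector does not report it.
            total += Math.max(0L, bean.getCollectionTime());
        }
        return total;
    }

    /** Runs the request and reports how much GC time elapsed around it. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long before = getCollectionTime();
        T result = request.get();
        return new Timed<>(result, getCollectionTime() - before);
    }

    public static void main(String[] args) {
        Timed<Integer> timed = measure(() -> {
            int total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += new byte[64].length;   // allocate to give the collector work
            }
            return total;
        });
        System.out.println("result=" + timed.result + " gcMillis=" + timed.gcMillis);
    }
}
```

Logging `gcMillis` next to request latency makes it easier to tell slow queries from queries that merely overlapped a collection.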

Page 41: Designing for garbage collection

Page 42: Designing for garbage collection

Visual GC

Page 43: Designing for garbage collection

Page 44: Designing for garbage collection

Page 45: Designing for garbage collection

export GC_DEBUG="-verbose:gc \
-XX:+PrintGCDateStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCApplicationConcurrentTime \
-XX:+PrintAdaptiveSizePolicy \
-XX:AdaptiveSizePolicyOutputInterval=1 \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCDetails \
-XX:+PrintCommandLineFlags \
-XX:+PrintSafepointStatistics \
-Xloggc:/var/log/search/gc.log"

Page 46: Designing for garbage collection

Page 47: Designing for garbage collection

2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
 [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}
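The number that matters most in an entry like this is the wall-clock pause in the `[Times: ... real=... secs]` block. As a rough sketch of what a log analyzer does, the pause can be pulled out with a regular expression. `GcPauseParser` and `realSeconds` are hypothetical names, and the pattern is only known to fit the entry format shown here, not every collector or JVM version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcPauseParser {
    // Matches e.g. "[Times: user=137.36 sys=0.03, real=8.77 secs]".
    private static final Pattern TIMES = Pattern.compile(
            "\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    /** Returns the wall-clock pause in seconds, or -1 if the line has no Times block. */
    static double realSeconds(String line) {
        Matcher m = TIMES.matcher(line);
        return m.find() ? Double.parseDouble(m.group(3)) : -1.0;
    }

    public static void main(String[] args) {
        String line = " [PSYoungGen: 8126528K->7599356K(9480896K)]"
                + " [ParOldGen: 21670588K->21670588K(21670592K)]"
                + " 29797116K->29269944K(31151488K)"
                + " [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs]"
                + " [Times: user=137.36 sys=0.03, real=8.77 secs]";
        System.out.println("pause seconds: " + realSeconds(line));
    }
}
```

In the entry above, `user` is far larger than `real` because the parallel collector spreads the work over many threads, while the old generation stays 99% full afterwards.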

Page 48: Designing for garbage collection

GC Log Analyzers?

GCHisto

GCViewer

garbagecat

github.com/Netflix/gcviz

Page 49: Designing for garbage collection

Graphing with Logster

github.com/etsy/logster

Page 50: Designing for garbage collection

Page 51: Designing for garbage collection

GC Dashboard

github.com/etsy/dashboard

Page 52: Designing for garbage collection

Page 53: Designing for garbage collection

YourKit.com

Page 54: Designing for garbage collection

Designing for Partial Availability

Page 55: Designing for garbage collection

JVMTI GC Hook?

Page 56: Designing for garbage collection

How can a client ignore GC-ing hosts?

Page 57: Designing for garbage collection

Server lies to clients about availability

TCP socket receive buffer

TCP write buffer

Page 58: Designing for garbage collection

“Banner” protocol

1. Connect via TCP

2. Wait ~1-10ms

3. Either receive magic four byte header or try another host

4. Only send query after receiving header from server
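The four steps above could look roughly like this on a JVM client. This is a sketch under assumptions, not the production client: `BannerClient` and `connectToFirstResponsive` are hypothetical names, and the timeouts are illustrative. The demo in `main` uses a listening socket that never calls `accept()` to stand in for a paused server, since the kernel still completes the TCP handshake for it.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

class BannerClient {
    static final int BANNER = 0xC0DEA5CF;   // magic four-byte header

    /** Returns a socket to the first host that sends the banner in time, or null. */
    static Socket connectToFirstResponsive(List<InetSocketAddress> hosts,
                                           int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, connectTimeoutMs);   // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);     // 2. bound each blocking read
                // 3. readInt() reads four bytes big-endian.
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == BANNER) {
                    // 4. Caller may now send the query (and should reset the read timeout).
                    return socket;
                }
            } catch (IOException e) {
                // Refused, timed out, or closed early: treat the host as unavailable.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // "stalled" never calls accept(), yet clients can still connect to it.
        try (ServerSocket stalled = new ServerSocket(0);
             ServerSocket healthy = new ServerSocket(0)) {
            Thread server = new Thread(() -> {
                try (Socket s = healthy.accept()) {
                    new DataOutputStream(s.getOutputStream()).writeInt(BANNER);
                } catch (IOException ignored) {
                    // Demo only.
                }
            });
            server.setDaemon(true);
            server.start();

            InetAddress local = InetAddress.getLoopbackAddress();
            List<InetSocketAddress> hosts = Arrays.asList(
                    new InetSocketAddress(local, stalled.getLocalPort()),
                    new InetSocketAddress(local, healthy.getLocalPort()));
            try (Socket chosen = connectToFirstResponsive(hosts, 1000, 50)) {
                System.out.println(chosen != null && chosen.getPort() == healthy.getLocalPort()
                        ? "skipped the stalled host" : "no responsive host");
            }
        }
    }
}
```

The point of the banner is that a successful `connect()` proves only that the kernel is alive; a reply written by the application proves the process itself is running.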

Page 59: Designing for garbage collection

0xC0DEA5CF

Page 60: Designing for garbage collection

public function open() {
  $this->handle_ = @fsockopen($this->host_, $this->port_, $errno, $errstr,
                              $this->connectTimeout_ / 1000.0);
  try {
    // $banner_timeout, $short_hostname and $message are not defined in this excerpt.
    stream_set_timeout($this->handle_, 0, $banner_timeout * 1000);
    $read_start = microtime(true);
    $data = $this->readAll(4);
    $read_time = (microtime(true) - $read_start) * 1000; // seconds to millis
    $arr = unpack('N', $data); // big-endian unsigned 32-bit
    $value = $arr[1];
    if ($value !== 0xC0DEA5CF) {
      StatsD::increment("search.baddata.{$short_hostname}.{$this->getPort()}");
      throw new TTransportException("[$value] does not match banner [0xC0DEA5CF]");
    }
  } catch (Exception $e) {
    $this->close(); // this won't necessarily be closed by clients
    throw new TTransportException($message, self::BANNER_TIMEOUT_CODE);
  }
}

Page 61: Designing for garbage collection

private static class BannerSendingTProcessorFactory extends TProcessorFactory {
    private final TProcessor base;

    public BannerSendingTProcessorFactory(TProcessor base) {
        super(base);
        this.base = base;
    }

    @Override
    public TProcessor getProcessor(TTransport trans) {
        return new BannerTProcessor(base, (TSocket) trans);
    }
}

private static final class BannerTProcessor implements TProcessor {
    private final TProcessor base;
    private final TSocket tsocket;

    private BannerTProcessor(TProcessor base, TSocket tsocket) {
        this.base = checkNotNull(base);
        this.tsocket = checkNotNull(tsocket);
    }

    @Override
    public boolean process(TProtocol in, TProtocol out) throws TException {
        this.tsocket.write(TBannerUtil.BANNER, 0, 4);
        this.tsocket.flush();
        return this.base.process(in, out);
    }
}

Page 62: Designing for garbage collection

What if GC happens mid-request?

Page 63: Designing for garbage collection

Backup requests

Page 64: Designing for garbage collection

Jeff Dean: Achieving Rapid Response Times in Large Online Services
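A backup request sends the same query to a second replica if the first has not answered after a short delay, and takes whichever answer arrives first. The following is a minimal sketch of that idea using `CompletableFuture`, not the implementation from the talk: `BackupRequests`, `withBackup`, and `slow` are hypothetical names, failures are not propagated, and a real client would also cancel the losing request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class BackupRequests {
    /**
     * Starts the primary request; if no answer has arrived after backupDelayMs,
     * starts the backup as well. The returned future completes with the first answer.
     */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary, Supplier<T> backup,
            long backupDelayMs, ScheduledExecutorService scheduler, Executor workers) {
        CompletableFuture<T> result = new CompletableFuture<>();
        CompletableFuture.supplyAsync(primary, workers).thenAccept(result::complete);
        scheduler.schedule(() -> {
            if (!result.isDone()) {
                CompletableFuture.supplyAsync(backup, workers).thenAccept(result::complete);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    /** Simulated replica: sleeps for the given time, then returns its own name. */
    static String slow(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        try {
            // The primary "pauses" for 500 ms; the backup is sent after 20 ms.
            CompletableFuture<String> answer = withBackup(
                    () -> slow("primary", 500), () -> slow("backup", 0), 20, scheduler, workers);
            System.out.println("answered by " + answer.get(1, TimeUnit.SECONDS));
        } finally {
            scheduler.shutdownNow();
            workers.shutdownNow();
        }
    }
}
```

The delay trades extra load for lower tail latency: a short delay hides more pauses but sends more duplicate work to the replicas.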

Page 65: Designing for garbage collection

Sharding?

Naive approach: only as fast as the slowest shard.
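One possible alternative to waiting for every shard is to give the fan-out a deadline and merge whatever has arrived by then. This is a sketch of that idea only, not a description of what Etsy deployed: `PartialGather` and `gather` are hypothetical names, and a real search tier would also need to tell the caller that the result is partial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PartialGather {
    /** Queries all shards in parallel and returns the answers that met the deadline. */
    static <T> List<T> gather(ExecutorService pool, List<Callable<T>> shards, long deadlineMs)
            throws InterruptedException {
        // invokeAll cancels any task that has not finished when the timeout expires.
        List<Future<T>> futures = pool.invokeAll(shards, deadlineMs, TimeUnit.MILLISECONDS);
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            if (future.isCancelled()) {
                continue;                      // shard missed the deadline
            }
            try {
                results.add(future.get());     // already done, so this does not block
            } catch (ExecutionException e) {
                // Shard failed: skip it and serve the rest.
            }
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<String>> shards = new ArrayList<>();
            shards.add(() -> "shard-0 hits");
            shards.add(() -> {
                Thread.sleep(2000);            // simulated long pause on one shard
                return "shard-1 hits";
            });
            shards.add(() -> "shard-2 hits");
            System.out.println(gather(pool, shards, 200));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a deadline, one paused shard costs some recall instead of stalling the whole response.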

Page 66: Designing for garbage collection

“Make a reliable whole out of unreliable parts.”

Page 67: Designing for garbage collection

Memory Leaks

Page 68: Designing for garbage collection

SolrIndexSearcher generation marking with YourKit triggers

Page 69: Designing for garbage collection

Page 70: Designing for garbage collection

Questions so far?

Page 71: Designing for garbage collection

Miscellaneous Topics

Page 72: Designing for garbage collection

System.gc()?

Page 73: Designing for garbage collection

-XX:+UseCompressedOops

Page 74: Designing for garbage collection

-XX:+UseNUMA

Page 75: Designing for garbage collection

Paging

Page 76: Designing for garbage collection

#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults

epoch_s=$(date +%s)

echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003

Page 77: Designing for garbage collection

Solution 1: Buy more RAM

Ideally enough RAM to:

Keep data in OS file buffers

AND ensure no paging of VM memory

AND whatever else happens on the box

~$5-10/GB

Page 78: Designing for garbage collection

echo "0" > /proc/sys/vm/swappiness

Page 79: Designing for garbage collection

mlock()/mlockall()

github.com/LucidWorks/mlockall-agent

Page 80: Designing for garbage collection

echo "-17" > /proc/$PID/oom_adj

Mercy from the OOM Killer

Page 81: Designing for garbage collection

Huge Pages

Page 82: Designing for garbage collection

-XX:+AlwaysPreTouch

Page 83: Designing for garbage collection

Future Directions

Page 84: Designing for garbage collection

Many small VMs instead of one large VM

microsharding

Page 85: Designing for garbage collection

Off-heap memory with sun.misc.Unsafe?

Page 86: Designing for garbage collection

Try G1 again

Page 87: Designing for garbage collection

Try C4 again

Page 88: Designing for garbage collection

Resources

Page 89: Designing for garbage collection

gchandbook.org

Page 90: Designing for garbage collection

Page 91: Designing for garbage collection

bit.ly/mmgcb

Mark Miller’s GC Bootcamp

Page 92: Designing for garbage collection

bit.ly/giltene

Gil Tene: Understanding Java Garbage Collection

Page 93: Designing for garbage collection

bit.ly/cpumemory

Ulrich Drepper: What Every Programmer Should Know About Memory

Page 94: Designing for garbage collection

github.com/pingtimeout/jvm-options

Page 95: Designing for garbage collection

Read the JVM Source

(Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7

Page 96: Designing for garbage collection

Mechanical Sympathy Google Group

bit.ly/mechsym

Page 97: Designing for garbage collection

Questions?

Thanks for coming!

Gregg Donovan
