Yonik Seeley Lucene/Solr Revolution 2014 Washington, D.C. Native Code & Off-Heap Data Structures for Solr

Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, Heliosearch

Jul 12, 2015

Transcript
  • Yonik Seeley

    Lucene/Solr Revolution 2014 Washington, D.C.

    Native Code & Off-Heap Data Structures for Solr

  • My Background
    - Creator of Solr
    - Heliosearch Founder
    - LucidWorks Co-Founder
    - Lucene/Solr committer, PMC member
    - Apache Software Foundation member
    - M.S. in Computer Science, Stanford

  • Heliosearch Project: The Next Evolution of Solr
    - Forked from Solr, developing on GitHub
    - Started Jan 2014
    - Well-aligned community
    - Open source, Apache licensed
    - Bring back to Apache in the future?
    - Currently a drop-in replacement for Solr at the HTTP-API level
    - A superset: we continually merge in upstream changes
    - Latest version of Heliosearch includes latest Solr
    - Current features: off-heap filters, off-heap fieldcache, facet-by-function, sub-facets, native code performance enhancements

  • Garbage Collection

  • Garbage Collection Basics
    [Diagram: Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space; Thread as a GC root]
    - New objects allocated in Eden
    - Find live objects by tracing from GC roots (threads, stack locals, etc.)
    - Make a copy of live objects, leaving garbage behind
    - Eden + Survivor space copied together to the other Survivor space
    - Tenured from Survivor when old enough
    - Stop-the-world needed when GC can't keep up
    - Out of memory when too much time spent in GC

  • Java Memory Waste
    - Need to size for worst case scenario
    - OS needs free memory to cache index files
    - JVMs aren't good at sharing with the rest of the system
    - mmap allocations are managed by the OS and can be immediately reused on free
    [Diagram: OS real memory shared by two JVMs (heap in use, unused heap, max heap), two C processes (C heap in use, unused heap), and mmap allocations. Free memory includes the buffer cache, which is important for caching index files.]

  • GC Impact
    - GC reduces throughput
      - Time to copy all that memory around could be spent better!
    - Stop-the-world pauses
      - Seconds to minutes long
      - Pause time proportional to heap size
      - Still exists in all HotSpot GCs: CMS, G1GC, etc.
      - Breaks application SLAs (request timeouts, etc.)
      - Can cause SolrCloud ZooKeeper session timeouts
    - Reducing max pause size normally means reduced throughput
    - Non-graceful degradation
      - If you don't size your heap big enough: BOOM!

  • GC Tuning UseSerialGC UseParallelGC UseParallelOldGC UseParallelOldGCCompacting UseParallelDensePrefixUpdate HeapMaximumCompactionInterval HeapFirstMaximumCompactionCount UseMaximumCompactionOnSystemGC ParallelOldDeadWoodLimiterMean ParallelOldDeadWoodLimiterStdDev UseParallelOldGCDensePrefix ParallelGCThreads ParallelCMSThreads YoungPLABSize OldPLABSize GCTaskTimeStampEntries AlwaysTenure NeverTenure ScavengeBeforeFullGC UseConcMarkSweepGC ExplicitGCInvokesConcurrent UseCMSBestFit UseCMSCollectionPassing UseParNewGC ParallelGCVerbose ParallelGCBufferWastePct ParallelGCRetainPLAB TargetPLABWastePct PLABWeight ResizePLAB PrintPLAB ParGCArrayScanChunk ParGCDesiredObjsFromOverflowList CMSParPromoteBlocksToClaim AlwaysPreTouch CMSUseOldDefaults CMSYoungGenPerWorker CMSIncrementalMode CMSIncrementalDutyCycle CMSIncrementalPacing CMSIncrementalDutyCycleMin CMSIncrementalSafetyFactor CMSIncrementalOffset CMSExpAvgFactor CMS_FLSWeight CMS_FLSPadding FLSCoalescePolicy CMS_SweepWeight CMS_SweepPadding CMS_SweepTimerThresholdMillis CMSClassUnloadingEnabled CMSCompactWhenClearAllSoftRefs UseCMSCompactAtFullCollection CMSFullGCsBeforeCompaction CMSIndexedFreeListReplenish CMSLoopWarn CMSMarkStackSize CMSMarkStackSizeMax CMSMaxAbortablePrecleanLoops CMSMaxAbortablePrecleanTime CMSAbortablePrecleanMinWorkPerIteration CMSAbortablePrecleanWaitMillis CMSRescanMultiple CMSConcMarkMultiple CMSRevisitStackSize CMSAbortSemantics CMSParallelRemarkEnabled CMSParallelSurvivorRemarkEnabled CMSPLABRecordAlways CMSConcurrentMTEnabled CMSPermGenPrecleaningEnabled CMSPermGenSweepingEnabled

    CMSPrecleaningEnabled CMSPrecleanIter CMSPrecleanNumerator CMSPrecleanDenominator CMSPrecleanRefLists1 CMSPrecleanRefLists2 CMSPrecleanSurvivors1 CMSPrecleanSurvivors2 CMSPrecleanThreshold CMSCleanOnEnter CMSRemarkVerifyVariant CMSScheduleRemarkEdenSizeThreshold CMSScheduleRemarkEdenPenetration CMSScheduleRemarkSamplingRatio CMSSamplingGrain CMSScavengeBeforeRemark CMSWorkQueueDrainThreshold CMSWaitDuration CMSYield CMSBitMapYieldQuantum UseGCLogFileRotation NumberOfGCLogFiles GCLogFileSize LargePageSizeInBytes LargePageHeapSizeThreshold PrintGCApplicationConcurrentTime PrintGCApplicationStoppedTime OnOutOfMemoryError ClassUnloading BlockOffsetArrayUseUnallocatedBlock RefDiscoveryPolicy ParallelRefProcEnabled CMSTriggerRatio CMSBootstrapOccupancy CMSInitiatingOccupancyFraction UseCMSInitiatingOccupancyOnly HandlePromotionFailure PreserveMarkStackSize ZeroTLAB PrintTLAB TLABStats AlwaysActAsServerClassMachine DefaultMaxRAM DefaultMaxRAMFraction DefaultInitialRAMFraction UseAutoGCSelectPolicy AutoGCSelectPauseMillis UseAdaptiveSizePolicy UsePSAdaptiveSurvivorSizePolicy UseAdaptiveGenerationSizePolicyAtMinorCollection UseAdaptiveGenerationSizePolicyAtMajorCollection UseAdaptiveSizePolicyWithSystemGC UseAdaptiveGCBoundary AdaptiveSizeThroughPutPolicy AdaptiveSizePausePolicy AdaptiveSizePolicyInitializingSteps AdaptiveSizePolicyOutputInterval UseAdaptiveSizePolicyFootprintGoal AdaptiveSizePolicyWeight AdaptiveTimeWeight PausePadding PromotedPadding SurvivorPadding AdaptivePermSizeWeight PermGenPadding ThresholdTolerance AdaptiveSizePolicyCollectionCostMargin YoungGenerationSizeIncrement YoungGenerationSizeSupplement YoungGenerationSizeSupplementDecay TenuredGenerationSizeIncrement TenuredGenerationSizeSupplement TenuredGenerationSizeSupplementDecay

    MaxGCPauseMillis MaxGCMinorPauseMillis GCTimeRatio AdaptiveSizeDecrementScaleFactor UseAdaptiveSizeDecayMajorGCCost AdaptiveSizeMajorGCDecayTimeScale MinSurvivorRatio InitialSurvivorRatio BaseFootPrintEstimate UseGCOverheadLimit GCTimeLimit GCHeapFreeLimit PrintAdaptiveSizePolicy DisableExplicitGC CollectGen0First BindGCTaskThreadsToCPUs UseGCTaskAffinity ProcessDistributionStride CMSCoordinatorYieldSleepCount CMSYieldSleepCount PrintGCTaskTimeStamps TraceClassLoadingPreorder TraceGen0Time TraceGen1Time PrintTenuringDistribution PrintHeapAtSIGBREAK TraceParallelOldGCTasks PrintParallelOldGCPhaseTimes MaxHeapSize MaxNewSize PretenureSizeThreshold MinTLABSize TLABAllocationWeight TLABWasteTargetPercent TLABRefillWasteFraction TLABWasteIncrement MaxLiveObjectEvacuationRatio OldSize MinHeapFreeRatio MaxHeapFreeRatio SoftRefLRUPolicyMSPerMB MinHeapDeltaBytes MinPermHeapExpansion MaxPermHeapExpansion QueuedAllocationWarningCount MaxTenuringThreshold InitialTenuringThreshold TargetSurvivorRatio MarkSweepDeadRatio PermMarkSweepDeadRatio MarkSweepAlwaysCompactCount PrintCMSStatistics PrintCMSInitiationStatistics PrintFLSStatistics PrintFLSCensus DeferThrSuspendLoopCount DeferPollingPageLoopCount SafepointSpinBeforeYield UseDepthFirstScavengeOrder GCDrainStackTargetSize ThreadSafetyMargin CodeCacheMinimumFreeSpace MaxDirectMemorySize PerfDataMemorySize AggressiveHeap UseCompressedStrings UseStringCache HeapDumpOnOutOfMemoryError HeapDumpPath PrintGC PrintGCDetails PrintGCTimeStamps PG1HeapRegionSize G1ReservePercent G1ConfidencePercent PrintPromotionFailure PrintGCDateStamps

    -XX:InitiatingHeapOccupancyPercent=n

    -XX:MaxHeapFreeRatio=70

    -XX:MaxGCPauseMillis=n

    -XX:+ScavengeBeforeFullGC

    -XX:ConcGCThreads=n

    -XX:MaxTenuringThreshold=n

  • GC Reduction
    - Reuse objects to cause less garbage
    - Move certain things off-heap (invisible to GC)
    - Option 1: Direct ByteBuffers
      - Limited to int (2GB)
      - No way to directly free; still relies on GC
    - Option 2: sun.misc.Unsafe
      - malloc() + free() + direct memory access
      - Supported on all major JVMs
      - Widely used: Java (nio, concurrent), JSR166, Google Guava, Objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, ...
      - Being considered for Java 9

  • Off-Heap Filters
    - 50M docs (3.8 GB index)
    - 8GB RAM
    - 20K requests
    - 8 request threads
    - 500 filters
    - JVM options: -Xmx4G (Solr)

  • Off-Heap Filters Test
    Observed max process sizes:
    - Solr: 3.8GB - 4.3GB
    - Heliosearch: 3.6GB - 3.7GB

  • Off-Heap FieldCache
    Normal (on-heap) FieldCache
    - Typically the largest data structures kept on the heap
    - Used for sorting, function query values, single-valued faceting, grouping
    - Uses weak references
    Heliosearch nCache (n is for native)
    - Allocated off-heap
    - First-class managed Solr cache
      - Configure size, warming policies
      - View statistics
    - Per-segment (NRT friendly)
    - No weak references

  • nCache admin stats

    item_id:{
      "field":"id",
      "uses":8,
      "class":"StrTopValues",
      "refcount":2,
      "numSegments":7,
      "carriedOver":6,
      "size":612}
    item_popularity:{
      "field":"popularity",
      "uses":5,
      "class":"IntTopValues",
      "refcount":2,
      "numSegments":7,
      "carriedOver":6,
      "size":106}
    item_price:{
      "field":"price",
      "uses":0,           -- the number of top-level uses for searcher
      "class":"FloatTopValues",
      "refcount":2,
      "numSegments":5,    -- number of segments populated
      "carriedOver":5,    -- number of segments carried over from last searcher
      "size":272          -- size in bytes for all populated segments
    }

  • Off-Heap Integer Field
    - 50M document index
    - Sorting on 6 different integer fields (10, 100, 1000, 10000, 1M unique values)
    - 4 request threads
    Results
    - 42% faster sorting
    - 73% faster functions

  • String Field Sorting
    - 10M document index
    - 10 different string fields, each field 80% populated
    - Median latency

  • String Field Sorting Throughput
    - Concurrent throughput sorting on random fields in random order (asc/desc)
    - ~50% performance gain

  • Native Code

  • Native Code
    - The idea: create native accelerators for CPU hotspots
      - Faceting, anyone?
    - But... JNI sucks! (and it's GC's fault again)
      - GetArrayElements() makes a *copy* of the array!
      - GetPrimitiveArrayCritical() blocks garbage collection!
      - Tons of other restrictions: it's a critical section
      - Defeats the purpose of going to native code in the first place
    - But our data is already off-heap, so we're good!

    jint *buf= (*env)->GetIntArrayElements(env, arr, 0); for (i=0; i

  • Native Single Valued String Faceting
    - Top-level off-heap String cache
      - Improves sorting and faceting speed
      - Eliminates FieldCache insanity
    - Native code
      - Written in C++, compiled with GCC 4.7, 4.8
      - Currently supports 64-bit Windows, OS X, Linux (x86)
      - Static compilation avoids the JVM HotSpot warmup period, mis-compilation bugs, and variations between runs

  • Native Faceting Performance

  • Terms Query Optimization

  • New Facet Module

  • Facet Module Goals
    - Replace the aging SimpleFacets
    - First-class JSON support
    - Easier programmatic construction of complex nested facet commands
    - Canonical response format that is easier for clients to parse
    - First-class analytics support
    - Cleaner distributed search support
    - Fully pluggable
    - Better base for integration of other search features
    Heliosearch is a Solr superset, so you can still choose to use the old faceting or mix and match.

  • API Comparison

    Old style:

    &facet=true
    &facet.range={!key=age_ranges}age
    &f.age_ranges.facet.range.start=0
    &f.age_ranges.facet.range.end=100
    &f.age_ranges.facet.range.gap=10
    &facet.range={!key=price_ranges}price
    &f.price_ranges.facet.range.start=0
    &f.price_ranges.facet.range.end=1000
    &f.price_ranges.facet.range.gap=50

    New JSON API:

    {
      age_ranges: {        // facet name
        range: {           // facet type
          field : age,     // facet params
          start : 0,
          end : 100,
          gap : 10
        }
      },
      price_ranges: {
        range: {
          field : price,
          start : 0,
          end : 1000,
          gap : 50
        }
      }
    }

  • Facet Functions
    - Sort/report by things other than count
    Aggregation functions / stats:
    - Stats are calculated per bucket
    - Buckets created by Query, Range, or Terms (field) facets
    - Available: count, sum(function), avg(function), sumsq(function), min(function), max(function), unique(string_field)
    - "function" can be any function query that yields a numeric value!

    Example: sum(mul(num_units, unit_price))
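    As a rough illustration of what "stats are calculated per bucket" means, the sketch below groups a few made-up documents by a field and computes count, sum, avg, min and max of a per-document function. It is plain Python over in-memory dicts, not Heliosearch code; the sample documents, the `genre` field and the `facet_stats` helper are assumptions for illustration only.

```python
from collections import defaultdict


def facet_stats(docs, bucket_field, value_fn):
    """Group docs by bucket_field, then aggregate value_fn(doc) per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(value_fn(doc))

    stats = {}
    for key, values in groups.items():
        total = sum(values)
        stats[key] = {
            "count": len(values),
            "sum": total,
            "avg": total / len(values),
            "min": min(values),
            "max": max(values),
        }
    return stats


# Made-up sample documents (hypothetical data, not from the talk).
docs = [
    {"genre": "fantasy", "num_units": 2, "unit_price": 10.0},
    {"genre": "fantasy", "num_units": 1, "unit_price": 5.0},
    {"genre": "scifi", "num_units": 3, "unit_price": 4.0},
]

# Same idea as sum(mul(num_units, unit_price)), evaluated once per bucket.
revenue = facet_stats(docs, "genre", lambda d: d["num_units"] * d["unit_price"])
```

    Passing the per-document function as a callable mirrors the slide's point that any numeric function can feed an aggregation.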

  • Simple Request + Response

    $ curl http://localhost:8983/solr/query -d 'q=widgets&
    json.facet=
    {  // Comments can help with clarity
       /* traditional C-style comments are also supported */
      x : "avg(price)" ,   // Simple strings can occur unquoted
      y : 'unique(brand)'  // Strings can also use single quotes
    }
    '

    [...]
    "facets" : {
      "count" : 314,
      "x" : 102.5,
      "y" : 28
    }

    "count" is the number of documents in the facet bucket.
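    The same request can be assembled programmatically, which is one of the stated goals of the JSON API. The sketch below only builds the form-encoded body with the standard library and does not contact a server; `build_facet_request` is a hypothetical helper name, and it sends strict JSON rather than the relaxed syntax shown on the slide.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_facet_request(query, facets):
    """Return a form-encoded body carrying q and json.facet."""
    return urlencode({"q": query, "json.facet": json.dumps(facets)})


body = build_facet_request(
    "widgets",
    {"x": "avg(price)", "y": "unique(brand)"},
)

# Decode the body again to check that it round-trips cleanly.
decoded = parse_qs(body)
facets = json.loads(decoded["json.facet"][0])
```

    The resulting `body` could then be POSTed to the /solr/query endpoint used in the curl example, for instance with `urllib.request.urlopen(url, data=body.encode())`.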

  • Terms Facet Example

    json.facet={
      shoes:{
        terms:{
          field: shoe_style,
          sort: {x : desc},
          facet:{               // executed per-bucket
            x : "avg(price)",
            y : "unique(brand)"
          }
        }
      }
    }

    "facets": {
      "count" : 472,
      "shoes": {
        "buckets" : [
          { "val" : "Hiking",
            "count" : 34,
            "x" : 135.25,
            "y" : 17
          },
          { "val" : "Running",
            "count" : 45,
            "x" : 110.75,
            "y" : 24
          },

  • Sub-Facets
    - Any facet that produces buckets can have sub-facets (terms/field, range, query)
    - Sub-facets can have facet functions (stats) or their own sub-facets (no limit to nesting)
    - A sub-facet can be any type (field, range, query)
    - Multiple sub-facets can be added to any given facet
    - Sub-facets are first-class facets and can be configured independently like any other facet
      - Different offsets, limits, stats, sorts, etc.

  • Sub-Facet Example

    json.facet={
      shoes:{
        terms:{
          field: shoe_style,
          sort: {x : desc},
          facet:{
            x : "avg(price)",
            y : "unique(brand)",
            colors :{terms:color}
          }
        }
      }
    }

    "facets": {
      "count" : 472,
      "shoes": {
        "buckets" : [
          { "val" : "Hiking",
            "count" : 34,
            "x" : 135.25,
            "y" : 17,
            "colors" : {
              "buckets" : [
                { "val" : "brown", "count" : 12 },
                { "val" : "black", "count" : 10 },
                [...]
              ]
            } // end of colors sub-facet
          }, // end of Hiking bucket
          { "val" : "Running",
            "count" : 45,
            "x" : 110.75,
            "y" : 24,
            "colors" : {
              "buckets" : [...]

    The short form for a terms facet (colors:{terms:color}) simply specifies the field. It sorts buckets by count descending.
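    Because every level of the response has the same shape (a named facet holding a "buckets" list), a client can walk it recursively. The sketch below flattens a response like the one above into (path, count) pairs. The `walk_buckets` helper is hypothetical, and the sample response copies only the values visible on the slide, with the elided parts left as empty lists.

```python
def walk_buckets(node, path=()):
    """Yield (path, count) for every bucket nested anywhere under node."""
    for name, value in node.items():
        # Only dict values with a "buckets" list are facets; stats are skipped.
        if isinstance(value, dict) and "buckets" in value:
            for bucket in value["buckets"]:
                bucket_path = path + ((name, bucket["val"]),)
                yield bucket_path, bucket["count"]
                # A bucket may itself contain sub-facets.
                yield from walk_buckets(bucket, bucket_path)


# Values taken from the slide; elided entries are represented as empty lists.
response = {
    "count": 472,
    "shoes": {
        "buckets": [
            {
                "val": "Hiking", "count": 34, "x": 135.25, "y": 17,
                "colors": {"buckets": [
                    {"val": "brown", "count": 12},
                    {"val": "black", "count": 10},
                ]},
            },
            {
                "val": "Running", "count": 45, "x": 110.75, "y": 24,
                "colors": {"buckets": []},
            },
        ]
    },
}

flat = list(walk_buckets(response))
```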

  • Terms Facet
    Terms facet creates buckets of docs with the same value in a field.
    - field: The field name to facet over.
    - offset: Used for paging, this skips the first N buckets. Defaults to 0.
    - limit: Limits the number of buckets returned. Defaults to 10.
    - mincount: Only return buckets with a count of at least this number. Defaults to 1.
    - sort: Specifies how to sort the buckets produced. "count" specifies document count, "index" sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is "count desc". This parameter may also be specified in JSON like sort:{count:desc}. The sort order may be either "asc" or "desc".
    - missing: A boolean that specifies if a special "missing" bucket should be returned that is defined by documents without a value in the field. Defaults to false.
    - numBuckets: A boolean. If true, adds "numBuckets" to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
    - allBuckets: A boolean. If true, adds an "allBuckets" bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
    - prefix: Only produce buckets for terms starting with the specified prefix.
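    To make the interaction of mincount, sort, offset and limit concrete, here is a small in-memory sketch of the bucketing semantics described above. It is not how Heliosearch implements faceting; the `terms_facet` function, the sample documents, and the tie-break by bucket value are assumptions, and only the "count" and "index" sorts are covered.

```python
from collections import Counter


def terms_facet(docs, field, offset=0, limit=10, mincount=1, sort="count desc"):
    """Bucket docs by field value, then filter, sort and page the buckets."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field)
        if values is None:
            continue  # docs without a value would go to the "missing" bucket
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in set(values):  # a doc counts once per distinct value
            counts[value] += 1

    buckets = [{"val": v, "count": c} for v, c in counts.items() if c >= mincount]

    key, direction = sort.split()
    reverse = direction == "desc"
    if key == "count":
        buckets.sort(key=lambda b: b["val"])  # assumed tie-break: by value
        buckets.sort(key=lambda b: b["count"], reverse=reverse)
    elif key == "index":
        buckets.sort(key=lambda b: b["val"], reverse=reverse)
    else:
        raise ValueError("this sketch only supports count and index sorts")

    return buckets[offset:offset + limit]


# Hypothetical sample documents.
docs = [
    {"shoe_style": "Running"}, {"shoe_style": "Hiking"},
    {"shoe_style": "Running"}, {"shoe_style": "Dress"},
    {"shoe_style": "Hiking"}, {"shoe_style": "Running"},
    {"color": "brown"},  # no shoe_style, so it lands in no bucket
]

by_count = terms_facet(docs, "shoe_style")
page_two = terms_facet(docs, "shoe_style", offset=1, limit=1)
by_index = terms_facet(docs, "shoe_style", sort="index asc")
```

    Note that paging is applied after sorting and after the mincount filter, so offset and limit always refer to positions in the final ordering.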

  • Query Facet
    Query facet creates a single bucket of documents matching the query.

    { // simple example
      highpop:{
        query:{
          q:"inStock:true AND popularity:[8 TO 10]"
        }
      }
    }

    { // example with multiple sub-facets
      highpop:{
        query:{
          q : "inStock:true AND popularity:[8 TO 10]",
          facet : {
            average_price : "avg(price)",
            available_colors : { terms : color },
            price_ranges : { range : { field:price, start:0, end:200, gap:10 }}
          }
        }
      }
    }

  • Range Facet
    Creates buckets over ranges on a numeric or date field. Parameter names/values are "in sync" with Solr range parameters:
    - field: The numeric field or date field to produce range buckets from.
    - start: Lower bound of the ranges.
    - end: Upper bound of the ranges.
    - gap: Size of each range bucket produced.
    - hardend: A boolean. If true, the last bucket will end at "end" even if it is less than "gap" wide. If false, the last bucket will be "gap" wide, which may extend past "end".
    - other: Indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for:
      - "before": all records with field values lower than the lower bound of the first range
      - "after": all records with field values greater than the upper bound of the last range
      - "between": all records with field values between the start and end bounds of all ranges
      - "none": compute none of this information
      - "all": shortcut for before, between, and after
    - include: By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The "before" range is exclusive and the "after" range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options:
      - "lower": all gap-based ranges include their lower bound
      - "upper": all gap-based ranges include their upper bound
      - "edge": the first and last gap ranges include their edge bounds (i.e. lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
      - "outer": the "before" and "after" ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries
      - "all": shorthand for lower, upper, edge, outer
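    The effect of hardend and of the default "lower" inclusion is easiest to see on numbers. The sketch below generates gap-wide buckets and counts a handful of made-up values, with lower bounds inclusive and upper bounds exclusive as described above. It is an illustrative simplification in plain Python, not the Solr implementation; `range_buckets` and `range_facet` are hypothetical names, and "between" is simply taken as the total of the gap buckets.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (lower, upper) bounds for each gap-wide bucket."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    bounds = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # clip the last bucket instead of running past end
        bounds.append((lower, upper))
        lower = upper
    return bounds


def range_facet(values, start, end, gap, hardend=False):
    """Count values per bucket (lower inclusive, upper exclusive)."""
    bounds = range_buckets(start, end, gap, hardend)
    buckets = [{"val": lower, "count": 0} for lower, _ in bounds]
    last_upper = bounds[-1][1] if bounds else start
    before = after = 0
    for value in values:
        if value < start:
            before += 1
        elif value >= last_upper:
            after += 1
        else:
            for bucket, (lower, upper) in zip(buckets, bounds):
                if lower <= value < upper:
                    bucket["count"] += 1
                    break
    between = sum(b["count"] for b in buckets)
    return {"buckets": buckets, "before": before, "after": after, "between": between}


values = [5, 30, 95, 110, -1]  # made-up field values
soft = range_facet(values, start=0, end=100, gap=30)
hard = range_facet(values, start=0, end=100, gap=30, hardend=True)
```

    With gap=30 and end=100, the last bucket covers 90 to 120 when hardend is false, so the value 110 is counted in it; with hardend true the bucket is clipped at 100 and 110 is counted as "after" instead.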

  • Sub-Facets + Facet-Functions =

    Business Intelligence / Analytics

  • Fantasy ($1045)
    - Top Authors: $423 George R.R. Martin, $347 Brandon Sanderson, $155 JK Rowling
    - Top Books: $252 A Game of Thrones, $113 Emperor of Thorns, $101 Nine Princes in Amber, $82 Steel Heart

    Sci-Fi ($898)
    - Top Authors: $321 Iain M Banks, $218 Neal Asher, $155 Neal Stephenson
    - Top Books: $113 Gridlinked, $101 Use of Weapons, $93 Snow Crash, $82 The Skinner

    Mystery ($645)
    - Top Authors: $191 James Patterson, $145 Patricia Cornwell, $126 John Grisham
    - Top Books: $85 One for the Money, $77 Angels & Daemons, $64 Shutter Island, $35 The Firm

    Filter By
    - State: $852 NJ (14 stores), $658 NY (11 stores), $421 CT (8 stores)
    - Chain: $984 Amazoon (14 stores), $734 Houses&Royalty (9 stores), $387 Books-r-us (7 stores)
    - Store: $108 Amazoon Branchburg, $93 Books-r-us Bridgewater, $87 H&R NYC

    Number of Books
    - Chain: 201K Houses&Royalty, 183K Amazoon, 98K Books-r-us
    - Store: 193K H&R NYC, 77K Books-r-us Bridgewater, 68K Amazoon Branchburg

  • Implementation

    date_breakout : {
      range: {
        field: sale_date,
        start : ...,
        end : ...,
        gap : "+1MONTH",
        facet : {
          top_genre : {
            terms : {
              field : genre,
              sort : "revenue desc",
              limit : 4,
              facet : { revenue : "sum(sales)" }
            }},
          by_chain: {
            terms : {
              field : chain,
              facet : { revenue : "sum(sales)" }
            }}
    [...]

    - Creates a series of facet buckets based on date
    - For each date bucket, facet by genre, taking the top 4 by revenue
    - For each genre bucket, report revenue

  • Fantasy ($1045)
    - Top Authors: $423 George R.R. Martin, $347 Brandon Sanderson, $155 JK Rowling
    - Top Books: $252 A Game of Thrones, $113 Emperor of Thorns, $101 Nine Princes in Amber, $82 Steel Heart

    Sci-Fi ($898)
    - Top Authors: $321 Iain M Banks, $218 Neal Asher, $155 Neal Stephenson
    - Top Books: $113 Gridlinked, $101 Use of Weapons, $93 Snow Crash, $82 The Skinner

    Mystery ($645)
    - Top Authors: $191 James Patterson, $145 Patricia Cornwell, $126 John Grisham
    - Top Books: $85 One for the Money, $77 Angels & Daemons, $64 Shutter Island, $35 The Firm

    top_genres:{
      terms:{
        field: genre,
        facet : {
          rev : "sum(sales)",
          top_authors:{
            terms:{
              field : author,
              sort : "rev desc",
              limit : 3,
              facet : { rev : "sum(sales)" }
            }},
          top_books:{
            terms:{
              field : title,
              sort : "rev desc",
              limit : 4,
              facet : { rev : "sum(sales)" }
            }}
    [...]

  • Filter By
    - State: $852 NJ (14 stores), $658 NY (11 stores), $421 CT (8 stores)
    - Chain: $984 Amazoon (14 stores), $734 Houses&Royalty (9 stores), $387 Books-r-us (7 stores)
    - Store: $108 Amazoon Branchburg, $93 Books-r-us Bridgewater, $87 H&R NYC

    state_breakout:{
      terms:{
        field: state,
        sort: "rev desc",
        facet : {
          rev : "sum(sales)",
          num_stores : "unique(store)"
        }}},
    chain_breakout:{
      terms:{
        field: chain,
        sort: "rev desc",
        facet : {
          rev : "sum(sales)",
          num_stores : "unique(store)"
        }}},
    store_breakout:{
      terms:{
        field: store,
        sort: "rev desc",
        facet : {
          rev : "sum(sales)"
        }}}

  • Misc Features

  • Parameter Substitution
    - Parameters / macros substituted across the whole request
    - Happens before any parsing, so usable in any context

      q=price:[ ${low} TO ${high} ]
      &low=100
      &high=200

    - Default values

      q=price:[ ${low:0} TO ${high:100} ]

    - Nested

      q=${price_query}
      &price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true
      &price_field=specialPrice
      &low=50
      &high=100
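    The substitution rules above (plain `${name}`, `${name:default}`, and nested references) can be sketched as a repeated textual expansion. This is a minimal illustration in Python, not the Heliosearch implementation; `expand_macros` is a hypothetical helper, and two behaviours are assumptions of the sketch: unknown names without a default are left in place, and expansion gives up after a fixed number of passes to guard against cycles.

```python
import re

# Matches ${name} or ${name:default}; neither part may contain braces.
_MACRO = re.compile(r"\$\{([^{}:]+)(?::([^{}]*))?\}")


def expand_macros(text, params, max_passes=10):
    """Expand macros in text using params, re-scanning for nested macros."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: leave unknown macros untouched

    for _ in range(max_passes):
        expanded = _MACRO.sub(replace, text)
        if expanded == text:
            return expanded  # nothing left to substitute
        text = expanded
    raise ValueError("macro expansion did not settle; possible cycle")


simple = expand_macros("price:[ ${low} TO ${high} ]", {"low": "100", "high": "200"})
defaults = expand_macros("price:[ ${low:0} TO ${high:100} ]", {})
nested = expand_macros(
    "${price_query}",
    {
        "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
        "price_field": "specialPrice",
        "low": "50",
        "high": "100",
    },
)
```

    Because expansion runs on the raw text before any query parsing, the same mechanism works inside q, fq, or a json.facet body.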

  • New Query Parser Features

    - Filters in queries: just like fq parameters, but may appear anywhere in a query

      q=(text:elephant (filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))

    - Constant score queries

      q=color:(blue OR green)^=1 text:shoes

    - Comments in queries (can nest)

      q=+text:elephant /* the main query */ /* boosting part WIP {!func}mul(pop,rank)^10 */

  • Thank You

    Help Develop the Next Generation of Solr!

    Resources:
    - http://heliosearch.org
    - https://github.com/Heliosearch/heliosearch
    - https://groups.google.com/forum/#!forum/heliosearch
    - https://groups.google.com/forum/#!forum/heliosearch-dev