Native Code, Off-Heap Data & JSON Facet API for Solr
Yonik Seeley, ApacheCon EU 2014, Budapest, Hungary
Jul 11, 2015
My Background
• Creator of Solr
• Heliosearch Founder
• LucidWorks Co-Founder
• Lucene/Solr committer, PMC member
• Apache Software Foundation member
• M.S. in Computer Science, Stanford
Heliosearch Project
• The Next Evolution of Solr
• Forked from Solr, developing on GitHub
– Started Jan 2014
– Well aligned community
– Open Source, Apache licensed
• Bring back to Apache in the future?
• Currently drop-in replacement for Solr at the HTTP-API level
– A super-set… we continually merge in upstream changes
– Latest version of Heliosearch includes latest Solr
• Current Features: Off-heap filters, off-heap fieldcache, facet-by-function, sub-facets, native code performance enhancements
Garbage Collection
Garbage Collection Basics
Heap regions: Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space
• New objects allocated in Eden
• Find live objects by tracing from GC “roots” (threads, stack locals, etc)
• Make a copy of live objects, leaving “garbage” behind
• Eden + Survivor Space copied together to other Survivor space
• Tenured from Survivor when old enough
• “stop-the-world” needed when GC can’t keep up
• Out of memory when too much time spent in GC
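The trace-then-copy steps above can be sketched as a toy model. This is only an illustration in Python, not how HotSpot is implemented; the `heap` dictionary, the object names, and the `copy_collect` function are all hypothetical.

```python
def copy_collect(heap, roots):
    """Toy copying collector.

    heap  : dict mapping object name -> list of object names it references
    roots : object names reachable directly (threads, stack locals, ...)
    Returns a new space containing only the live (reachable) objects.
    """
    to_space = {}
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in to_space:
            continue                  # already copied
        refs = heap[name]
        to_space[name] = list(refs)   # "copy" the live object
        stack.extend(refs)            # keep tracing through its references
    return to_space


# Hypothetical object graph: a -> b -> c are live; d and e only reference
# each other, so nothing reachable from a root points at them.
eden = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
survivor = copy_collect(eden, roots=["a"])
print(sorted(survivor))  # ['a', 'b', 'c']
```

In this model the work done is proportional to the number of live objects; the garbage (`d`, `e`) is never visited and is simply left behind.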
Java Memory Waste
- Need to size for worst case scenario
- OS needs free memory to cache index files
- JVMs aren’t good at “sharing” with rest of the system
- mmap allocations managed by OS, can be immediately reused on free
[Diagram: OS real memory shared by two JVMs (heap in use, unused heap, max heap), two C processes (C heap in use, unused heap), and mmap-allocated regions]
“Free” memory includes buffer cache, important to cache index files
GC Impact
• GC Reduces Throughput
– Time to copy all that memory around could be spent better!
• Stop-the-world pauses
– Seconds to Minutes long
– Pause time proportional to heap size
– Still exists in all HotSpot GCs… CMS, G1GC, etc
– Breaks Application SLAs (request timeouts, etc)
– Can cause SolrCloud Zookeeper session timeouts
• Reducing max pause size normally means reduced throughput
• Non-graceful degradation
– if you don't size your heap big enough… BOOM!
GC Tuning
UseSerialGC
UseParallelGC
UseParallelOldGC
UseParallelOldGCCompacting
UseParallelDensePrefixUpdate
HeapMaximumCompactionInterval
HeapFirstMaximumCompactionCount
UseMaximumCompactionOnSystemGC
ParallelOldDeadWoodLimiterMean
ParallelOldDeadWoodLimiterStdDev
UseParallelOldGCDensePrefix
ParallelGCThreads
ParallelCMSThreads
YoungPLABSize
OldPLABSize
GCTaskTimeStampEntries
AlwaysTenure
NeverTenure
ScavengeBeforeFullGC
UseConcMarkSweepGC
ExplicitGCInvokesConcurrent
UseCMSBestFit
UseCMSCollectionPassing
UseParNewGC
ParallelGCVerbose
ParallelGCBufferWastePct
ParallelGCRetainPLAB
TargetPLABWastePct
PLABWeight
ResizePLAB
PrintPLAB
ParGCArrayScanChunk
ParGCDesiredObjsFromOverflowList
CMSParPromoteBlocksToClaim
AlwaysPreTouch
CMSUseOldDefaults
CMSYoungGenPerWorker
CMSIncrementalMode
CMSIncrementalDutyCycle
CMSIncrementalPacing
CMSIncrementalDutyCycleMin
CMSIncrementalSafetyFactor
CMSIncrementalOffset
CMSExpAvgFactor
CMS_FLSWeight
CMS_FLSPadding
FLSCoalescePolicy
CMS_SweepWeight
CMS_SweepPadding
CMS_SweepTimerThresholdMillis
CMSClassUnloadingEnabled
CMSCompactWhenClearAllSoftRefs
UseCMSCompactAtFullCollection
CMSFullGCsBeforeCompaction
CMSIndexedFreeListReplenish
CMSLoopWarn
CMSMarkStackSize
CMSMarkStackSizeMax
CMSMaxAbortablePrecleanLoops
CMSMaxAbortablePrecleanTime
CMSAbortablePrecleanMinWorkPerIteration
CMSAbortablePrecleanWaitMillis
CMSRescanMultiple
CMSConcMarkMultiple
CMSRevisitStackSize
CMSAbortSemantics
CMSParallelRemarkEnabled
CMSParallelSurvivorRemarkEnabled
CMSPLABRecordAlways
CMSConcurrentMTEnabled
CMSPermGenPrecleaningEnabled
CMSPermGenSweepingEnabled
CMSPrecleaningEnabled
CMSPrecleanIter
CMSPrecleanNumerator
CMSPrecleanDenominator
CMSPrecleanRefLists1
CMSPrecleanRefLists2
CMSPrecleanSurvivors1
CMSPrecleanSurvivors2
CMSPrecleanThreshold
CMSCleanOnEnter
CMSRemarkVerifyVariant
CMSScheduleRemarkEdenSizeThreshold
CMSScheduleRemarkEdenPenetration
CMSScheduleRemarkSamplingRatio
CMSSamplingGrain
CMSScavengeBeforeRemark
CMSWorkQueueDrainThreshold
CMSWaitDuration
CMSYield
CMSBitMapYieldQuantum
UseGCLogFileRotation
NumberOfGCLogFiles
GCLogFileSize
LargePageSizeInBytes
LargePageHeapSizeThreshold
PrintGCApplicationConcurrentTime
PrintGCApplicationStoppedTime
OnOutOfMemoryError
ClassUnloading
BlockOffsetArrayUseUnallocatedBlock
RefDiscoveryPolicy
ParallelRefProcEnabled
CMSTriggerRatio
CMSBootstrapOccupancy
CMSInitiatingOccupancyFraction
UseCMSInitiatingOccupancyOnly
HandlePromotionFailure
PreserveMarkStackSize
ZeroTLAB
PrintTLAB
TLABStats
AlwaysActAsServerClassMachine
DefaultMaxRAM
DefaultMaxRAMFraction
DefaultInitialRAMFraction
UseAutoGCSelectPolicy
AutoGCSelectPauseMillis
UseAdaptiveSizePolicy
UsePSAdaptiveSurvivorSizePolicy
UseAdaptiveGenerationSizePolicyAtMinorCollection
UseAdaptiveGenerationSizePolicyAtMajorCollection
UseAdaptiveSizePolicyWithSystemGC
UseAdaptiveGCBoundary
AdaptiveSizeThroughPutPolicy
AdaptiveSizePausePolicy
AdaptiveSizePolicyInitializingSteps
AdaptiveSizePolicyOutputInterval
UseAdaptiveSizePolicyFootprintGoal
AdaptiveSizePolicyWeight
AdaptiveTimeWeight
PausePadding
PromotedPadding
SurvivorPadding
AdaptivePermSizeWeight
PermGenPadding
ThresholdTolerance
AdaptiveSizePolicyCollectionCostMargin
YoungGenerationSizeIncrement
YoungGenerationSizeSupplement
YoungGenerationSizeSupplementDecay
TenuredGenerationSizeIncrement
TenuredGenerationSizeSupplement
TenuredGenerationSizeSupplementDecay
MaxGCPauseMillis
MaxGCMinorPauseMillis
GCTimeRatio
AdaptiveSizeDecrementScaleFactor
UseAdaptiveSizeDecayMajorGCCost
AdaptiveSizeMajorGCDecayTimeScale
MinSurvivorRatio
InitialSurvivorRatio
BaseFootPrintEstimate
UseGCOverheadLimit
GCTimeLimit
GCHeapFreeLimit
PrintAdaptiveSizePolicy
DisableExplicitGC
CollectGen0First
BindGCTaskThreadsToCPUs
UseGCTaskAffinity
ProcessDistributionStride
CMSCoordinatorYieldSleepCount
CMSYieldSleepCount
PrintGCTaskTimeStamps
TraceClassLoadingPreorder
TraceGen0Time
TraceGen1Time
PrintTenuringDistribution
PrintHeapAtSIGBREAK
TraceParallelOldGCTasks
PrintParallelOldGCPhaseTimes
MaxHeapSize
MaxNewSize
PretenureSizeThreshold
MinTLABSize
TLABAllocationWeight
TLABWasteTargetPercent
TLABRefillWasteFraction
TLABWasteIncrement
MaxLiveObjectEvacuationRatio
OldSize
MinHeapFreeRatio
MaxHeapFreeRatio
SoftRefLRUPolicyMSPerMB
MinHeapDeltaBytes
MinPermHeapExpansion
MaxPermHeapExpansion
QueuedAllocationWarningCount
MaxTenuringThreshold
InitialTenuringThreshold
TargetSurvivorRatio
MarkSweepDeadRatio
PermMarkSweepDeadRatio
MarkSweepAlwaysCompactCount
PrintCMSStatistics
PrintCMSInitiationStatistics
PrintFLSStatistics
PrintFLSCensus
DeferThrSuspendLoopCount
DeferPollingPageLoopCount
SafepointSpinBeforeYield
UseDepthFirstScavengeOrder
GCDrainStackTargetSize
ThreadSafetyMargin
CodeCacheMinimumFreeSpace
MaxDirectMemorySize
PerfDataMemorySize
AggressiveHeap
UseCompressedStrings
UseStringCache
HeapDumpOnOutOfMemoryError
HeapDumpPath
PrintGC
PrintGCDetails
PrintGCTimeStamps
G1HeapRegionSize
G1ReservePercent
G1ConfidencePercent
PrintPromotionFailure
PrintGCDateStamps
-XX:InitiatingHeapOccupancyPercent=n
-XX:MaxHeapFreeRatio=70
-XX:MaxGCPauseMillis=n
-XX:+ScavengeBeforeFullGC
-XX:ConcGCThreads=n
-XX:MaxTenuringThreshold=n
GC Reduction
• Reuse objects – cause less garbage
• Move certain things off-heap (invisible to GC)
• Option 1: Direct ByteBuffers
– Limited to “int” (2GB)
– No way to directly “free” – still relies on GC
• Option 2: sun.misc.Unsafe
– malloc() + free() + direct memory access
– Supported on all major JVMs
– Widely used: Java (nio, concurrent), JSR166, Google Guava, objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, …
– Being considered for Java 9
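The "malloc() + free() + direct memory access" pattern can be shown by analogy outside the JVM. The sketch below is Python, not sun.misc.Unsafe: it assumes an anonymous `mmap` region as the off-heap buffer, and the sizes and values are made up for illustration.

```python
import mmap
import struct

NUM_INTS = 1024

# "malloc": an anonymous mapping whose memory is managed by the OS,
# not stored as individual objects on the language runtime's heap.
buf = mmap.mmap(-1, NUM_INTS * 4)

# Direct memory access: write 32-bit ints at explicit byte offsets.
for i in range(NUM_INTS):
    struct.pack_into("<i", buf, i * 4, i * 2)

# Read one value back from its byte offset.
value = struct.unpack_from("<i", buf, 10 * 4)[0]
print(value)  # 20

# "free": released explicitly and immediately, without waiting for a collector.
buf.close()
```

The point of the pattern is the last line: the lifetime of the memory is controlled by the application rather than by garbage collection.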
Off-Heap Filters
Test setup:
- 50M docs (3.8 GB index)
- 8GB RAM
- 20K requests
- 8 req threads
- 500 filters
- JVM Options: -Xmx4G (solr)
[Chart: Off-Heap Filters Test]
Observed max process sizes
- Solr: 3.8GB – 4.3GB
- Heliosearch: 3.6GB – 3.7GB
Off-Heap FieldCache
• Normal (on-heap) FieldCache
– Typically the largest data structures kept on the heap
– Used for sorting, function query values, single-valued faceting, grouping
– Uses weak references
• Heliosearch nCache (n is for “native”)
– Allocated off-heap
– First-class managed Solr cache
– Configure size, warming policies
– View statistics
– Per-segment (NRT friendly)
– No weak references
nCache admin stats
item_id:{ "field":"id", "uses":8, "class":"StrTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":612 }
item_popularity:{ "field":"popularity", "uses":5, "class":"IntTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":106 }
item_price:{
  "field":"price",
  "uses":0,                  -- the number of top-level uses for searcher
  "class":"FloatTopValues",
  "refcount":2,
  "numSegments":5,           -- number of segments populated
  "carriedOver":5,           -- number of segments carried over from last searcher
  "size":272                 -- size in bytes for all populated segments
}
Off-Heap Integer Field
• 50M document index
• Sorting on 6 different integer fields (10,100,1000,10000,1M unique values)
• 4 request threads
Results
• 42% faster sorting
• 73% faster functions
String Field Sorting
• 10M document index
• 10 different string fields, each field 80% populated
[Chart: median latency]
String Field Sorting Throughput
• Concurrent throughput sorting on random fields in random order (asc/desc)
• ~50% performance gain
Native Code
Native Code
The Idea: create native accelerators for CPU hotspots
Faceting anyone?
But…. JNI Sucks! (and it’s GC’s fault again)
GetArrayElements() – makes a *copy* of the array!
GetPrimitiveArrayCritical() – blocks garbage collection!
Tons of other restrictions… it’s a “critical section”
Defeats the purpose of going to native code in the first place
But… our data is already off-heap, we’re good!
jint *buf = (*env)->GetIntArrayElements(env, arr, 0);
for (i = 0; i < len; i++) {
  sum += buf[i];
}
(*env)->ReleaseIntArrayElements(env, arr, buf, JNI_ABORT);
Native Single Valued String Faceting
Top-Level off-heap String cache
Improves Sorting and Faceting speed
Eliminates FieldCache “insanity”
Native Code
Written in C++, compiled with GCC 4.7, 4.8
Currently supports 64-bit Windows, OS X, Linux (x86)
Static compilation avoids JVM HotSpot warmup period, mis-compilation bugs, and variations between runs
Native Faceting Performance
Terms Query Optimization
New Facet Module
Facet Module Goals
Replace the aging “SimpleFacets”
First class JSON support
Easier programmatic construction of complex nested facet commands
Canonical response format that is easier for clients to parse
First class analytics support
Cleaner distributed search support
Fully pluggable
Better base for integration of other search features
Heliosearch is a Solr super-set, so you can still choose to use the old faceting or mix-n-match.
API Comparison
Old Style:
&facet=true
&facet.range={!key=age_ranges}age
&f.age_ranges.facet.range.start=0
&f.age_ranges.facet.range.end=100
&f.age_ranges.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price_ranges.facet.range.start=0
&f.price_ranges.facet.range.end=1000
&f.price_ranges.facet.range.gap=50
New JSON API:
{
  age_ranges: {          // facet name
    range: {             // facet type
      field : age,       // facet params
      start : 0,
      end : 100,
      gap : 10
    }
  },
  price_ranges: {
    range: {
      field : price,
      start : 0,
      end : 1000,
      gap : 50
    }
  }
}
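To illustrate the goal of easier programmatic construction, here is a minimal Python sketch that builds the same two range facets in both styles. The helper names `range_facet` and `old_style_params` are hypothetical, and the sketch assumes the server also accepts strictly quoted JSON in `json.facet`.

```python
import json
from urllib.parse import urlencode


def range_facet(field, start, end, gap):
    """New style: a facet is just a nested dictionary."""
    return {"range": {"field": field, "start": start, "end": end, "gap": gap}}


def old_style_params(name, field, start, end, gap):
    """Old style: one flat parameter per setting, tied together by key name."""
    prefix = "f.%s.facet.range." % name
    return [
        ("facet.range", "{!key=%s}%s" % (name, field)),
        (prefix + "start", start),
        (prefix + "end", end),
        (prefix + "gap", gap),
    ]


# New JSON API: compose dictionaries, then serialize once.
facets = {
    "age_ranges": range_facet("age", 0, 100, 10),
    "price_ranges": range_facet("price", 0, 1000, 50),
}
json_facet = json.dumps(facets)

# Old style: a flat list of parameters that must be kept in sync by name.
old = [("facet", "true")]
old += old_style_params("age_ranges", "age", 0, 100, 10)
old += old_style_params("price_ranges", "price", 0, 1000, 50)
old_query = urlencode(old)

print(json_facet)
print(old_query)
```

With the nested form, adding a sub-facet means adding one more key to a dictionary rather than inventing another family of prefixed parameters.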
Facet Functions
Sort/Report by things other than “count”
Aggregation Functions / Stats:
Stats are calculated “per bucket”
Buckets created by Query, Range, or Terms (field) facets
- count
- sum(function)
- avg(function)
- sumsq(function)
- min(function)
- max(function)
- unique(string_field)
any “function query” that yields a numeric value!
Example: sum(mul(num_units, unit_price))
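What "per bucket" means for these functions can be sketched in a few lines of Python. This is an illustration of the semantics only, not the Heliosearch implementation; the helper names (`facet_stats`, `sum_of`, `avg_of`, `unique_of`) and the sample documents are hypothetical.

```python
def sum_of(fn):
    """sum(function): add up a per-document numeric function."""
    return lambda docs: sum(fn(d) for d in docs)


def avg_of(fn):
    """avg(function); returns None for an empty bucket (an assumption)."""
    return lambda docs: sum(fn(d) for d in docs) / len(docs) if docs else None


def unique_of(field):
    """unique(string_field): number of distinct values in the bucket."""
    return lambda docs: len({d[field] for d in docs if field in d})


def facet_stats(docs, stats):
    """Evaluate every requested function over one bucket of documents."""
    out = {"count": len(docs)}
    for name, fn in stats.items():
        out[name] = fn(docs)
    return out


bucket = [
    {"num_units": 2, "unit_price": 10.0, "brand": "acme"},
    {"num_units": 1, "unit_price": 5.0, "brand": "acme"},
    {"num_units": 3, "unit_price": 4.0, "brand": "zeta"},
]

result = facet_stats(bucket, {
    # equivalent of sum(mul(num_units, unit_price))
    "revenue": sum_of(lambda d: d["num_units"] * d["unit_price"]),
    "brands": unique_of("brand"),
})
print(result)  # {'count': 3, 'revenue': 37.0, 'brands': 2}
```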
Simple Request + Response
$ curl http://localhost:8983/solr/query -d 'q=widgets&
json.facet=
{ // Comments can help with clarity
  /* traditional C-style comments are also supported */
  x : "avg(price)" , // Simple strings can occur unquoted
  y : 'unique(brand)' // Strings can also use single quotes
}
'
[…]
"facets" : {
  "count" : 314,
  "x" : 102.5,
  "y" : 28
}
"count" is the number of documents in the facet bucket.
Terms Facet Example
json.facet={
  shoes:{
    terms:{
      field: shoe_style,
      sort: {x : desc},
      facet:{
        x : "avg(price)",
        y : "unique(brand)"
      }
    }
  }
}
The functions in the inner "facet" block (x and y) are executed per-bucket.
"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17
      },
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24
      },
Sub-Facets
• Any facet that produces buckets can have sub-facets (terms/field, range, query)
• Sub-facets can have facet functions (stats) or their own sub-facets (no limit to nesting).
• A subfacet can be any type (field, range, query)
• Multiple subfacets can be added to any given facet
• Subfacets are first-class facets - can be configured independently like any other facet.
– Different offsets, limits, stats, sorts, etc
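The nesting rule above (every bucket can carry stats or further facets) maps naturally onto recursion. The following Python sketch models that idea only; `terms_facet` and the sample data are hypothetical and say nothing about how Heliosearch executes facets internally.

```python
from collections import defaultdict


def terms_facet(docs, field, facet=None, sort="count", limit=10):
    """Bucket docs by field value, then evaluate each sub-facet per bucket.

    facet maps a name to any callable taking the bucket's documents:
    a statistic, or another (nested) facet.
    """
    groups = defaultdict(list)
    for d in docs:
        if field in d:
            groups[d[field]].append(d)
    buckets = []
    for val, members in groups.items():
        bucket = {"val": val, "count": len(members)}
        for name, sub in (facet or {}).items():
            bucket[name] = sub(members)   # stats and sub-facets alike
        buckets.append(bucket)
    buckets.sort(key=lambda b: b[sort], reverse=True)  # descending
    return {"buckets": buckets[:limit]}


docs = [
    {"shoe_style": "Hiking", "price": 140.0, "color": "brown"},
    {"shoe_style": "Hiking", "price": 130.0, "color": "brown"},
    {"shoe_style": "Hiking", "price": 120.0, "color": "black"},
    {"shoe_style": "Running", "price": 110.0, "color": "black"},
    {"shoe_style": "Running", "price": 90.0, "color": "white"},
]

shoes = terms_facet(
    docs, "shoe_style", sort="x",
    facet={
        "x": lambda ds: sum(d["price"] for d in ds) / len(ds),  # avg(price)
        "colors": lambda ds: terms_facet(ds, "color"),          # sub-facet
    },
)
print(shoes["buckets"][0]["val"], shoes["buckets"][0]["x"])  # Hiking 130.0
```

Because a sub-facet is just another call over the bucket's documents, it can take its own sort, limit, and nested facets independently of its parent.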
Sub-Facet Example
json.facet={
  shoes:{
    terms:{
      field: shoe_style,
      sort: {x : desc},
      facet:{
        x : "avg(price)",
        y : "unique(brand)",
        colors :{terms:color}
      }
    }
  }
}
The short form for a terms facet (colors :{terms:color}) simply specifies the field. It sorts buckets by count descending.
"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking",
        "count" : 34,
        "x" : 135.25,
        "y" : 17,
        "colors" : {
          "buckets" : [
            { "val" : "brown", "count" : 12 },
            { "val" : "black", "count" : 10 },
            […]
          ]
        } // end of colors sub-facet
      }, // end of Hiking bucket
      {
        "val" : "Running",
        "count" : 45,
        "x" : 110.75,
        "y" : 24,
        "colors" : {
          "buckets" : […]
Terms Facet
Terms facet creates buckets of docs with the same value in a field
- field – The field name to facet over.
- offset – Used for paging, this skips the first N buckets. Defaults to 0.
- limit – Limits the number of buckets returned. Defaults to 10.
- mincount – Only return buckets with a count of at least this number. Defaults to 1.
- sort – Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may be either “asc” or “desc”.
- missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.
- numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
- allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
- prefix – Only produce buckets for terms starting with the specified prefix.
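How these parameters interact can be sketched with a small Python model. It is an illustration under assumptions, not the Solr code: the function name `terms_buckets` is hypothetical, ties in count are broken by value so the output is deterministic, and `numBuckets` is taken here to count buckets after `mincount` and `prefix` are applied.

```python
from collections import Counter


def terms_buckets(values, offset=0, limit=10, mincount=1, sort="count desc",
                  prefix=None, missing=False, num_buckets=False):
    """values: one field value per document; None means no value."""
    counts = Counter(v for v in values if v is not None)
    items = [(v, c) for v, c in counts.items()
             if c >= mincount and (prefix is None or v.startswith(prefix))]
    key, order = sort.split()
    items.sort(key=lambda vc: vc[0])          # index (natural) order
    if key == "count":
        # stable sort keeps index order among equal counts
        items.sort(key=lambda vc: vc[1], reverse=(order == "desc"))
    elif order == "desc":
        items.reverse()
    page = items[offset:offset + limit]       # paging
    out = {"buckets": [{"val": v, "count": c} for v, c in page]}
    if num_buckets:
        out["numBuckets"] = len(items)        # total, not just this page
    if missing:
        out["missing"] = {"count": sum(1 for v in values if v is None)}
    return out


colors = ["red", "blue", "red", None, "green", "blue", "red"]
print(terms_buckets(colors, limit=2, num_buckets=True, missing=True))
```

For example, `terms_buckets(colors, offset=1, limit=2)` skips the top bucket and returns the next two, which is how `offset` and `limit` combine for paging.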
Query Facet
Query facet creates a single bucket of documents matching the query.
{ // simple example
  highpop:{ query:{ q:"inStock:true AND popularity:[8 TO 10]" } }
}
{ // example with multiple sub-facets
  highpop:{ query:{
    q : "inStock:true AND popularity:[8 TO 10]",
    facet : {
      average_price : "avg(price)",
      available_colors : { terms : color },
      price_ranges : { range : {
        field:price, start:0, end:200, gap:10
      }}
    }
  }}
}
Range Facet
Creates buckets over ranges on a numeric or date field
Parameter names/values "in sync" with Solr range parameters:
field – The numeric field or date field to produce range buckets from
start – Lower bound of the ranges
end – Upper bound of the ranges
gap – Size of each range bucket produced
hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.
other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for…
– "before" all records with field values lower than the lower bound of the first range
– "after" all records with field values greater than the upper bound of the last range
– "between" all records with field values between the start and end bounds of all ranges
– "none" compute none of this information
– "all" shortcut for before, between, and after
include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to lower below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options…
– "lower" all gap based ranges include their lower bound
– "upper" all gap based ranges include their upper bound
– "edge" the first and last gap ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
– "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
– "all" shorthand for lower, upper, edge, outer
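The effect of `gap`, `hardend`, and the default `include` behaviour (lower bound inclusive, upper bound exclusive) can be sketched as follows. This Python model is illustrative only; the function name `range_buckets` is hypothetical, and it assumes that with hardend=false the "after" range starts where the last (extended) bucket ends.

```python
def range_buckets(values, start, end, gap, hardend=False):
    """Count numeric values into [low, high) buckets (default include=lower)."""
    edges = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end                 # last bucket is cut off at "end"
        edges.append((low, high))
        low = high
    last_end = edges[-1][1] if edges else start

    buckets = [{"val": lo, "count": 0} for lo, _ in edges]
    for v in values:
        for i, (lo, hi) in enumerate(edges):
            if lo <= v < hi:           # lower inclusive, upper exclusive
                buckets[i]["count"] += 1
                break

    return {
        "buckets": buckets,
        # the "other" counts: before, after, between
        "before": sum(1 for v in values if v < start),
        "after": sum(1 for v in values if v >= last_end),
        "between": sum(1 for v in values if start <= v < last_end),
    }


ages = [5, 10, 95, 100, 104, 120, -1]
soft = range_buckets(ages, start=0, end=100, gap=30)
hard = range_buckets(ages, start=0, end=100, gap=30, hardend=True)
print([b["count"] for b in soft["buckets"]], soft["after"])  # [2, 0, 0, 3] 1
print([b["count"] for b in hard["buckets"]], hard["after"])  # [2, 0, 0, 1] 3
```

With a gap of 30 the last bucket starts at 90: without `hardend` it runs to 120 and absorbs 100 and 104, while with `hardend` it stops at 100 and those values fall into "after". No value is counted twice at a boundary.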
Sub-Facets + Facet-Functions = Business Intelligence / Analytics
Fantasy ($1045)
  Top Authors: $423 George R.R. Martin, $347 Brandon Sanderson, $155 JK Rowling
  Top Books: $252 A Game of Thrones, $113 Emperor of Thorns, $101 Nine Princes in Amber, $82 Steel Heart
Sci-Fi ($898)
  Top Authors: $321 Iain M Banks, $218 Neal Asher, $155 Neal Stephenson
  Top Books: $113 Gridlinked, $101 Use of Weapons, $93 Snow Crash, $82 The Skinner
Mystery ($645)
  Top Authors: $191 James Patterson, $145 Patricia Cornwell, $126 John Grisham
  Top Books: $85 One for the Money, $77 Angels & Daemons, $64 Shutter Island, $35 The Firm
Filter By
  State: $852 NJ (14 stores), $658 NY (11 stores), $421 CT (8 stores)
  Chain: $984 Amazoon (14 stores), $734 Houses&Royalty (9 stores), $387 Books-r-us (7 stores)
  Store: $108 Amazoon Branchburg, $93 Books-r-us Bridgewater, $87 H&R NYC
Number of Books
  Chain: 201K Houses&Royalty, 183K Amazoon, 98K Books-r-us
  Store: 193K H&R NYC, 77K Books-r-us Bridgewater, 68K Amazoon Branchburg
Implementation
date_breakout : { range : {
  field : sale_date,
  start : ...,
  end : ...,
  gap : "+1MONTH",
  facet : {
    top_genre : { terms : {
      field : genre,
      sort : "revenue desc",
      limit : 4,
      facet : {
        revenue : "sum(sales)"
      }
    }},
    by_chain : { terms : {
      field : chain,
      facet : {
        revenue : "sum(sales)"
      }
    }}
    […]
- Creates series of facet buckets based on date
- For each date bucket, facet by genre, taking the top 4 by revenue
- For each genre bucket, report revenue
top_genres:{ terms:{
  field : genre,
  facet : {
    rev : "sum(sales)",
    top_authors:{ terms:{
      field : author,
      sort : "rev desc",
      limit : 3,
      facet : {
        rev : "sum(sales)"
      }
    }},
    top_books:{ terms:{
      field : title,
      sort : "rev desc",
      limit : 4,
      facet : {
        rev : "sum(sales)"
      }
    }}
    […]
state_breakout:{ terms:{
  field : state,
  sort : "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
chain_breakout:{ terms:{
  field : chain,
  sort : "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
store_breakout:{ terms:{
  field : store,
  sort : "rev desc",
  facet : {
    rev : "sum(sales)"
  }
}}
Misc Features
Parameter Substitution
Parameters / macros substituted across whole request
Happens before any parsing, so usable in any context
q=price:[ ${low} TO ${high} ]
&low=100
&high=200
Default values
q=price:[ ${low:0} TO ${high:100} ]
Nested
q=${price_query}
&price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true
&price_field=specialPrice
&low=50
&high=100
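The substitution rules shown above (plain macros, default values, nesting) can be modelled with a short Python sketch. It is not the Heliosearch implementation: the `expand` function is hypothetical, and leaving an unknown macro without a default untouched is an assumption made here for illustration.

```python
import re

# ${name} or ${name:default}
_MACRO = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def expand(text, params, depth=0):
    """Expand ${name} / ${name:default} macros using request parameters."""
    if depth > 10:
        raise ValueError("macro nesting too deep (possible cycle)")

    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            # parameter values may themselves contain macros (nesting)
            return expand(params[name], params, depth + 1)
        if default is not None:
            return default
        return match.group(0)  # assumption: leave unknown macros as-is

    return _MACRO.sub(repl, text)


params = {
    "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
    "price_field": "specialPrice",
    "low": "50",
    "high": "100",
}
print(expand("${price_query}", params))
# specialPrice:[ 50 TO 100 ] AND inStock:true
print(expand("price:[ ${low:0} TO ${high:100} ]", {}))
# price:[ 0 TO 100 ]
```

Because expansion is plain text replacement done before any parsing, the same macro can appear inside a query, a filter, or a facet definition.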
New Query Parser Features
Filters in queries - just like “fq” parameters, but may appear anywhere in a query
q=(text:elephant -(filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))
Constant Score Queries
q=color:(blue OR green)^=1 text:shoes
Comments in Queries (can nest)
q=+text:elephant /* the main query */ /* boosting part – WIP
{!func}mul(pop,rank)^10 */
Thank You
Help Develop the Next Generation of Solr!
Resources:
http://heliosearch.org
https://github.com/Heliosearch/heliosearch
https://groups.google.com/forum/#!forum/heliosearch
https://groups.google.com/forum/#!forum/heliosearch-dev
twitter.com/lucene_solr