Yonik Seeley Apachecon EU 2014 Budapest, Hungary Native Code, Off-Heap Data & JSON Facet API for Solr

Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Jul 11, 2015


Yonik Seeley
Page 1: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Yonik Seeley

Apachecon EU 2014

Budapest, Hungary

Native Code, Off-Heap Data & JSON Facet API for Solr

Page 2: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

My Background

• Creator of Solr

• Heliosearch Founder

• LucidWorks Co-Founder

• Lucene/Solr committer, PMC member

• Apache Software Foundation member

• M.S. in Computer Science, Stanford

Page 3: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Heliosearch Project

• The Next Evolution of Solr

• Forked from Solr, Developing at github

– Started Jan 2014

– Well aligned community

– Open Source, Apache licensed

• Bring back to Apache in the future?

• Currently drop-in replacement for Solr at the HTTP-API level

– A super-set… we continually merge in upstream changes

– Latest version of Heliosearch includes latest Solr

• Current Features: Off-heap filters, Off-heap fieldcache, facet-by-function, sub-facets, native code performance enhancements

Page 4: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Garbage Collection

Page 5: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Garbage Collection Basics

Heap regions (diagram): Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space

• New objects allocated in Eden

• Find live objects by tracing from GC “roots” (threads, stack locals, etc)

• Make a copy of live objects, leaving “garbage” behind

• Eden + Survivor Space copied together to other Survivor space

• Tenured from Survivor when old enough

• “stop-the-world” needed when GC can’t keep up

• Out of memory when too much time spent in GC

Page 6: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Java Memory Waste

- Need to size for worst case scenario

- OS needs free memory to cache index files

- JVMs aren’t good at “sharing” with rest of the system

- mmap allocations managed by OS, can be immediately reused on free

Diagram (OS real memory): two JVMs, each with heap in use plus unused heap up to max heap; two C processes, each with C heap in use plus unused heap; mmap-allocated regions; “free” memory, which includes the buffer cache and is important for caching index files.

Page 7: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

GC Impact

GC Reduces Throughput

Time to copy all that memory around could be spent better!

Stop-the-world pauses

Seconds to Minutes long

Pause time proportional to heap size

Still exists in all Hotspot GCs… CMS, G1GC, etc

Breaks Application SLAs (request timeouts, etc)

Can cause SolrCloud Zookeeper session timeouts

Reducing max pause size normally means reduced throughput

Non-graceful degradation

if you don't size your heap big enough… BOOM!

Page 8: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

GC Tuning

UseSerialGC

UseParallelGC

UseParallelOldGC

UseParallelOldGCCompacting

UseParallelDensePrefixUpdate

HeapMaximumCompactionInterval

HeapFirstMaximumCompactionCount

UseMaximumCompactionOnSystemGC

ParallelOldDeadWoodLimiterMean

ParallelOldDeadWoodLimiterStdDev

UseParallelOldGCDensePrefix

ParallelGCThreads

ParallelCMSThreads

YoungPLABSize

OldPLABSize

GCTaskTimeStampEntries

AlwaysTenure

NeverTenure

ScavengeBeforeFullGC

UseConcMarkSweepGC

ExplicitGCInvokesConcurrent

UseCMSBestFit

UseCMSCollectionPassing

UseParNewGC

ParallelGCVerbose

ParallelGCBufferWastePct

ParallelGCRetainPLAB

TargetPLABWastePct

PLABWeight

ResizePLAB

PrintPLAB

ParGCArrayScanChunk

ParGCDesiredObjsFromOverflowList

CMSParPromoteBlocksToClaim

AlwaysPreTouch

CMSUseOldDefaults

CMSYoungGenPerWorker

CMSIncrementalMode

CMSIncrementalDutyCycle

CMSIncrementalPacing

CMSIncrementalDutyCycleMin

CMSIncrementalSafetyFactor

CMSIncrementalOffset

CMSExpAvgFactor

CMS_FLSWeight

CMS_FLSPadding

FLSCoalescePolicy

CMS_SweepWeight

CMS_SweepPadding

CMS_SweepTimerThresholdMillis

CMSClassUnloadingEnabled

CMSCompactWhenClearAllSoftRefs

UseCMSCompactAtFullCollection

CMSFullGCsBeforeCompaction

CMSIndexedFreeListReplenish

CMSLoopWarn

CMSMarkStackSize

CMSMarkStackSizeMax

CMSMaxAbortablePrecleanLoops

CMSMaxAbortablePrecleanTime

CMSAbortablePrecleanMinWorkPerIteration

CMSAbortablePrecleanWaitMillis

CMSRescanMultiple

CMSConcMarkMultiple

CMSRevisitStackSize

CMSAbortSemantics

CMSParallelRemarkEnabled

CMSParallelSurvivorRemarkEnabled

CMSPLABRecordAlways

CMSConcurrentMTEnabled

CMSPermGenPrecleaningEnabled

CMSPermGenSweepingEnabled

CMSPrecleaningEnabled

CMSPrecleanIter

CMSPrecleanNumerator

CMSPrecleanDenominator

CMSPrecleanRefLists1

CMSPrecleanRefLists2

CMSPrecleanSurvivors1

CMSPrecleanSurvivors2

CMSPrecleanThreshold

CMSCleanOnEnter

CMSRemarkVerifyVariant

CMSScheduleRemarkEdenSizeThreshold

CMSScheduleRemarkEdenPenetration

CMSScheduleRemarkSamplingRatio

CMSSamplingGrain

CMSScavengeBeforeRemark

CMSWorkQueueDrainThreshold

CMSWaitDuration

CMSYield

CMSBitMapYieldQuantum

UseGCLogFileRotation

NumberOfGCLogFiles

GCLogFileSize

LargePageSizeInBytes

LargePageHeapSizeThreshold

PrintGCApplicationConcurrentTime

PrintGCApplicationStoppedTime

OnOutOfMemoryError

ClassUnloading

BlockOffsetArrayUseUnallocatedBlock

RefDiscoveryPolicy

ParallelRefProcEnabled

CMSTriggerRatio

CMSBootstrapOccupancy

CMSInitiatingOccupancyFraction

UseCMSInitiatingOccupancyOnly

HandlePromotionFailure

PreserveMarkStackSize

ZeroTLAB

PrintTLAB

TLABStats

AlwaysActAsServerClassMachine

DefaultMaxRAM

DefaultMaxRAMFraction

DefaultInitialRAMFraction

UseAutoGCSelectPolicy

AutoGCSelectPauseMillis

UseAdaptiveSizePolicy

UsePSAdaptiveSurvivorSizePolicy

UseAdaptiveGenerationSizePolicyAtMinorCollection

UseAdaptiveGenerationSizePolicyAtMajorCollection

UseAdaptiveSizePolicyWithSystemGC

UseAdaptiveGCBoundary

AdaptiveSizeThroughPutPolicy

AdaptiveSizePausePolicy

AdaptiveSizePolicyInitializingSteps

AdaptiveSizePolicyOutputInterval

UseAdaptiveSizePolicyFootprintGoal

AdaptiveSizePolicyWeight

AdaptiveTimeWeight

PausePadding

PromotedPadding

SurvivorPadding

AdaptivePermSizeWeight

PermGenPadding

ThresholdTolerance

AdaptiveSizePolicyCollectionCostMargin

YoungGenerationSizeIncrement

YoungGenerationSizeSupplement

YoungGenerationSizeSupplementDecay

TenuredGenerationSizeIncrement

TenuredGenerationSizeSupplement

TenuredGenerationSizeSupplementDecay

MaxGCPauseMillis

MaxGCMinorPauseMillis

GCTimeRatio

AdaptiveSizeDecrementScaleFactor

UseAdaptiveSizeDecayMajorGCCost

AdaptiveSizeMajorGCDecayTimeScale

MinSurvivorRatio

InitialSurvivorRatio

BaseFootPrintEstimate

UseGCOverheadLimit

GCTimeLimit

GCHeapFreeLimit

PrintAdaptiveSizePolicy

DisableExplicitGC

CollectGen0First

BindGCTaskThreadsToCPUs

UseGCTaskAffinity

ProcessDistributionStride

CMSCoordinatorYieldSleepCount

CMSYieldSleepCount

PrintGCTaskTimeStamps

TraceClassLoadingPreorder

TraceGen0Time

TraceGen1Time

PrintTenuringDistribution

PrintHeapAtSIGBREAK

TraceParallelOldGCTasks

PrintParallelOldGCPhaseTimes

MaxHeapSize

MaxNewSize

PretenureSizeThreshold

MinTLABSize

TLABAllocationWeight

TLABWasteTargetPercent

TLABRefillWasteFraction

TLABWasteIncrement

MaxLiveObjectEvacuationRatio

OldSize

MinHeapFreeRatio

MaxHeapFreeRatio

SoftRefLRUPolicyMSPerMB

MinHeapDeltaBytes

MinPermHeapExpansion

MaxPermHeapExpansion

QueuedAllocationWarningCount

MaxTenuringThreshold

InitialTenuringThreshold

TargetSurvivorRatio

MarkSweepDeadRatio

PermMarkSweepDeadRatio

MarkSweepAlwaysCompactCount

PrintCMSStatistics

PrintCMSInitiationStatistics

PrintFLSStatistics

PrintFLSCensus

DeferThrSuspendLoopCount

DeferPollingPageLoopCount

SafepointSpinBeforeYield

UseDepthFirstScavengeOrder

GCDrainStackTargetSize

ThreadSafetyMargin

CodeCacheMinimumFreeSpace

MaxDirectMemorySize

PerfDataMemorySize

AggressiveHeap

UseCompressedStrings

UseStringCache

HeapDumpOnOutOfMemoryError

HeapDumpPath

PrintGC

PrintGCDetails

PrintGCTimeStamps

G1HeapRegionSize

G1ReservePercent

G1ConfidencePercent

PrintPromotionFailure

PrintGCDateStamps

-XX:InitiatingHeapOccupancyPercent=n

-XX:MaxHeapFreeRatio=70

-XX:MaxGCPauseMillis=n

-XX:+ScavengeBeforeFullGC

-XX:ConcGCThreads=n

-XX:MaxTenuringThreshold=n

Page 9: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

GC Reduction

Reuse objects – cause less garbage

Move certain things off-heap (invisible to GC)

Option 1: Direct ByteBuffers

Limited to “int” (2GB)

No way to directly “free” – still relies on GC

Option 2: sun.misc.Unsafe

malloc() + free() + direct memory access

Supported on all major JVMs

Widely used: Java (nio, concurrent), JSR166, Google Guava, objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, …

Being considered for Java 9

Page 10: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Off-Heap Filters

Test setup: 50M docs (3.8 GB index), 8GB RAM, 20K requests, 8 request threads, 500 filters

JVM Options: -Xmx4G (solr)

Page 11: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Off-Heap Filters Test

Observed max process sizes

Solr: 3.8GB – 4.3GB

Heliosearch: 3.6GB – 3.7GB

Page 12: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Off-Heap FieldCache

Normal (on-heap) FieldCache

Typically the largest data structures kept on the heap

Used for sorting, function query values, single-valued faceting, grouping

Uses weak references

Heliosearch nCache (n is for “native”)

Allocated off-heap

First-class managed Solr cache

Configure size, warming policies

View statistics

Per-segment (NRT friendly)

No weak references

Page 13: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Page 14: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

nCache admin stats

item_id: { "field":"id", "uses":8, "class":"StrTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":612 }

item_popularity: { "field":"popularity", "uses":5, "class":"IntTopValues", "refcount":2, "numSegments":7, "carriedOver":6, "size":106 }

item_price: {
  "field":"price",
  "uses":0,               -- the number of top-level uses for searcher
  "class":"FloatTopValues",
  "refcount":2,
  "numSegments":5,        -- number of segments populated
  "carriedOver":5,        -- number of segments carried over from last searcher
  "size":272              -- size in bytes for all populated segments
}

Page 15: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Off-Heap Integer Field

50M document index

Sorting on 6 different integer fields (10,100,1000,10000,1M unique values)

4 request threads

Results

42% faster sorting

73% faster functions

Page 16: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

String Field Sorting

10M document index

10 different string fields, each field 80% populated

Median latency

Page 17: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

String Field Sorting Throughput

Concurrent throughput sorting on random fields in random order (asc/desc)

~50% performance gain

Page 18: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Native Code

Page 19: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Native Code

The Idea: create native accelerators for CPU hotspots

Faceting anyone?

But…. JNI Sucks! (and it’s GC’s fault again)

GetArrayElements() – makes a *copy* of the array!

GetPrimitiveArrayCritical() – blocks garbage collection!

Tons of other restrictions… it’s a “critical section”

Defeats the purpose of going to native code in the first place

But… our data is already off-heap, we’re good!

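jint *buf = (*env)->GetIntArrayElements(env, arr, 0);  /* may copy the whole array */
for (i = 0; i < len; i++) {
  sum += buf[i];
}
(*env)->ReleaseIntArrayElements(env, arr, buf, JNI_ABORT);  /* read-only use: release without copying back */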

Page 20: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Native Single Valued String Faceting

Top-Level off-heap String cache

Improves Sorting and Faceting speed

Eliminates FieldCache “insanity”

Native Code

Written in C++, compiled with GCC 4.7, 4.8

Currently supports 64 bit Windows, OS-X, Linux (x86)

Static compilation avoids JVM hotspot warmup period, mis-compilation bugs, and variations between runs

Page 21: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Native Faceting Performance

Page 22: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Terms Query Optimization

Page 23: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Page 24: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

New Facet Module

Page 25: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Facet Module Goals

Replace the aging “SimpleFacets”

First class JSON support

Easier programmatic construction of complex nested facet commands

Canonical response format that is easier for clients to parse

First class analytics support

Cleaner distributed search support

Fully pluggable

Better base for integration of other search features

Heliosearch is a Solr super-set, so you can still choose to use the old faceting or mix-n-match.

Page 26: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

API Comparison

Old Style:

&facet=true
&facet.range={!key=age_ranges}age
&f.age_ranges.facet.range.start=0
&f.age_ranges.facet.range.end=100
&f.age_ranges.facet.range.gap=10
&facet.range={!key=price_ranges}price
&f.price_ranges.facet.range.start=0
&f.price_ranges.facet.range.end=1000
&f.price_ranges.facet.range.gap=50

New JSON API:

{
  age_ranges: {          // facet name
    range: {             // facet type
      field : age,       // facet params
      start : 0,
      end : 100,
      gap : 10
    }
  },
  price_ranges: {
    range: {
      field : price,
      start : 0,
      end : 1000,
      gap : 50
    }
  }
}
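As a rough illustration of the comparison on this slide, the Python sketch below generates both request styles from one description of the two range facets. The helper names (`old_style_params`, `json_facet`) and the `RANGES` table are hypothetical and are not part of Solr or Heliosearch. Only the parameter names shown on the slide are used.

```python
import json

# Facet descriptions taken from the slide: name -> (field, start, end, gap)
RANGES = {
    "age_ranges": ("age", 0, 100, 10),
    "price_ranges": ("price", 0, 1000, 50),
}

def old_style_params(ranges):
    """Flat key/value pairs in the old facet.range style (hypothetical helper)."""
    params = [("facet", "true")]
    for name, (field, start, end, gap) in ranges.items():
        params.append(("facet.range", "{!key=%s}%s" % (name, field)))
        params.append(("f.%s.facet.range.start" % name, str(start)))
        params.append(("f.%s.facet.range.end" % name, str(end)))
        params.append(("f.%s.facet.range.gap" % name, str(gap)))
    return params

def json_facet(ranges):
    """Nested structure in the new JSON style (hypothetical helper)."""
    return {
        name: {"range": {"field": field, "start": start, "end": end, "gap": gap}}
        for name, (field, start, end, gap) in ranges.items()
    }

if __name__ == "__main__":
    for key, value in old_style_params(RANGES):
        print("&%s=%s" % (key, value))
    print("json.facet=" + json.dumps(json_facet(RANGES)))
```

The nested form keeps each facet's settings together under its name, which is what makes it easier to build from code than the flat, prefix-encoded parameters.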

Page 27: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Facet Functions

Sort/Report by things other than “count”

Aggregation Functions / Stats:

Stats are calculated “per bucket”

Buckets created by Query, Range, or Terms (field) facets

count, sum(function), avg(function), sumsq(function), min(function), max(function), unique(string_field)

any “function query” that yields a numeric value!

Example: sum(mul(num_units, unit_price))
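To make "stats are calculated per bucket" concrete, here is a small Python sketch of what a few of these aggregations compute over the documents of one bucket. It only models the semantics in memory and is not how Heliosearch implements them. The sample documents and the function names `sum_agg`, `avg_agg`, and `unique_agg` are made up for illustration.

```python
def sum_agg(docs, fn):
    """sum(function): add up a numeric function of each document."""
    return sum(fn(d) for d in docs)

def avg_agg(docs, fn):
    """avg(function): mean of a numeric function over the bucket."""
    return sum_agg(docs, fn) / len(docs) if docs else 0.0

def unique_agg(docs, field):
    """unique(string_field): number of distinct values in the bucket."""
    return len({d[field] for d in docs if field in d})

# One hypothetical bucket of documents (invented sample data).
bucket = [
    {"num_units": 2, "unit_price": 10.0, "brand": "a"},
    {"num_units": 1, "unit_price": 5.0, "brand": "b"},
    {"num_units": 3, "unit_price": 1.0, "brand": "a"},
]

# Equivalent in spirit to sum(mul(num_units, unit_price)) from the slide.
revenue = sum_agg(bucket, lambda d: d["num_units"] * d["unit_price"])
brands = unique_agg(bucket, "brand")
```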

Page 28: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Simple Request + Response

$ curl http://localhost:8983/solr/query -d 'q=widgets&
json.facet=
{ // Comments can help with clarity
  /* traditional C-style comments are also supported */
  x : "avg(price)" , // Simple strings can occur unquoted
  y : 'unique(brand)' // Strings can also use single quotes
}
'

[…]

"facets" : {
  "count" : 314,   // number of documents in the facet bucket
  "x" : 102.5,
  "y" : 28
}

Page 29: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Terms Facet Example

json.facet={

shoes:{

terms:{

field: shoe_style,

sort: {x : desc},

facet:{

x : "avg(price)",

y : "unique(brand)"

}

}

}

}

"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      { "val" : "Hiking", "count" : 34, "x" : 135.25, "y" : 17 },
      { "val" : "Running", "count" : 45, "x" : 110.75, "y" : 24 },
      […]

The facet functions x and y are executed per bucket.
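The following Python sketch models what the terms facet on this slide does: group documents by a field, run each facet function once per bucket, then sort the buckets by one of the computed stats. It is an in-memory illustration only. The function `terms_facet` and the three sample documents are hypothetical and do not reflect the actual Heliosearch implementation or data.

```python
from collections import defaultdict

def terms_facet(docs, field, stats, sort_by="count", descending=True):
    """Bucket docs by `field`, compute each stat per bucket, sort buckets."""
    groups = defaultdict(list)
    for doc in docs:
        if field in doc:
            groups[doc[field]].append(doc)
    buckets = []
    for val, members in groups.items():
        bucket = {"val": val, "count": len(members)}
        for name, fn in stats.items():
            bucket[name] = fn(members)  # executed once per bucket
        buckets.append(bucket)
    buckets.sort(key=lambda b: b[sort_by], reverse=descending)
    return buckets

# Invented sample documents.
docs = [
    {"shoe_style": "Running", "price": 100.0, "brand": "x"},
    {"shoe_style": "Hiking", "price": 150.0, "brand": "x"},
    {"shoe_style": "Hiking", "price": 120.0, "brand": "y"},
]
stats = {
    "x": lambda ds: sum(d["price"] for d in ds) / len(ds),  # avg(price)
    "y": lambda ds: len({d["brand"] for d in ds}),           # unique(brand)
}
result = terms_facet(docs, "shoe_style", stats, sort_by="x")
```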

Page 30: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Sub-Facets

Any facet that produces buckets can have sub-facets (terms/field, range, query)

Sub-facets can have facet functions (stats) or their own sub-facets (no limit to nesting).

A subfacet can be any type (field, range, query)

Multiple subfacets can be added to any given facet

Subfacets are first-class facets - can be configured independently like any other facet.

Different offsets, limits, stats, sorts, etc

Page 31: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Sub-Facet Example

json.facet={

shoes:{

terms:{

field: shoe_style,

sort: {x : desc},

facet:{

x : "avg(price)",

y : "unique(brand)",

colors :{terms:color}

}

}

}

}

"facets": {
  "count" : 472,
  "shoes": {
    "buckets" : [
      {
        "val" : "Hiking", "count" : 34, "x" : 135.25, "y" : 17,
        "colors" : {
          "buckets" : [
            { "val" : "brown", "count" : 12 },
            { "val" : "black", "count" : 10 },
            […]
          ]
        } // end of colors sub-facet
      }, // end of Hiking bucket
      {
        "val" : "Running", "count" : 45, "x" : 110.75, "y" : 24,
        "colors" : {
          "buckets" : […]

Short-form for terms facet (colors : {terms:color}) simply specifies the field. Sorts buckets by count descending.
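Because every bucket-producing facet returns the same "buckets" shape, a client can walk a nested response with one small recursive function. The Python sketch below is one way to do that, assuming the response has already been parsed into dictionaries. `walk_buckets` is a hypothetical helper, and the `response` literal reuses only the values visible on the slide, with the elided parts left out.

```python
def walk_buckets(facet, path=()):
    """Yield (path, count) for every bucket, depth first."""
    for bucket in facet.get("buckets", []):
        here = path + (bucket["val"],)
        yield here, bucket["count"]
        # Any dict value that itself has "buckets" is a sub-facet.
        for value in bucket.values():
            if isinstance(value, dict) and "buckets" in value:
                yield from walk_buckets(value, here)

response = {
    "count": 472,
    "shoes": {"buckets": [
        {"val": "Hiking", "count": 34, "x": 135.25, "y": 17,
         "colors": {"buckets": [
             {"val": "brown", "count": 12},
             {"val": "black", "count": 10},
         ]}},
    ]},
}

rows = list(walk_buckets(response["shoes"]))
```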

Page 32: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Terms Facet

Terms facet creates buckets of docs with the same value in a field

- field – The field name to facet over.

- offset – Used for paging, this skips the first N buckets. Defaults to 0.

- limit – Limits the number of buckets returned. Defaults to 10.

- mincount – Only return buckets with a count of at least this number. Defaults to 1.

- sort – Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”

- missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.

- numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.

- allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.

- prefix – Only produce buckets for terms starting with the specified prefix.
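A minimal Python sketch of how `mincount`, `sort`, `offset`, and `limit` interact, using the defaults listed above. It operates on a precomputed value-to-count mapping and only models the parameter semantics. The function name `page_buckets` and the tie-breaking by value are assumptions, not documented Heliosearch behavior.

```python
def page_buckets(counts, offset=0, limit=10, mincount=1, sort="count desc"):
    """Apply mincount, then sort, then offset/limit to {value: count}."""
    key, _, direction = sort.partition(" ")
    reverse = direction == "desc"
    items = [(v, c) for v, c in counts.items() if c >= mincount]
    if key == "count":
        # Assumption: ties are broken by value so the order is deterministic.
        items.sort(key=lambda vc: vc[0])
        items.sort(key=lambda vc: vc[1], reverse=reverse)
    elif key == "index":
        items.sort(key=lambda vc: vc[0], reverse=reverse)
    else:
        raise ValueError("this sketch only supports 'count' and 'index'")
    return items[offset:offset + limit]

counts = {"a": 5, "b": 0, "c": 7, "d": 5}
first_page = page_buckets(counts)
second_bucket = page_buckets(counts, offset=1, limit=1)
by_index = page_buckets(counts, sort="index asc")
```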

Page 33: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Query Facet

Query facet creates a single bucket of documents matching the query.

{ // simple example
  highpop:{ query:{ q:"inStock:true AND popularity:[8 TO 10]" } }
}

{ // example with multiple sub-facets
  highpop:{ query:{
    q : "inStock:true AND popularity:[8 TO 10]",
    facet : {
      average_price : "avg(price)",
      available_colors : { terms : color },
      price_ranges : { range : {
        field:price, start:0, end:200, gap:10
      }}
    }
  }}
}

Page 34: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Range Facet

Creates buckets over ranges on a numeric or date field

Parameter names/values "in sync" with Solr range parameters:

field – The numeric field or date field to produce range buckets from

start – Lower bound of the ranges

end – Upper bound of the ranges

gap – Size of each range bucket produced

hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.

other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for…

– "before" all records with field values lower than the lower bound of the first range

– "after" all records with field values greater than the upper bound of the last range

– "between" all records with field values between the start and end bounds of all ranges

– "none" compute none of this information

– "all" shortcut for before, between, and after

include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to lower below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options…

– "lower" all gap based ranges include their lower bound

– "upper" all gap based ranges include their upper bound

– "edge" the first and last gap ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified

– "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.

– "all" shorthand for lower, upper, edge, outer
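The effect of `gap`, `hardend`, and the default `include` setting ("lower") is easiest to see with numbers. The Python sketch below computes the bucket boundaries and then counts values with inclusive lower bounds and exclusive upper bounds, plus the "before" and "after" counts as described above. The function names are hypothetical, only numeric fields are handled, and this models the documented semantics rather than the actual implementation.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return [(low, high), ...] covering start..end in steps of gap."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end  # last bucket is cut off at "end"
        buckets.append((low, high))
        low = high
    return buckets

def count_ranges(values, start, end, gap, hardend=False):
    """Count values per bucket using the default include=lower rule."""
    buckets = range_buckets(start, end, gap, hardend)
    counts = [0] * len(buckets)
    before = after = 0
    upper = buckets[-1][1] if buckets else start
    for v in values:
        if v < start:
            before += 1          # "before" excludes the first lower bound
        elif v >= upper:
            after += 1           # "after" includes the last upper bound
        else:
            for i, (low, high) in enumerate(buckets):
                if low <= v < high:
                    counts[i] += 1
                    break
    return {"buckets": list(zip(buckets, counts)),
            "before": before, "after": after}

soft = count_ranges([-1, 0, 9, 10, 25, 30], start=0, end=25, gap=10)
hard = count_ranges([-1, 0, 9, 10, 25, 30], start=0, end=25, gap=10, hardend=True)
```

With `hardend` false, the last bucket is a full gap wide (20 to 30) and absorbs the value 25. With `hardend` true, the last bucket stops at 25 and that value falls into "after" instead.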

Page 35: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Sub-Facets + Facet-Functions

=

Business Intelligence / Analytics

Page 36: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Fantasy ($1045)
  Top Authors: $423 George R.R. Martin, $347 Brandon Sanderson, $155 JK Rowling
  Top Books: $252 A Game of Thrones, $113 Emperor of Thorns, $101 Nine Princes in Amber, $82 Steel Heart

Sci-Fi ($898)
  Top Authors: $321 Iain M Banks, $218 Neal Asher, $155 Neal Stephenson
  Top Books: $113 Gridlinked, $101 Use of Weapons, $93 Snow Crash, $82 The Skinner

Mystery ($645)
  Top Authors: $191 James Patterson, $145 Patricia Cornwell, $126 John Grisham
  Top Books: $85 One for the Money, $77 Angels & Daemons, $64 Shutter Island, $35 The Firm

Filter By
  State: $852 NJ (14 stores), $658 NY (11 stores), $421 CT (8 stores)
  Chain: $984 Amazoon (14 stores), $734 Houses&Royalty (9 stores), $387 Books-r-us (7 stores)
  Store: $108 Amazoon Branchburg, $93 Books-r-us Bridgewater, $87 H&R NYC

Number of Books
  Chain: 201K Houses&Royalty, 183K Amazoon, 98K Books-r-us
  Store: 193K H&R NYC, 77K Books-r-us Bridgewater, 68K Amazoon Branchburg

Page 37: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

date_breakout : { range: {
  field: sale_date,
  start : ...,
  end : ...,
  gap : "+1MONTH",
  facet : {
    top_genre : { terms : {
      field : genre,
      sort : "revenue desc",
      limit : 4,
      facet : {
        revenue : "sum(sales)"
      }
    }},
    by_chain: { terms : {
      field : chain,
      facet : {
        revenue : "sum(sales)"
      }
    }}
    […]

Implementation

Creates series of facet buckets based on date

For each date bucket, facet by genre, taking the top 4 by revenue

For each genre bucket, report revenue

Page 38: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Genre panel from page 36 (Fantasy, Sci-Fi, Mystery, each with Top Authors and Top Books), shown with the corresponding request:

top_genres:{ terms:{
  field: genre,
  facet : {
    rev : "sum(sales)",
    top_authors:{ terms:{
      field : author,
      sort : "rev desc",
      limit : 3,
      facet : {
        rev : "sum(sales)"
      }
    }},
    top_books:{ terms:{
      field : title,
      sort : "rev desc",
      limit : 4,
      facet : {
        rev : "sum(sales)"
      }
    }}
    […]
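One of the stated goals of the facet module is easier programmatic construction of nested facet commands. The Python sketch below builds the visible part of the `top_genres` request from two small helpers and serializes it with the standard `json` module. The helpers `terms` and `top_by_revenue` are hypothetical conveniences, not part of any Solr or Heliosearch client library, and the portion of the slide elided with […] is not reconstructed.

```python
import json

def terms(field, **options):
    """Build a terms facet on `field` with optional settings."""
    spec = {"field": field}
    spec.update(options)
    return {"terms": spec}

def top_by_revenue(field, limit):
    """Terms facet sorted by summed sales, keeping the top `limit` buckets."""
    return terms(field, sort="rev desc", limit=limit,
                 facet={"rev": "sum(sales)"})

request = {
    "top_genres": terms("genre", facet={
        "rev": "sum(sales)",
        "top_authors": top_by_revenue("author", 3),
        "top_books": top_by_revenue("title", 4),
    })
}

# Strict JSON is also accepted where the slides use the relaxed syntax.
body = json.dumps(request, indent=2)
```

Since sub-facets are ordinary facets, the same helper works at any nesting depth.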

Page 39: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

"Filter By" panel from page 36 (State, Chain, Store), shown with the corresponding request:

state_breakout:{ terms:{
  field: state,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
chain_breakout:{ terms:{
  field: chain,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)",
    num_stores : "unique(store)"
  }
}},
store_breakout:{ terms:{
  field: store,
  sort: "rev desc",
  facet : {
    rev : "sum(sales)"
  }
}}

Page 40: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Misc Features

Page 41: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Parameter Substitution

Parameters / macros substituted across whole request

Happens before any parsing, so usable in any context

q=price:[ ${low} TO ${high} ]

&low=100

&high=200

Default values

q=price:[ ${low:0} TO ${high:100} ]

Nested

q=${price_query}

&price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true

&price_field=specialPrice

&low=50

&high=100
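The substitution rules on this slide (plain `${name}`, `${name:default}`, and nested references) can be modeled in a few lines of Python. This is a sketch of the behavior as described here, not the Heliosearch code. The function `expand`, the repeated-pass strategy, and the choice to leave unknown macros untouched are all assumptions.

```python
import re

# Matches ${name} or ${name:default}
_MACRO = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")

def expand(text, params, max_passes=10):
    """Expand macros in `text` from `params`, repeating for nested macros."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: unknown macros are left as-is

    for _ in range(max_passes):
        expanded = _MACRO.sub(replace, text)
        if expanded == text:
            break
        text = expanded
    return text

simple = expand("price:[ ${low} TO ${high} ]", {"low": "100", "high": "200"})
defaults = expand("price:[ ${low:0} TO ${high:100} ]", {})
nested = expand("${price_query}", {
    "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
    "price_field": "specialPrice",
    "low": "50",
    "high": "100",
})
```

The `max_passes` bound is there only to stop a parameter that refers to itself from looping forever.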

Page 42: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

New Query Parser Features

Filters in queries - just like “fq” parameters, but may appear anywhere in a query

q=(text:elephant -(filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))

Constant Score Queries

q=color:(blue OR green)^=1 text:shoes

Comments in Queries (can nest)

q=+text:elephant /* the main query */ /* boosting part – WIP {!func}mul(pop,rank)^10 */

Page 43: Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)

Thank You

Help Develop the Next Generation of Solr!

Resources:

http://heliosearch.org

https://github.com/Heliosearch/heliosearch

https://groups.google.com/forum/#!forum/heliosearch

https://groups.google.com/forum/#!forum/heliosearch-dev

twitter.com/lucene_solr