Payloads in Solr
Erik Hatcher
Senior Solutions Architect / co-founder, Lucidworks
Solr now smoothly integrates with Lucene-level payloads. Payloads provide optional per-term metadata, numeric or
otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting.
This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing,
per-term weighting, and more.
tl;dr
• Solr 6.6+ via SOLR-1485
• per-term position metadata
• Use cases:
  • per-store pricing
  • weighting terms: e.g. confidence of term, or importance/relevance of term
  • weighting term types (synonyms factor lower, verbs factor higher)
Lucene’s Payloads
• Token: PayloadAttribute
  • byte[] per term position, optional
  • Several components set payloads
• Similarity.SimScorer#computePayloadFactor
  • No built-in components (outside Lucene’s test cases) implemented this before SOLR-1485
• PostingsEnum#getPayload
Postings Format
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene50/Lucene50PostingsFormat.html
Lucene’s Token
• Field
• Attributes:
  • CharTerm: term text
  • … Keyword, Type, Offset, …
  • and Payload!
setPayload(bytes)
• DelimitedPayloadTokenFilter
• NumericPayloadTokenFilter
• TokenOffsetPayloadTokenFilter
• TypeAsPayloadTokenFilter
• pre-analyzed field (Solr)
DelimitedPayloadTokenFilter
• term1|payload1 term2|payload2
• encodes payloads as:
  • float,
  • int,
  • or string / raw bytes

field weighted_terms_dps
  term one
    doc 0
      freq 1
      pos 0
      payload 1.0
  term three
    doc 0
      freq 1
      pos 2
      payload 3.0
  term two
    doc 0
      freq 1
      pos 1
      payload 2.0
  term weighted
    doc 1
      freq 2
      pos 0
      payload 50.0
      pos 1
      payload 100.0
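The postings dump above can be reproduced in miniature. The following is only a sketch in plain Python, not Lucene code: it assumes whitespace tokenization, a "|" delimiter, and float decoding, and the two input strings are guesses at documents that would yield that dump (the slide does not show them). The function names are hypothetical.

```python
from collections import defaultdict


def parse_delimited(text, delimiter="|", decode=float):
    """Split whitespace-separated tokens into (term, position, payload) triples."""
    tokens = []
    for position, raw in enumerate(text.split()):
        term, sep, payload = raw.partition(delimiter)
        # A token without a delimiter simply carries no payload.
        tokens.append((term, position, decode(payload) if sep else None))
    return tokens


def build_postings(docs):
    """Map term -> {doc_id: [(position, payload), ...]}."""
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, position, payload in parse_delimited(text):
            postings[term].setdefault(doc_id, []).append((position, payload))
    return dict(postings)


# Assumed inputs: doc 0 and doc 1 of the dump.
docs = ["one|1.0 two|2.0 three|3.0", "weighted|50.0 weighted|100.0"]
postings = build_postings(docs)

for term in sorted(postings):
    for doc_id, hits in postings[term].items():
        print("term", term, "doc", doc_id, "freq", len(hits), hits)
```

Each (position, payload) pair corresponds to one "pos … payload …" entry in the dump; the term "weighted" occurs twice in doc 1, so it carries two payloads.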
Use Cases
• products with per-store pricing
• boosting by weighted terms
• down-boosting synonyms
Traditional per-store pricing strategies
• Explode docs:
  • num_docs = products * stores (1M products * 5000 stores could be up to 5B docs!)
  • query-time collapsing (by product id)
• Explode fields:
  • default_price
  • store_price_0001
  • store_price_0002
  • … store_price_NNNN
  • query-time field choice
  • e.g. up to 5000 fields per document
Payload-based per-store pricing
• default_price
• store_prices:
  • terms: STORE_0001 … STORE_NNNN
  • per-term payload of price
• One additional field
  • with up to num_stores terms/payloads
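To make the trade-off concrete, here is a rough Python sketch of the payload-based layout. It is illustrative only: the "term|price" rendering assumes the delimited payload format, the four-digit store ids follow the STORE_NNNN naming on the slide, and the helper names and sample prices are hypothetical.

```python
def exploded_doc_count(products, stores):
    """Documents needed when every product/store pair becomes its own document."""
    return products * stores


def store_prices_value(prices):
    """Render {store_number: price} as a delimited payload field value."""
    return " ".join(
        f"STORE_{store:04d}|{price}" for store, price in sorted(prices.items())
    )


def price_for(doc, store):
    """Return the store's payload price, falling back to default_price."""
    wanted = f"STORE_{store:04d}"
    for token in doc["store_prices"].split():
        term, _, payload = token.partition("|")
        if term == wanted:
            return float(payload)
    return doc["default_price"]


# One product document carries every store price in a single extra field.
doc = {
    "id": "sku-1",  # hypothetical product id
    "default_price": 12.0,
    "store_prices": store_prices_value({2: 10.49, 1: 9.99}),
}

print(exploded_doc_count(1_000_000, 5000))  # the "explode docs" strategy
print(doc["store_prices"])
print(price_for(doc, 2), price_for(doc, 7))
```

The document count stays at one per product; what grows instead is the number of terms in the single store_prices field.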
Down-boosting synonyms
id,synonyms_with_payloads
99,tv

synonyms.txt
Television, Televisions, TV, TVs

/select?wt=csv&fl=id,score&q={!payload_score f=synonyms_with_payloads v=$payload_term func=max}

&payload_term=television
id,score
99,0.1

&payload_term=tv
id,score
99,1.0

{
  "add-field-type": {
    "name": "synonyms_with_payloads",
    "stored": "true",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.SynonymGraphFilterFactory", "expand": "true", "ignoreCase": "true", "synonyms": "synonyms.txt" },
        { "class": "solr.LowerCaseFilterFactory" },
        { "class": "solr.NumericPayloadTokenFilterFactory", "payload": "0.1", "typeMatch": "SYNONYM" }
      ]
    },
    "queryAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.LowerCaseFilterFactory" }
      ]
    }
  }
}
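The two scores in this example (0.1 for "television", 1.0 for "tv") can be traced with a small simulation. This is a sketch of the idea only, not the Solr analysis chain: it lowercases before expanding synonyms (the field type does it the other way round), and it assumes a term without a payload contributes a factor of 1.0, which is consistent with the result shown for "tv". All names are hypothetical.

```python
SYNONYMS = {"television", "televisions", "tv", "tvs"}  # from synonyms.txt, lowercased
SYNONYM_PAYLOAD = 0.1    # the "payload": "0.1" setting in the field type
NO_PAYLOAD_FACTOR = 1.0  # assumption: no payload means a neutral factor


def index_tokens(text):
    """Return {term: payload or None}, mimicking expand=true plus typeMatch=SYNONYM."""
    tokens = {}
    for word in text.lower().split():
        # The original word is not typed SYNONYM, so it gets no payload.
        tokens[word] = None
        if word in SYNONYMS:
            for synonym in SYNONYMS - {word}:
                # Injected synonyms are typed SYNONYM and receive the low payload.
                tokens.setdefault(synonym, SYNONYM_PAYLOAD)
    return tokens


def payload_score(tokens, term):
    """Score a single-term match; None means the document does not match."""
    if term not in tokens:
        return None
    payload = tokens[term]
    return NO_PAYLOAD_FACTOR if payload is None else payload


doc_99 = index_tokens("tv")
print(payload_score(doc_99, "television"))  # matched via an injected synonym
print(payload_score(doc_99, "tv"))          # matched via the original term
```

The document still matches either query term, but a match that exists only because of synonym expansion is scored lower than a match on the word the author actually wrote.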
Solr Integration
• Schema-aware
  • DelimitedPayloadTokenFilter: float, integer, identity
  • NumericPayloadTokenFilter: float
• Function / Value Source
  • payload()
• Query parsers
  • {!payload_score}
  • {!payload_check}
• Default (data_driven) schema has built-in payload-enabled dynamic field mappings:
  • *_dpf, *_dpi, and *_dps
Solr features with payloads
• searching (scoring by payload): q={!payload_score…}
• searching (filtering by payload): fq={!frange cost=999 l=0 u=100}payload(…)
• sorting: sort=payload(…) desc
• faceting: facet.query={!frange l=0 u=100 v=$payload_func}&payload_func=payload(…)
payload()
• payload(field, term [,default_value [,min|max|average|first]])
• Operates on float or integer encoded payloads
• Value source, returning a single float per document
• Multiple term matches are possible, returning the min, max, or average; first is a special short-circuit
• If no term matches for a document, returns the default value, or zero
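The semantics listed above can be mimicked in a few lines of Python. This is a sketch for intuition, not Solr's implementation: a document is modelled as a dict from term to its decoded payloads in position order, and "average" as the function used when none is given is an assumption here.

```python
def payload(doc_payloads, term, default_value=0.0, func="average"):
    """Reduce the payloads of `term` in one document to a single float."""
    values = doc_payloads.get(term, [])
    if not values:
        # No term match for this document: default value, or zero.
        return float(default_value)
    if func == "first":
        # Short-circuit: the first matching position decides.
        return float(values[0])
    if func == "min":
        return float(min(values))
    if func == "max":
        return float(max(values))
    if func == "average":
        return sum(values) / len(values)
    raise ValueError(f"unsupported func: {func}")


# The term "weighted" occurs twice, with payloads 50.0 and 100.0.
doc = {"weighted": [50.0, 100.0]}

for func in ("min", "max", "average", "first"):
    print(func, payload(doc, "weighted", 0.0, func))
print("missing", payload(doc, "absent", 5))
```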
payload() uses
• &payload_function=payload(…)
• Returning: fl=payload_result:${payload_function}
• Sorting: sort=${payload_function} desc
• Range faceting: facet.query={!frange key=up_to_one_hundred l=0 u=100 v=$payload_function}
• Matching:
  • without payload considered: term query, e.g. {!term}
  • with payloads factored: {!payload_check}
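Putting these parameters together, a request could be assembled as in the sketch below. It uses only Python's standard library to build and URL-encode the parameter list; the q=*:* query, the facet=true flag, and the field and term values are assumptions chosen for illustration, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode


def payload_request_params(field, term, lower=0, upper=100):
    """Build Solr parameters that reuse a single payload() definition."""
    return [
        ("q", "*:*"),  # assumption: match all documents
        ("payload_function", f"payload({field},{term})"),
        # Returning the computed value as a pseudo-field.
        ("fl", "id,payload_result:${payload_function}"),
        # Sorting by the computed value.
        ("sort", "${payload_function} desc"),
        # Counting documents whose value falls within [lower, upper].
        ("facet", "true"),
        (
            "facet.query",
            f"{{!frange key=up_to_one_hundred l={lower} u={upper} v=$payload_function}}",
        ),
    ]


params = payload_request_params("weighted_terms_dpf", "weighted")
query_string = urlencode(params)
print(query_string)

# Decoding the query string gives back the readable parameter values.
decoded = parse_qs(query_string)
print(decoded["facet.query"][0])
```

Defining payload(…) once and referencing it from fl, sort, and facet.query keeps the field and term in a single place.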
{!payload_score}
• SpanQuery wrapping, payload-based scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery’s
• scoring:
  • payload function: min, max, or average
  • includeSpanScore=true: multiplies payload function result by base query scoring
  • with a simple term query, payload() function is equivalent (with includeSpanScore=false)
{!payload_score} examples
{!payload_score f=payloaded_field_name v=term_value func=min|max|average [includeSpanScore=false]}
{!payload_score f=vals_dpf func=average v=weighted includeSpanScore=true}
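As a rough illustration of how the local params and the includeSpanScore flag fit together, the sketch below renders the query string and models the score combination. The helper names are hypothetical, and the span score of 0.5 is a made-up number standing in for whatever the base query would produce.

```python
def payload_score_query(field, term, func="average", include_span_score=False):
    """Render {!payload_score} local params for a single-term query."""
    if func not in ("min", "max", "average"):
        raise ValueError(f"unsupported func: {func}")
    flag = "true" if include_span_score else "false"
    return f"{{!payload_score f={field} func={func} v={term} includeSpanScore={flag}}}"


def combined_score(payloads, func, span_score, include_span_score):
    """Model: payload function result, optionally multiplied by the span score."""
    reducers = {"min": min, "max": max, "average": lambda v: sum(v) / len(v)}
    factor = reducers[func](payloads)
    return factor * span_score if include_span_score else factor


print(payload_score_query("vals_dpf", "weighted", "average", True))
print(combined_score([50.0, 100.0], "average", 0.5, True))   # payload * span score
print(combined_score([50.0, 100.0], "average", 0.5, False))  # payload only
```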
{!xmlparser}
• {!xmlparser} <BoostingTermQuery fieldName="weighted_terms_dpf">weighted</BoostingTermQuery>
• == {!payload_score f=weighted_terms_dpf func=average includeSpanScore=true}
{!payload_check}
• SpanQuery wrapping, phrase relevancy scoring
• SpanQuery support: currently SpanNearQuery of SpanTermQuery’s
• matching:
  • matches when all terms match all corresponding payloads, in order
• scoring:
  • uses SpanNearQuery’s score
{!payload_check}
id,words_dps
99,taking|VERB the|ARTICLE train|NOUN
q={!payload_check f=words_dps v=train payloads=NOUN}
q={!payload_check f=words_dps v='the train' payloads='ARTICLE NOUN'}
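The matching rule behind these two queries can be sketched in Python. This models the behavior only and is not the Solr query parser: it assumes whitespace tokenization and treats "in order" as adjacent positions with no gaps, which is an assumption about how the underlying span query is configured.

```python
def analyze(text, delimiter="|"):
    """Turn 'taking|VERB the|ARTICLE' into [(term, payload), ...] in position order."""
    tokens = []
    for raw in text.split():
        term, _, payload = raw.partition(delimiter)
        tokens.append((term, payload))
    return tokens


def payload_check(doc_tokens, terms, payloads):
    """True when every term matches its corresponding payload, in order."""
    term_list, payload_list = terms.split(), payloads.split()
    if not term_list or len(term_list) != len(payload_list):
        return False
    wanted = list(zip(term_list, payload_list))
    width = len(wanted)
    # Slide a window over the document looking for the exact (term, payload) run.
    return any(
        doc_tokens[start:start + width] == wanted
        for start in range(len(doc_tokens) - width + 1)
    )


doc_99 = analyze("taking|VERB the|ARTICLE train|NOUN")

print(payload_check(doc_99, "train", "NOUN"))
print(payload_check(doc_99, "the train", "ARTICLE NOUN"))
print(payload_check(doc_99, "train", "VERB"))  # "train" is present, but not as a verb
```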
Payload Cons
• payload(): if used as a {!func} q or in a facet.query, it computes a value for ALL documents in the index. To limit the computation to just the matching documents, apply it as a PostFilter fq using {!frange} with payload()
• Updating values
  • Atomic field update (could multivalue and delete/add a single term|value)?
  • could mean updating all inventory for all stores for a single change
• no current range faceting support (of functions in general)
What’s next
• SOLR-10541 - “Range facet by function”
  • solves range faceting by payload
• LUCENE-7854: term frequency “payload”
  • coming soon, see SOLR-11358
• OpenNLP types => payloads
• Pluggable encoders/decoders?
Further reading
https://lucidworks.com/2017/09/14/solr-payloads/