Full-Text Search Explained Philipp Krenn @xeraa
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Apr 05, 2017

Page 1: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Full-Text Search Explained

Philipp Krenn @xeraa

Page 2: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 3: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 4: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 5: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Infrastructure | Developer Advocate

Page 6: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Who uses Databases?

Page 7: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Who uses Search?

Page 8: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Database vs Full-Text Search

Page 9: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 10: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

But I can do...
SELECT * FROM my_table WHERE my_text LIKE '%my_term%'

Page 11: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 12: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

1. Performance
B-Tree
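
The performance point can be illustrated with a small sketch. It assumes a sorted Python list as a stand-in for a B-tree index, and the row values are hypothetical. A prefix pattern such as LIKE 'term%' can jump to the right place in sorted order, while LIKE '%term%' has to inspect every value.

```python
import bisect

# Hypothetical column values; a sorted list stands in for a B-tree index.
rows = ["droid factory", "father figure", "my father", "obi wan", "the droids"]
index = sorted(rows)

def prefix_search(sorted_values, prefix):
    """LIKE 'prefix%': binary search to the first candidate, then read a contiguous range."""
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break  # sorted order guarantees that no later value can match
        matches.append(value)
    return matches

def infix_search(values, term):
    """LIKE '%term%': sort order does not help, so every value is inspected."""
    return [value for value in values if term in value]

print(prefix_search(index, "father"))  # ['father figure']
print(infix_search(index, "father"))   # ['father figure', 'my father']
```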

Page 13: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 14: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 15: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 16: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

2. Features
Fuzziness, synonyms, scoring, ...

Page 17: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Store

Page 18: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Remove formatting

Page 19: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Tokenize

Page 20: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Stop words

Page 21: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Stemming

http://snowballstem.org

Page 22: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Synonyms

Page 23: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 24: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Apache Lucene
Elasticsearch

Page 25: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 26: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

https://cloud.elastic.co

Page 27: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 28: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 29: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

---
version: '2'
services:
  kibana:
    image: docker.elastic.co/kibana/kibana:5.2.2
    links:
      - elasticsearch
    ports:
      - 5601:5601

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.2.2
    cap_add:
      - IPC_LOCK
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

volumes:
  esdata1:
    driver: local

Page 30: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 31: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Example
These are <em>not</em> the droids you are looking for.

Page 32: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

html_strip Char Filter
These are not the droids you are looking for.

Page 33: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

standard Tokenizer
These are not the droids you are looking for

Page 34: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

lowercase Token Filter
these are not the droids you are looking for

Page 35: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

stop Token Filter
droids you looking

Page 36: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

snowball Token Filter
droid you look
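
The steps on the preceding slides can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the regular expressions stand in for the html_strip char filter and the standard tokenizer, and `toy_stem` is a hypothetical suffix stripper, not the Snowball algorithm. The stop word list is the one shown later in the deck.

```python
import re

STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)

def strip_html(text):
    # Rough stand-in for the html_strip char filter: drop anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", text)

def tokenize(text):
    # Rough stand-in for the standard tokenizer: split on non-alphanumeric characters.
    return re.findall(r"[A-Za-z0-9]+", text)

def toy_stem(token):
    # Hypothetical suffix stripper for illustration only; real stemmers are far more careful.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    tokens = tokenize(strip_html(text))
    tokens = [token.lower() for token in tokens]
    tokens = [token for token in tokens if token not in STOP_WORDS]
    return [toy_stem(token) for token in tokens]

print(analyze("These are <em>not</em> the droids you are looking for."))
# ['droid', 'you', 'look']
```

Unlike the real token filters, this sketch does not keep track of token positions or character offsets.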

Page 37: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /_analyze
{
  "analyzer": "english",
  "text": "These are not the droids you are looking for."
}

Page 38: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "tokens": [ { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 25, "end_offset": 28, "type": "<ALPHANUM>", "position": 5 }, ... ]}

Page 39: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

Page 40: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "tokens": [ { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 }, ... ]}

Page 41: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Stop Words
a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L44-L50

Page 42: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Italian
Questi non sono i droidi che state cercando.

Page 43: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Italian
droid state cercand

Page 44: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /_analyze
{
  "analyzer": "italian",
  "text": "Questi non sono i droidi che state cercando."
}

Page 45: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "tokens": [ { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 }, { "token": "state", "start_offset": 29, "end_offset": 34, "type": "<ALPHANUM>", "position": 6 }, ... ]}

Page 47: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Italian with the English Analyzer

questi non sono i droidi che state cercand

Page 48: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Detecting Languages
https://github.com/spinscale/elasticsearch-ingest-langdetect

Page 49: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Languages in 5.0
arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai

Page 50: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Language Rules
English: Philipp's → philipp
French: l'église → eglis
German: äußerst → ausserst

Page 51: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Another Example
Obi-Wan never told you what happened to your father.

Page 52: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Another Example
obi wan never told you what happen your father

Page 53: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Another Example
<b>No</b>. I am your father.

Page 54: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Another Example
i am your father

Page 55: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Elasticsearch
Index, type, mapping

Page 56: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Multi-Language Support
One language per index
One language per type
One language per field

Page 57: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

PUT /starwars
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [ "droid,machine", "father,dad" ]
        }
      },

Page 58: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

      "analyzer": {
        "my_analyzer": {
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ]
        }
      }
    }
  },

Page 59: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

  "mappings": {
    "quotes": {
      "properties": {
        "quote": { "type": "text", "analyzer": "my_analyzer" }
      }
    }
  }
}

Page 60: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /starwars/_mapping

GET /starwars/_settings

Page 61: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

PUT /starwars/quotes/1
{ "quote": "These are <em>not</em> the droids you are looking for." }

PUT /starwars/quotes/2
{ "quote": "Obi-Wan never told you what happened to your father." }

PUT /starwars/quotes/3
{ "quote": "<b>No</b>. I am your father." }

Page 62: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /starwars/quotes/1

GET /starwars/quotes/1/_source

Page 63: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Inverted Index

Term     ID 1   ID 2   ID 3
am       0      0      1[2]
droid    1[4]   0      0
father   0      1[9]   1[4]
happen   0      1[6]   0
i        0      0      1[1]
look     1[7]   0      0
never    0      1[2]   0
obi      0      1[0]   0
told     0      1[3]   0
wan      0      1[1]   0
what     0      1[5]   0
you      1[5]   1[4]   0
your     0      1[8]   1[3]
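
The table above can be reproduced with a small sketch of a positional inverted index. The (position, term) pairs are copied from the slide, and the function names `build_inverted_index` and `search` are hypothetical helpers for illustration, not part of Lucene or Elasticsearch.

```python
from collections import defaultdict

# Analyzed documents as (position, term) pairs, taken from the table on the slide.
# Gaps in the positions are where stop words were removed.
analyzed_docs = {
    1: [(4, "droid"), (5, "you"), (7, "look")],
    2: [(0, "obi"), (1, "wan"), (2, "never"), (3, "told"), (4, "you"),
        (5, "what"), (6, "happen"), (8, "your"), (9, "father")],
    3: [(1, "i"), (2, "am"), (3, "your"), (4, "father")],
}

def build_inverted_index(docs):
    """Map each term to the documents containing it and the positions inside each document."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in tokens:
            index[term].setdefault(doc_id, []).append(position)
    return index

def search(index, term):
    """Return the sorted IDs of all documents that contain the term."""
    return sorted(index.get(term, {}))

index = build_inverted_index(analyzed_docs)
print(index["father"])       # {2: [9], 3: [4]}
print(search(index, "you"))  # [1, 2]
```

A lookup is a single dictionary access per term, which is why searching does not need to scan the stored documents.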

Page 64: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Search

Page 65: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match_all": { }
  }
}

Page 66: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET vs POST

Page 67: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 1, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, ...

Page 68: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": { "quote": "droid" }
  }
}

Page 69: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "1", "_score": 0.39556286, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } } ] }}

Page 70: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": { "quote": "dad" }
  }
}

Page 71: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 2, "max_score": 0.41913947, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 0.41913947, "_source": { "quote": "<b>No</b>. I am your father." } }, { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.39291072, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] }}

Page 72: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match_phrase": { "quote": "I am your father" }
  }
}

Page 73: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.5665855, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 1.5665855, "_source": { "quote": "<b>No</b>. I am your father." } } ] }}

Page 74: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match_phrase": { "quote": "I am not your father" }
  }
}

Page 75: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 15, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] }}

Page 76: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match_phrase": {
      "quote": {
        "query": "I am not your father",
        "slop": 1
      }
    }
  }
}

Page 77: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.0409548, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 1.0409548, "_source": { "quote": "<b>No</b>. I am your father." } } ] }}
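
Why the phrase fails without slop and matches with "slop": 1 can be sketched with the term positions from the inverted index. This is a simplified model and not Lucene's actual implementation: each document position is shifted by the term's position in the query, and the phrase matches if the shifted positions differ by at most the slop. The function name `phrase_match` is hypothetical.

```python
from itertools import product

def phrase_match(doc_positions, query_terms, slop=0):
    """doc_positions: term -> positions in one document.
    query_terms: list of (query_position, term) after analysis."""
    shifted = []
    for query_pos, term in query_terms:
        positions = doc_positions.get(term)
        if not positions:
            return False  # a phrase term is missing from the document
        shifted.append([p - query_pos for p in positions])
    # An exact phrase lines up perfectly (spread 0); slop tolerates a larger spread.
    return any(max(combo) - min(combo) <= slop for combo in product(*shifted))

# Document 3, "<b>No</b>. I am your father.", as indexed ("no" is a stop word at position 0).
doc3 = {"i": [1], "am": [2], "your": [3], "father": [4]}

# "I am your father"
exact = [(0, "i"), (1, "am"), (2, "your"), (3, "father")]
# "I am not your father": "not" is removed as a stop word but leaves a gap at position 2.
with_gap = [(0, "i"), (1, "am"), (3, "your"), (4, "father")]

print(phrase_match(doc3, exact))             # True
print(phrase_match(doc3, with_gap))          # False
print(phrase_match(doc3, with_gap, slop=1))  # True
```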

Page 78: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": {
      "quote": {
        "query": "van",
        "fuzziness": "AUTO"
      }
    }
  }
}

Page 79: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.18155496, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.18155496, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] }}

Page 80: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": {
      "quote": {
        "query": "ovi-van",
        "fuzziness": 1
      }
    }
  }
}

Page 81: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "took": 109, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.3798467, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.3798467, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] }}

Page 82: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

FuzzyQuery History
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html

Before: Brute force
Now: Levenshtein Automaton
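
The brute-force approach can be sketched as follows: compute the edit distance between the query and every term in the index. This is a plain dynamic-programming Levenshtein distance for illustration, not the automaton that Lucene uses today. The term list is taken from the inverted index slide, and `fuzzy_lookup` is a hypothetical helper name.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_lookup(terms, query, max_edits):
    """Brute force: compare the query against every indexed term."""
    return [term for term in terms if levenshtein(query, term) <= max_edits]

terms = ["am", "droid", "father", "happen", "i", "look", "never",
         "obi", "told", "wan", "what", "you", "your"]

print(fuzzy_lookup(terms, "van", 1))  # ['wan']
print(fuzzy_lookup(terms, "ovi", 1))  # ['obi']
```

The cost grows with the number of distinct terms, which is the motivation for the automaton-based approach described in the linked posts.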

Page 83: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata

Page 84: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

SELECT * FROM starwars
WHERE quote LIKE '%_an%' OR quote LIKE '%V_n%' OR quote LIKE '%Va_%'

Page 85: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Scoring

Page 86: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Term Frequency / Inverse Document Frequency (TF/IDF)

Search one term

Page 87: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

BM25
Default in Elasticsearch 5.0

https://speakerdeck.com/elastic/improved-text-scoring-with-bm25

Page 88: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Term Frequency

Page 89: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 90: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Inverse Document Frequency

Page 91: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 92: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Field-Length Norm

Page 93: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Putting it Together
score(q,d) = queryNorm(q) · coord(q,d) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Page 94: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search?explain
{
  "query": {
    "match": { "quote": "father" }
  }
}

Page 95: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."_explanation": { "value": 0.41913947, "description": "weight(Synonym(quote:dad quote:father) in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 0.41913947, "description": "score(doc=0,freq=2.0 = termFreq=2.0\n), product of:", "details": [ { "value": 0.2876821, "description": "idf(docFreq=1, docCount=1)", "details": [] }, { "value": 1.4569536, "description": "tfNorm, computed from:", "details": [ { "value": 2, "description": "termFreq=2.0", "details": [] }, ...
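
The numbers in this explanation can be checked with a short sketch of the BM25 formulas, assuming Lucene's default parameters k1 = 1.2 and b = 0.75. The field lengths are elided on the slide, so the values 4 and 5 used below are assumptions that happen to reproduce the reported tfNorm.

```python
import math

def idf(doc_freq, doc_count):
    """BM25 inverse document frequency as reported in the explain output."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """BM25 term frequency component, dampened by k1 and normalized by field length via b."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

term_idf = idf(doc_freq=1, doc_count=1)
# Assumed field lengths (not shown on the slide): 4 terms in this field, 5 on average.
term_tf = tf_norm(freq=2, field_length=4, avg_field_length=5)

print(round(term_idf, 7))            # 0.2876821
print(round(term_tf, 7))             # 1.4569536
print(round(term_idf * term_tf, 6))  # 0.419139
```

The document frequency and document count are per shard, which is why docCount is 1 here even though the index holds three documents.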

Page 96: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Score
0.41913947: i am your father
0.39291072: obi wan never told you what happen your father

Page 97: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Vector Space Model
Search multiple terms

Page 98: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Score each term
Vectorize
Calculate angle
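
The three steps can be sketched with cosine similarity, where a smaller angle between the query vector and a document vector means a better match. The term weights below are hypothetical and only chosen for illustration; in practice they would come from the per-term scores.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Dimensions: ("your", "father"), with hypothetical weights 2 and 5.
query = [2, 5]
documents = {
    "contains both terms": [2, 5],
    "contains only father": [0, 5],
    "contains only your": [2, 0],
}

ranking = sorted(documents, key=lambda name: cosine_similarity(query, documents[name]), reverse=True)
for name in ranking:
    print(name, round(cosine_similarity(query, documents[name]), 3))
```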

Page 99: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Search your father

Page 100: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017
Page 101: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Function Score
Script, weight, random, field value, decay (geo or date)

Page 102: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "quote": "father" }
      },
      "random_score": {}
    }
  }
}

Page 103: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Conclusion

Page 104: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Indexing
Formatting
Tokenize
Lowercase, Stop Words, Stemming
Synonyms

Page 105: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Scoring
Term Frequency
Inverse Document Frequency
Field-Length Norm
Vector Space Model

Page 106: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Grazie!
Questions?
Philipp Krenn @xeraa

PS: Stickers

Page 107: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

More

Page 108: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": { "quote": "father" }
  },
  "highlight": {
    "pre_tags": [ "<tag>" ],
    "post_tags": [ "</tag>" ],
    "fields": { "quote": {} }
  }
}

Page 109: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."hits": [ { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 0.41913947, "_source": { "quote": "<b>No</b>. I am your father." }, "highlight": { "quote": [ "<b>No</b>. I am your <tag>father</tag>." ] } }, ...

Page 110: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Boolean Queries
must, must_not, should, filter

Page 111: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "bool": {
      "must": { "match": { "quote": { "query": "father" } } },
      "should": [
        { "match": { "quote": { "query": "your" } } },
        { "match": { "quote": { "query": "obi" } } }
      ]
    }
  }
}

Page 112: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 2, "max_score": 0.96268076, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.96268076, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 0.73245656, "_source": { "quote": "<b>No</b>. I am your father." } } ] }}

Page 113: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "bool": {
      "filter": { "match": { "quote": { "query": "father" } } },
      "should": [
        { "match": { "quote": { "query": "your" } } },
        { "match": { "quote": { "query": "obi" } } }
      ]
    }
  }
}

Page 114: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 2, "max_score": 0.56977004, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.56977004, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 0.31331712, "_source": { "quote": "<b>No</b>. I am your father." } } ] }}

Page 115: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "bool": {
      "must": { "match": { "quote": { "query": "father" } } },
      "should": [
        { "match": { "quote": { "query": "your" } } },
        { "match": { "quote": { "query": "obi" } } },
        { "match": { "quote": { "query": "droid" } } }
      ],
      "minimum_should_match": 2
    }
  }
}

Page 116: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 1, "max_score": 0.96268076, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.96268076, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] }}

Page 117: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Boosting
Default is 1; the boost can be greater or smaller

Page 118: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "bool": {
      "must": { "match": { "quote": { "query": "father" } } },
      "should": [
        { "match": { "quote": { "query": "your" } } },
        { "match": { "quote": { "query": "obi", "boost": 3 } } }
      ]
    }
  }
}

Page 119: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 2, "max_score": 1.5324509, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 1.5324509, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "quotes", "_id": "3", "_score": 0.73245656, "_source": { "quote": "<b>No</b>. I am your father." } } ] }}

Page 120: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Suggestion
Suggest a similar text

_search end point
_suggest deprecated

Page 121: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars/_search
{
  "query": {
    "match": { "quote": "fath" }
  },
  "suggest": {
    "my_suggestion": {
      "text": "drui",
      "term": { "field": "quote" }
    }
  }
}

Page 122: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

... "hits": { "total": 0, "max_score": null, "hits": [] }, "suggest": { "my_suggestion": [ { "text": "drui", "offset": 0, "length": 4, "options": [ { "text": "droid", "score": 0.5, "freq": 1 } ] } ] }}

Page 123: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

NGram
Partial matches

Trigram
Edge Gram

Page 124: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": {
    "type": "ngram",
    "min_gram": "3",
    "max_gram": "3",
    "token_chars": [ "letter" ]
  },
  "filter": [ "lowercase" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

Page 125: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "tokens": [ { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 }, { "token": "hes", "start_offset": 1, "end_offset": 4, "type": "word", "position": 1 }, { "token": "ese", "start_offset": 2, "end_offset": 5, "type": "word", "position": 2 }, { "token": "are", "start_offset": 6, "end_offset": 9, "type": "word", "position": 3 }, ...

Page 126: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": "1",
    "max_gram": "3",
    "token_chars": [ "letter" ]
  },
  "filter": [ "lowercase" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

Page 127: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

{ "tokens": [ { "token": "t", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 }, { "token": "th", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1 }, { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 2 }, { "token": "a", "start_offset": 6, "end_offset": 7, "type": "word", "position": 3 }, { "token": "ar", "start_offset": 6, "end_offset": 8, "type": "word", "position": 4 }, ...
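
The difference between the two tokenizers can be sketched for a single word. The function names `ngrams` and `edge_ngrams` are hypothetical helpers, and the sketch works on one already lowercased word rather than on full text.

```python
def ngrams(word, min_gram, max_gram):
    """All substrings of the given lengths, starting at every position of the word."""
    grams = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(word):
                grams.append(word[start:start + size])
    return grams

def edge_ngrams(word, min_gram, max_gram):
    """Only the substrings anchored at the beginning of the word."""
    return [word[:size] for size in range(min_gram, min(max_gram, len(word)) + 1)]

print(ngrams("these", 3, 3))       # ['the', 'hes', 'ese']
print(edge_ngrams("these", 1, 3))  # ['t', 'th', 'the']
print(edge_ngrams("are", 1, 3))    # ['a', 'ar', 'are']
```

Trigrams allow matches anywhere inside a word, while edge n-grams only match prefixes, which suits search-as-you-type.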

Page 128: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Combining Analyzers
Reindex
Store multiple times
Combine scores

Page 129: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

PUT /starwars_extended
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [ "droid,machine", "father,dad" ]
        },
        "my_ngram_filter": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": [ "letter" ]
        }
      },

Page 130: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

      "analyzer": {
        "my_lowercase_analyzer": {
          "char_filter": [ "html_strip" ],
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        },
        "my_full_analyzer": {
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ]
        },

Page 131: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

        "my_ngram_analyzer": {
          "char_filter": [ "html_strip" ],
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "stop", "my_ngram_filter" ]
        }
      }
    }
  },

Page 132: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

  "mappings": {
    "quotes": {
      "properties": {
        "quote": {
          "type": "text",
          "fields": {
            "lowercase": { "type": "text", "analyzer": "my_lowercase_analyzer" },
            "full": { "type": "text", "analyzer": "my_full_analyzer" },
            "ngram": { "type": "text", "analyzer": "my_ngram_analyzer" }
          }
        }
      }
    }
  }
}

Page 133: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /_reindex
{
  "source": { "index": "starwars" },
  "dest": { "index": "starwars_extended" }
}

Page 134: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars_extended/_search?explain
{
  "query": {
    "multi_match": {
      "query": "obiwan",
      "fields": [ "quote", "quote.lowercase", "quote.full", "quote.ngram" ],
      "type": "most_fields"
    }
  }
}

Page 135: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."hits": { "total": 1, "max_score": 0.47685796, "hits": [ { "_shard": "[starwars_extended][2]", "_node": "OsGbL-tZQJ-A8HO2PyDfhA", "_index": "starwars_extended", "_type": "quotes", "_id": "2", "_score": 0.47685796, "_source": { "quote": "Obi-Wan never told you what happened to your father." }, ...

Page 136: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

"weight(Synonym(quote.ngram:biw quote.ngram:iwa quote.ngram:obi quote.ngram:wan) in 0) [PerFieldSimilarity], result of:"

Page 137: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

DELETE /starwars

DELETE /starwars_extended

Page 138: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Different Analyzers for Indexing and Searching

Per query
In the mapping

Page 139: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

GET /starwars_extended/_search
{
  "query": {
    "match": {
      "quote.ngram": {
        "query": "the",
        "analyzer": "standard"
      }
    }
  }
}

Page 140: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."hits": [ { "_index": "starwars_extended", "_type": "quotes", "_id": "2", "_score": 0.38254172, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars_extended", "_type": "quotes", "_id": "3", "_score": 0.36165747, "_source": { "quote": "<b>No</b>. I am your father." } }]...

Page 141: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars_extended/_close

PUT /starwars_extended/_settings
{
  "analysis": {
    "filter": {
      "my_edgegram_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 10
      }
    },
    "analyzer": {
      "my_edgegram_analyzer": {
        "char_filter": [ "html_strip" ],
        "tokenizer": "whitespace",
        "filter": [ "lowercase", "my_edgegram_filter" ]
      }
    }
  }
}

POST /starwars_extended/_open

Page 142: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

PUT /starwars_extended/quotes/_mapping
{
  "properties": {
    "quote": {
      "type": "text",
      "fields": {
        "edgegram": {
          "type": "text",
          "analyzer": "my_edgegram_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

Page 143: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

PUT /starwars_extended/quotes/4
{ "quote": "I find your lack of faith disturbing." }

PUT /starwars_extended/quotes/5
{ "quote": "That... is why you fail." }

Page 144: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars_extended/_search
{
  "query": {
    "match": { "quote.ngram": "faith" }
  }
}

Page 145: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."hits": [ { "_index": "starwars_extended", "_type": "quotes", "_id": "4", "_score": 1.3019705, "_source": { "quote": "I find your lack of faith disturbing." } }, { "_index": "starwars_extended", "_type": "quotes", "_id": "5", "_score": 0.3812654, "_source": { "quote": "That... is why you fail." } }]...

Page 146: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

POST /starwars_extended/_search
{
  "query": {
    "match": { "quote.edgegram": "faith" }
  }
}

Page 147: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

..."hits": { "total": 1, "max_score": 0.41821626, "hits": [ { "_index": "starwars_extended", "_type": "quotes", "_id": "4", "_score": 0.41821626, "_source": { "quote": "I find your lack of faith disturbing." } } ] ...

Page 148: Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017

Image Credits

Schnitzel https://flic.kr/p/9m27wm
Architecture https://flic.kr/p/6dwCAe

Conchita https://flic.kr/p/nBqSHT