Relevancy in Commerce Search
Klaus Herrmann – hybris software
- Free text search is the most popular way to find products in online shops!
- Irrelevant results quickly lead to user frustration
  - Who really goes to page 2 and beyond if the first page shows garbage?
- Search in eCommerce is very often powered by document search engines not optimized to find products
- Apache Solr and Elasticsearch are the most popular (open source) choices
- Commercial alternatives are abundant, but often expensive, heavyweight, hard to integrate, ...
  - Endeca, Fact Finder, Fredhopper, Compario, Pertimm, ...
  - ... and not necessarily better
- Most of us will have to make relevancy work
Using Full Text Search Engines to Find Products
Full Text Search Relevancy: TF-IDF
Is “sale” in “Sale! T-shirts! Sale!” really twice as relevant as in just “T-shirt Sale!”?
Counting across all documents includes every category and catalog, regardless of the current context and filters!
Are rare terms necessarily more meaningful?
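The two questions above can be made concrete with a small sketch. It assumes a textbook TF-IDF variant (raw term count times `1 + ln(N / (df + 1))`), not the exact formula of any particular engine, and the four-product catalog and the made-up keyword "zyx" are purely illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalog: product titles only.
CATALOG = [
    "Sale! T-shirts! Sale!",
    "T-shirt Sale!",
    "Red cotton T-shirt",
    "Limited edition zyx T-shirt",
]

def tokenize(text):
    # Lowercase and keep runs of letters; no stemming or stop words.
    return re.findall(r"[a-z]+", text.lower())

def tf(term, doc):
    # Raw term frequency: how often the term occurs in this document.
    return Counter(tokenize(doc))[term]

def idf(term, docs):
    # Inverse document frequency, counted over ALL documents passed in,
    # no matter which category or filter the shopper is currently using.
    df = sum(1 for d in docs if term in tokenize(d))
    return 1.0 + math.log(len(docs) / (df + 1))

def score(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

if __name__ == "__main__":
    for title in CATALOG[:2]:
        print(title, "->", round(score("sale", title, CATALOG), 3))
    for term in ("sale", "zyx"):
        print(term, "idf =", round(idf(term, CATALOG), 3))
```

With raw term counts, the title that repeats "sale" scores exactly twice as high, and the meaningless but rare "zyx" receives a larger IDF than "sale". Engines usually dampen the TF part (for example with a square root), which softens the effect but does not remove it.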
- TF-IDF can do more harm than good when searching (semi-)structured data
  - Potentially bad score discounting for frequent terms
  - Confusing boosts to rare but meaningless keywords
  - E.g. matching the colour (red / green / ...) of a product should have a comparable effect on scoring regardless of the number of products you sell in that colour
- High-tech algorithmic fixes exist
  - Research on better versions of TF-IDF
  - Model “information gain” of keywords in contexts, e.g. as done by Etsy.com
- Lower-tech fixes exist, too!
  - TF=1, IDF=1 for product titles, brands, colours – structured data
  - Use boosts and functions, e.g. to push newer, cheaper, high-margin, ... products
    “q=keyword OR (inStock:true^100)”
    “boost=recip(ms(NOW/HOUR,pubdate),3.16e-11,0.08,0.05)”
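To see what the recency boost above does numerically, here is a small sketch. It relies on Solr's documented definition `recip(x, m, a, b) = a / (m*x + b)` and on `ms(NOW/HOUR, pubdate)` being the product's age in milliseconds; the helper names `recency_boost` and `MS_PER_YEAR` are illustrative and not part of Solr.

```python
# 3.16e-11 is roughly 1 / (milliseconds in a year), so m*x is "age in years".
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

def recip(x, m, a, b):
    # Same shape as Solr's recip() function query: a / (m*x + b).
    return a / (m * x + b)

def recency_boost(age_ms):
    # Constants copied from the boost expression on the slide.
    return recip(age_ms, 3.16e-11, 0.08, 0.05)

if __name__ == "__main__":
    for years in (0, 0.5, 1, 2, 5):
        boost = recency_boost(years * MS_PER_YEAR)
        print(f"{years:>3} year(s) old -> boost {boost:.4f}")
```

The boost is about 1.6 for a product published right now and falls to roughly 0.076 after one year, so with these constants newer products are pushed up strongly while older ones decay smoothly instead of being cut off.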
- Field weights matter a great deal: SKU > Brand > Colour > Title > Description
- Clean, well-structured product data tops all.
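The combination of flat TF=1 / IDF=1 scoring and field weights can be sketched as follows. The numeric weights, the product records, and the function name `flat_score` are assumptions made up for illustration; only the ordering SKU > Brand > Colour > Title > Description comes from the slide.

```python
# Hypothetical weights that only preserve the ordering from the slide.
FIELD_WEIGHTS = {
    "sku": 16.0,
    "brand": 8.0,
    "colour": 4.0,
    "title": 2.0,
    "description": 1.0,
}

def flat_score(query, product):
    # TF=1, IDF=1: a query term either matches a field or it does not.
    # Repeating a word inside a field, or the word being rare in the
    # catalog, does not change the score.
    terms = set(query.lower().split())
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = set(str(product.get(field, "")).lower().split())
        total += weight * len(terms & field_terms)
    return total

RED_SHIRT = {
    "sku": "TS-001", "brand": "Acme", "colour": "red",
    "title": "Classic T-shirt",
    "description": "Soft cotton shirt also available in green",
}
GREEN_SHIRT = {
    "sku": "TS-002", "brand": "Acme", "colour": "green",
    "title": "Classic T-shirt",
    "description": "Green shirt with red red red stitching",
}

if __name__ == "__main__":
    for product in (RED_SHIRT, GREEN_SHIRT):
        print(product["sku"], flat_score("red", product))
```

A colour match always contributes the same amount, however many red products are in the catalog, and stuffing "red" into a description three times earns no more than mentioning it once.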
Beyond TF-IDF
- Other factors are at least as important
- Performance: “2 second rule”
  - Especially when paging
- Meaningful facet filters to help navigate results
- Visual presentation makes browsing more fun
  - Especially in fashion
- Help your users phrase good queries
  - Autocomplete keywords
  - Category and brand suggestions
  - Spell checking
  - Search-as-you-Type results
Relevancy is complex. Do we really have to?
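Two of the query helpers listed above, keyword autocomplete and spell checking, can be sketched with the Python standard library alone. The vocabulary, the function names, and the similarity cutoff are illustrative assumptions; a production shop would normally use the suggester and spellcheck components of its search engine instead.

```python
import bisect
import difflib

# Hypothetical keyword vocabulary, e.g. harvested from product titles.
VOCABULARY = sorted(
    ["jacket", "jeans", "jumper", "shirt", "shoes", "shorts", "skirt"]
)

def autocomplete(prefix, limit=5):
    # Binary-search the sorted vocabulary for the first candidate, then
    # collect words for as long as they still start with the prefix.
    prefix = prefix.lower()
    start = bisect.bisect_left(VOCABULARY, prefix)
    suggestions = []
    for word in VOCABULARY[start:]:
        if not word.startswith(prefix) or len(suggestions) == limit:
            break
        suggestions.append(word)
    return suggestions

def did_you_mean(term, cutoff=0.75):
    # Return the most similar known keyword, or None if nothing is close.
    matches = difflib.get_close_matches(
        term.lower(), VOCABULARY, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

if __name__ == "__main__":
    print(autocomplete("sh"))
    print(did_you_mean("jakcet"))
```

Keeping the vocabulary sorted lets the prefix lookup start with a binary search, which matters for search-as-you-type latency; the similarity cutoff trades missed corrections against wrong ones and would need tuning on real queries.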
- Scoring for human beings (Elasticsearch):
  https://speakerdeck.com/elasticsearch/scoring-for-human-beings
  http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
- Beyond TF-IDF (Etsy.com):
  http://www.slideshare.net/lucenerevolution/beyond-tf-idf-why-what-how
  http://www.youtube.com/watch?v=C25txE_dq90
Further Reading