Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbull, OpenSource Connections

Jan 07, 2017

LucidWorks
Page 1

OCTOBER 11-14, 2016 • BOSTON, MA

Page 2

Anyone can build a Recsys w/ Solr!

Doug Turnbull

Relevance Consultant, OpenSource Connections

Page 3

I’m now available in book form!

https://www.manning.com/books/relevant-search Discount code: relsearch (38% off)

http://opensourceconnections.com/about-us/doug-turnbull/

Me The company...

Page 4

field Body
  term laser
    doc 2 <metadata>
    doc 4 <metadata>
  term light
    doc 2 <metadata>
  term lightsaber
    doc 0 <metadata>

How do search engines work? The answer can be found in your textbook…

Book Index:
•  Topics -> page no
•  Very efficient tool – compare to scanning the whole book!

Lucene uses an index:
•  Tokens => document ids:
   laser => [2, 4]
   light => [2, 5]
   lightsaber => [0, 1, 5, 7]
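
To make the token => document ids idea concrete, here is a minimal sketch of an inverted index in Python. The document texts are hypothetical, chosen only so the postings match the slide; Lucene's real index stores far more (positions, frequencies, metadata) than this toy does.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


# Hypothetical document bodies (not from the talk), picked to reproduce the slide's postings.
docs = {
    0: "lightsaber",
    1: "lightsaber",
    2: "laser light",
    4: "laser",
    5: "light lightsaber",
    7: "lightsaber",
}

index = build_inverted_index(docs)
print(index["laser"])       # ids of the documents containing "laser"
print(index["lightsaber"])  # ids of the documents containing "lightsaber"
```

Looking up a token is a single dictionary access, which is the "compare to scanning the whole book" point.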

Page 5

What's the point?

Solr:

-  A general purpose system for looking up content based on features that describe it

Tokens aren't really words!

doc0: "I like the bananas" -> Analysis -> [I] [lik] [banan]

Search: "liked banana?" -> Analysis -> [lik] [banan]

term I: doc 0
term lik: doc 0
term banan: doc 0
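
A toy analysis chain shows why the query "liked banana?" can match "I like the bananas". This is only a sketch: the stopword list and the suffix-stripping rules below are made up for illustration and are not Solr's analyzers, and lowercasing "I" to "i" is an assumption of this sketch.

```python
import re

# Hypothetical stopword list, just enough to drop "the".
STOPWORDS = {"the", "a", "an"}


def toy_stem(word):
    """Crude, illustrative stemmer: strip one suffix, then a trailing vowel."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    if len(word) > 3 and word[-1] in "aeiou":
        word = word[:-1]
    return word


def analyze(text):
    """Lowercase, split on non-letters, drop stopwords, stem each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOPWORDS]


doc_tokens = analyze("I like the bananas")
query_tokens = analyze("liked banana?")
print(doc_tokens, query_tokens)
```

Because the same analysis runs at index time and at query time, both sides end up with the tokens `lik` and `banan`, so the lookup succeeds even though the surface words differ.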

Page 6

TF*IDF -- measuring feature weight

term I:
  doc 0: freq: 5
  doc 1: freq: 7
  doc 3: freq: 4
term banan:
  doc 0: freq: 2

"Banana-ness" is pretty special

"I-ness" is not special

term I, doc0: tf==5 df==3 (raw) TF*IDF = 5/3 = 1.6667

term banan, doc0: tf==2 df==1 (raw) TF*IDF = 2/1 = 2.0

Search is really distributed feature matching and similarity (text-oriented)
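
The raw TF*IDF arithmetic from this slide can be sketched in a few lines. The `postings` layout and function names are illustrative assumptions; Lucene's actual similarity implementations dampen both terms rather than using this raw ratio.

```python
# Postings from the slide: term -> {doc id: term frequency}.
postings = {
    "I": {0: 5, 1: 7, 3: 4},
    "banan": {0: 2},
}


def raw_tf_idf(term, doc_id):
    """Raw score as on the slide: term frequency divided by document frequency."""
    docs_with_term = postings[term]
    tf = docs_with_term[doc_id]
    df = len(docs_with_term)
    return tf / df


score_i = raw_tf_idf("I", 0)          # 5 / 3
score_banan = raw_tf_idf("banan", 0)  # 2 / 1
print(round(score_i, 4), score_banan)
```

Even though "I" occurs more often in doc 0, "banan" scores higher because it appears in fewer documents.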

Page 7

Search often stands in for human interactions

I have a craving for a nice juicy cut of meat. What might you recommend?

I have JUST the thing!

Page 8

Searching the market

q=(juiciness:juicy meatiness:meaty)

Page 9

Modeling arbitrary feature strength

term juicy:
  steak: juiciness: 5
  grapefruit: juiciness: 7
  orange: juiciness: 4
term meaty:
  burger: meatiness: 2

What you want:

{ item: "steak", juiciness: ["juicy", "juicy", "juicy"], meatiness: ["meaty"] }

Use term frequency as feature strength:

{ item: "grapefruit", juiciness: ["juicy", "juicy", "juicy", "juicy", "juicy"], meatiness: [""] }

(remember, Solr doesn't need to store this)
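
One way to produce documents like these is to repeat a token once per unit of feature strength, so that term frequency carries the strength. The helper below is a hypothetical sketch, not code from the talk; note that it emits an empty list for a strength of 0, where the slide shows `[""]`.

```python
def encode_features(item, strengths):
    """Build a document whose token repetition encodes feature strength.

    strengths maps a field name to a (token, strength) pair.
    """
    doc = {"item": item}
    for field, (token, strength) in strengths.items():
        doc[field] = [token] * strength
    return doc


steak = encode_features(
    "steak",
    {"juiciness": ("juicy", 3), "meatiness": ("meaty", 1)},
)
print(steak)
```

Indexed into a multi-valued text field, the three `juicy` tokens give the steak a term frequency of 3 for that feature.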

Page 10

TF*IDF -- measuring feature weight

term juicy:
  doc 0: freq: 5
  doc 1: freq: 7
  doc 3: freq: 4
term meaty:
  doc 0: freq: 2

"meaty-ness" is pretty special

"juicy-ness" is pretty non-special

term juicy, doc0: tf==5 df==3 (raw) TF*IDF = 5/3 = 1.6667

term meaty, doc0: tf==2 df==1 (raw) TF*IDF = 2/1 = 2.0

Search is really distributed feature matching and similarity

Page 11

Requesting something from my grocer

[Figure: items placed on two axes, "More juicy / Less juicy" and "More meaty / Less meaty"]

q=meaty juicy

Results (shown as images): 1. 2. 3.

Page 12

Recsys also stands in for human interactions

Hi Jane, recommend me something?

Hmm… <Tom likes limes, what is similar to limes?>

Page 13

"Content Based" recommendations

Use existing properties of thing to recommend similar things

juicy

citrus

More like this for unstructured data

What features/tokens are most representative of this thing?

http://solr.quepid.com/solr/tmdb/select?q={!mlt%20qf=overview}97&fl=title,id,overview (movies like Tron)

juicy

citrus (search)

Here's some ideas...

{ item: "lime", juiciness: ["juicy", "juicy", "juicy"], citrusness: ["citrus", "citrus", "citrus"], meatiness: [""], partyness: ["party"] }
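
The "movies like Tron" request on this slide uses Solr's More Like This query parser (`{!mlt qf=...}<doc id>`). A small sketch of building that URL with proper escaping follows; the helper name is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def mlt_url(select_url, doc_id, qf, fields):
    """Build a More Like This request URL for one seed document."""
    params = {
        "q": "{!mlt qf=%s}%s" % (qf, doc_id),
        "fl": ",".join(fields),
    }
    return select_url + "?" + urlencode(params)


url = mlt_url(
    "http://solr.quepid.com/solr/tmdb/select",  # endpoint taken from the slide
    doc_id=97,                                  # the seed document id used on the slide
    qf="overview",
    fields=["title", "id", "overview"],
)
print(url)

# Round-trip check: decoding the URL recovers the original query string.
decoded = parse_qs(urlsplit(url).query)
```

`qf` names the field whose most representative tokens drive the similarity query, which is the "what features/tokens are most representative of this thing?" question.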

Page 14

"Content Based" more-like-these

Use existing properties of thing to recommend similar things

juicy

meaty citrus

http://solr.quepid.com/solr/tmdb/select?q={!mlt%20qf=overview}97&fl=title,id,overview (movies like Tron)

Here's some ideas...

Jane knows a few more things that Tom likes...

Page 15

Personalization metadata

Index extra data alongside your products

{ item: "hamburger", preferred_by_genders: ["m", …], preferred_by_ages: ["30_40"] }

age:30_40

gender:m

http://solr.quepid.com/solr/tmdb/select?q={!mlt%20qf=overview}97&fl=title,id,overview (movies like Tron)

Here's some ideas...

Jane knows a few things about Tom (30 yr old male)

Page 16

But, Jane's intuition transcends words!

age:30_40

gender:m

Currently we're stuck with predefined labels:

citrus juicy

meaty

We're curating using known vocabularies

(can we describe everything?)

Page 17

What we like often transcends words

There are emergent properties of our world that don't have names

Relative flarglewharbliness

More flarglewharbily Less flarglewharbily

Diet Coke

Page 18

What's a flarglewharble?

More flarglewharbily Less flarglewharbily

fruit orange lemon banana mentos diet coke

tom X

sue X X X

charlie X X

clare X X

hal x x

Goes together

Diet Coke

Page 19

Can search find the flargles?

q=(flarglewharbliness:very)

term flarglewharble:
  diet-coke: flargleness: 4
  mentos: flargleness: 3
  banana: flargleness: 1

Can we somehow build this?

Diet Coke

Page 20

person\food orange lemon banana mentos diet coke

tom X X

sue X X X X

charlie X X

clare X X

hal x x X

Goes together

flarglewharble!

Babies often use made-up words based on emergent patterns in their universe. They are less committed to our language.

Page 21

What's the point?

Collaborative filtering: latent vocabulary (the flarglewharbles)

Pure search, content-based recs: predefined vocabulary

Can Solr discover the latent/emergent vocabularies?

Page 22

Can Solr discover the latent/emergent vocabularies?

Well, first let's tell Solr about our users

{ user: "Sue", foods_bought: ["lemon", "banana", "mentos", "diet coke"] }
{ user: "Charlie", foods_bought: ["banana", "mentos", "diet coke"] }

Page 23

Faceting?

We need a way to look across users and look for patterns (analyze all the baskets that contain mentos)

q=foods_bought:mentos&facet=true&facet.field=foods_bought

facets:
  mentos: 3
  diet-coke: 3
  banana: 2

Hmm:
-  Bananas are globally popular
-  Diet-coke is probably what matters
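
What the facet request computes can be imitated in plain Python: filter to the baskets that contain mentos, then count every food in them. Sue's and Charlie's baskets come from the earlier slide; the other three baskets are hypothetical, invented so the counts match the facet output above.

```python
from collections import Counter

baskets = {
    "sue": ["lemon", "banana", "mentos", "diet coke"],
    "charlie": ["banana", "mentos", "diet coke"],
    # Hypothetical baskets (assumptions, not from the talk):
    "hal": ["orange", "mentos", "diet coke"],
    "tom": ["orange", "banana"],
    "clare": ["lemon", "banana"],
}


def facet_counts(baskets, must_contain):
    """Count foods across only those baskets that contain must_contain."""
    counts = Counter()
    for foods in baskets.values():
        if must_contain in foods:
            counts.update(set(foods))
    return counts


facets = facet_counts(baskets, "mentos")
print(facets.most_common(3))
```

Banana still ranks high here simply because most shoppers buy bananas, which is the problem with raw counts.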

Page 24

Counts don't work: importance of significance

q=foods_bought:mentos&facet=true&facet.field=foods_bought

facets:
  mentos: 3
  diet-coke: 3
  banana: 2

Diet Coke:
  Global popularity: diet coke (3)
  Local popularity: 3
  Score: 3/3 = 1

Banana:
  Global popularity: banana (4)
  Local popularity: 2
  Score: 2/4 = 0.5

by-significance:
  diet-coke: 1
  banana: 0.5
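
The significance score on this slide is local popularity divided by global popularity. A minimal sketch using the slide's own counts (the function name is an assumption of this sketch):

```python
def by_significance(local_counts, global_counts):
    """Rank items by local count / global count, highest first."""
    scores = {
        item: local_counts[item] / global_counts[item]
        for item in local_counts
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Counts taken from the slide: baskets containing mentos vs. all baskets.
local_counts = {"diet coke": 3, "banana": 2}
global_counts = {"diet coke": 3, "banana": 4}

ranked = by_significance(local_counts, global_counts)
print(ranked)
```

Dividing by the global count pushes down items like bananas that are popular everywhere, leaving the items that are specifically associated with mentos.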

Page 25

Streaming Expressions

/select?q=*:*&facet=true&facet.field=liked_movies

But there's a new sheriff in town!

One option: we could go about and gather global doc freqs & compare those locally.

Terms component another option… plugins...

Streaming expressions -- distributed stream computation system on top of SolrCloud

You must ALWAYS cross the streams!

Page 26

Streaming Expressions

/stream?expr=scoreNodes(facet(...)...)

Faceting with Streaming Expressions:

facet(movielens,
      q="*:*",
      buckets="liked_movies",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))

Output:

{ "result-set": {"docs":[
    { "count(*)":55807, "liked_movies":"318"},
    { "count(*)":52352, "liked_movies":"296"},
    { "count(*)":50114, "liked_movies":"593"}

Nodes to be transformed
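
Since the expression is sent as the `expr` parameter of `/stream`, it has to be URL-encoded. The sketch below builds the slide's facet expression and the request URL; the helper names and the `http://localhost:8983/solr` base are assumptions, and nothing is actually sent to Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def facet_expr(collection, query, bucket_field, limit=100):
    """Build a facet() streaming expression like the one on the slide."""
    return (
        f'facet({collection}, q="{query}", buckets="{bucket_field}", '
        f'bucketSorts="count(*) desc", bucketSizeLimit="{limit}", count(*))'
    )


def stream_url(solr_base, collection, expr):
    """Build the /stream request URL with the expression URL-encoded."""
    return f"{solr_base}/{collection}/stream?" + urlencode({"expr": expr})


expr = facet_expr("movielens", "*:*", "liked_movies")
# Assumed local Solr address; replace with your own.
url = stream_url("http://localhost:8983/solr", "movielens", expr)
print(url)

# Round-trip check: the server would decode the same expression we built.
decoded_expr = parse_qs(urlsplit(url).query)["expr"][0]
```

Fetching that URL against a running collection should return a `result-set` of buckets like the output shown above.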

Page 27

Significance with streaming expr

/stream?expr=scoreNodes(facet(...)...)

scoreNodes(
  select(
    facet(movielens,
          q="liked_movies:2571 OR liked_movies:4993",
          buckets="liked_movies",
          bucketSorts="count(*) desc",
          bucketSizeLimit="100",
          count(*)),
    liked_movies as node,
    "count(*)",
    replace(collection, null, withValue=movielens),
    replace(field, null, withValue=liked_movies)))

1.  facet (just like above, just with streaming expr)
2.  select to format data for scoreNodes
3.  scoreNodes to score using TF*IDF

Banana occurs in 2 documents here, 4 globally -- 2/4 = 0.5
Diet coke occurs in 2 documents here, 2 globally -- 2/2 = 1.0

Thinking back on my shoppers' behaviors, here's some other items you might like:

(thanks Joel Bernstein!)

Diet Coke
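
For a real recommender, the nested expression would be generated per user from the items they already like. A hedged sketch of such a generator follows; the function name is hypothetical, and it only assembles the same scoreNodes(select(facet(...))) string shown on this slide.

```python
def score_nodes_expr(collection, field, liked_ids, limit=100):
    """Compose scoreNodes(select(facet(...))) for a list of liked item ids."""
    # 1. facet over users who liked any of the seed items
    query = " OR ".join(f"{field}:{item_id}" for item_id in liked_ids)
    facet = (
        f'facet({collection}, q="{query}", buckets="{field}", '
        f'bucketSorts="count(*) desc", bucketSizeLimit="{limit}", count(*))'
    )
    # 2. select reshapes each bucket into the fields scoreNodes expects
    select = (
        f'select({facet}, {field} as node, "count(*)", '
        f"replace(collection, null, withValue={collection}), "
        f"replace(field, null, withValue={field}))"
    )
    # 3. scoreNodes weighs the local counts against global frequency
    return f"scoreNodes({select})"


expr = score_nodes_expr("movielens", "liked_movies", [2571, 4993])
print(expr)
```

The resulting string can be passed as the `expr` parameter of a `/stream` request.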

Page 28

Lots of power here

/stream?expr=scoreNodes(facet(...)...)

scoreNodes(
  select(
    facet(movielens,
          q="juiciness_pref:juicy",
          buckets="liked_movies",
          bucketSorts="count(*) desc",
          bucketSizeLimit="100",
          count(*)),
    liked_movies as node,
    "count(*)",
    replace(collection, null, withValue=movielens),
    replace(field, null, withValue=liked_movies)))

Find users that like juicy things, what do they like?
Perhaps bucket over the aisle they like?
Construct our query to focus on a date range?
Many insights

(thanks Joel Bernstein!)

Page 29

Only glimpsing the underlying pattern...

We're not enumerating the flarglewharbles, and the schlumblefumbles

More flarglewharbily Less flarglewharbily

Diet Coke

More schlumblewumbly Less schlumblewumbly

Diet Coke

Page 30

Coming soon (Solr 6.3)

http://yonik.com/solr-6-3/
https://issues.apache.org/jira/browse/SOLR-9258

-  Models for training classifiers
-  Then in turn updating documents
-  Clustering?

Progress is being made!

Page 31

Questions? The Flarglewharbles