Intelligent Search

May 08, 2015

Ted Dunning

ApacheCon 2009 talk describing methods for doing intelligent (well, really clever at least) search on items with no or poor meta-data.

The video of the talk should be available shortly on the ApacheCon web-site.
Transcript
Page 1: Intelligent Search

Intelligent Search

Page 2: Intelligent Search

Intelligent Search (or at least really clever)

Page 3: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus; documents are rows, terms are columns

Page 4: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

for each document d:
    for each term t:
        s_d += a_{dt} q_t

A: our corpus; documents are rows, terms are columns

Page 5: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus; documents are rows, terms are columns

$s_d = \sum_t a_{dt} q_t$

Page 6: Intelligent Search

Some Preliminaries

• Text retrieval = matrix multiplication

A: our corpus; documents are rows, terms are columns

s = A q
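
To make the equivalence concrete, here is a minimal sketch in plain Python. The toy matrix A, the query vector q, and the function names are illustrative assumptions, not taken from the talk. It computes the scores once with the explicit double loop and once as a matrix-vector product.

```python
# Toy corpus matrix A (hypothetical counts): rows are documents, columns are terms.
A = [
    [1, 0, 2],  # document 0
    [0, 1, 0],  # document 1
    [1, 1, 1],  # document 2
]
q = [1, 0, 1]  # query containing term 0 and term 2


def score_loops(A, q):
    """Explicit loops: s_d += a_dt * q_t for every document d and term t."""
    s = [0] * len(A)
    for d, row in enumerate(A):
        for t, a_dt in enumerate(row):
            s[d] += a_dt * q[t]
    return s


def matvec(A, q):
    """The same computation written as the matrix-vector product s = A q."""
    return [sum(a * x for a, x in zip(row, q)) for row in A]


scores = matvec(A, q)
print(scores)  # [3, 0, 2]
```

Both functions return the same scores; document 0 ranks first because it contains both query terms.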

Page 7: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

Page 8: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

Users who bought items in the list h also bought items in the list r

Page 9: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

for each user u:
    for each item t1:
        for each item t2:
            r_{t1} += a_{u,t1} a_{u,t2} h_{t2}

A: our users’ histories; users are rows, items are columns

Page 10: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

$s_{t1} = \sum_{t2} \sum_u a_{u,t1} a_{u,t2} q_{t2}$ (writing s for r and q for h)

Page 11: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

s = A’ (A q)

Page 12: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

s = (A’ A) q
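
The regrouping from A’ (A q) to (A’ A) q is just associativity, with A’ read as the transpose of A. The sketch below checks it on a small example; the history matrix, the query vector, and the helper names are hypothetical and chosen only for illustration.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Hypothetical histories: rows are users, columns are items (1 = interacted).
A = [
    [1, 1, 0, 0],  # user 0
    [0, 1, 1, 0],  # user 1
    [1, 0, 0, 1],  # user 2
]
q = [1, 0, 0, 0]  # the current user's history: item 0 only

At = transpose(A)

# User-based order: find users who share the history, then add up their items.
s_user_based = matvec(At, matvec(A, q))

# Item-based order: build the item-item matrix A'A once, then apply it to q.
s_item_based = matvec(matmul(At, A), q)

print(s_user_based, s_item_based)  # [2, 1, 0, 1] [2, 1, 0, 1]
```

The two groupings give the same scores, but A’A depends only on the logs, so it can be computed offline and reused for every query.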

Page 13: Intelligent Search

More Preliminaries

• Recommendation = Matrix multiply

A: our users’ histories; users are rows, items are columns

s = (A’ A) q ish!

Page 14: Intelligent Search

Why so ish?

• In real life, ish happens because:

• Big data ... so we selectively sample

• Sparse data ... so we smooth

• Finite computers ... so we sparsify

• Top-40 effect ... so we use some stats

Page 15: Intelligent Search

The same in spite of ish

• The shape of the computation is unchanged

• The cost of the computation is unchanged

• Broad algebraic conclusions still hold

Page 16: Intelligent Search

Back to recommendations ...

Page 17: Intelligent Search

Dyadic Structure
● Functional
– Interaction: actor -> item*
● Relational
– Interaction ⊆ Actors x Items
● Matrix
– Rows indexed by actor, columns by item
– Value is count of interactions
● Predict missing observations
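
The three views of the same interaction data can be sketched as follows. The actor and item names are made up for illustration, and the choice of sorted labels for row and column order is an assumption of this sketch.

```python
from collections import Counter, defaultdict

# Relational view: a list of (actor, item) pairs, with repeats allowed.
interactions = [
    ("alice", "v1"),
    ("alice", "v2"),
    ("bob", "v2"),
    ("alice", "v1"),
    ("carol", "v3"),
]

# Functional view: each actor maps to a sequence of items (actor -> item*).
by_actor = defaultdict(list)
for actor, item in interactions:
    by_actor[actor].append(item)

# Matrix view: rows indexed by actor, columns by item, value = interaction count.
actors = sorted({a for a, _ in interactions})
items = sorted({i for _, i in interactions})
counts = Counter(interactions)
A = [[counts[(a, i)] for i in items] for a in actors]

print(A)  # [[2, 1, 0], [0, 1, 0], [0, 0, 1]]
```

The zeros in A are the missing observations that a recommender tries to predict.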

Page 18: Intelligent Search

Fundamental Algorithmics
● Cooccurrence
● A is actors x items, K is items x items
● Product has general shape of matrix
● K tells us “users who interacted with x also interacted with y”
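
A minimal sketch of cooccurrence counting, assuming K is the item-item product A’A of a binary history matrix (the formula itself is not in the transcript). Rather than forming dense matrices, it counts item pairs within each user's history; the user and item names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical binary histories: each user maps to the set of items touched.
histories = {
    "u1": {"x", "y"},
    "u2": {"x", "y", "z"},
    "u3": {"z"},
}

# K[(i, j)] = number of users who interacted with both i and j (i != j).
# For a binary A these are the off-diagonal entries of A'A.
K = Counter()
for items in histories.values():
    for i, j in permutations(sorted(items), 2):
        K[(i, j)] += 1


def also_interacted(item, top=3):
    """Items most often seen together with `item`, highest count first."""
    pairs = [(j, n) for (i, j), n in K.items() if i == item]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))[:top]


print(also_interacted("x"))  # [('y', 2), ('z', 1)]
```

Raw counts like these favor popular items, which is one reason the talk follows up with statistical scoring.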

Page 19: Intelligent Search

Fundamental Algorithmic Structure
● Cooccurrence
● Matrix approximation by factoring
● LLR
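
For readers unfamiliar with LLR, here is a sketch of the log-likelihood ratio score for a 2x2 contingency table, written in the entropy form commonly used for cooccurrence scoring. The function names and the reading of the four counts are assumptions of this sketch, not details from the slides.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N log N - sum(k log k), with N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts.

    k11: both events occurred     k12: first event without the second
    k21: second without the first k22: neither event occurred
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(10, 10, 10, 10)  # no association: score is about 0
associated = llr(100, 5, 5, 1000)  # strong association: large score
```

Pairs whose score falls below a chosen threshold can be dropped from the cooccurrence matrix, which both sparsifies it and damps the effect of items that are merely popular.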

Page 20: Intelligent Search

But Wait ...

Page 21: Intelligent Search

But Wait ...

Does it have to be that way?

Page 22: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Page 23: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Page 24: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

Page 25: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

And recommend that

Page 26: Intelligent Search

What we have:

For a user who watched/bought/listened to this

Sum over all other users who watched/bought/...

Add up what they watched/bought/listened to

And recommend that

ish

Page 27: Intelligent Search

What we have:

Add up what they watched/bought/listened to

Page 28: Intelligent Search

What we have:

Add up what they watched/bought/listened to

But wait, we can do that faster

Page 29: Intelligent Search

What we have:

Add up what they watched/bought/listened to

But wait, we can do that faster

Page 30: Intelligent Search

But why not ...

Page 31: Intelligent Search

But why not ...

Page 32: Intelligent Search

But why not ...

Why just dyadic learning?

Page 33: Intelligent Search

But why not ...

Why just dyadic learning?

Why not triadic learning?

Page 34: Intelligent Search

But why not ...

Why just dyadic learning?

Why not p-adic learning?

Page 35: Intelligent Search

For example
● Users enter queries (A)
– (actor = user, item = query)
● Users view videos (B)
– (actor = user, item = video)
● AʼA gives query recommendation
– “did you mean to ask for”
● BʼB gives video recommendation
– “you might like these videos”

Page 36: Intelligent Search

The punch-line
● BʼA recommends videos in response to a query
– (isnʼt that a search engine?)
– (not quite, it doesnʼt look at content or meta-data)
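
A small sketch of the cross-recommendation idea, assuming A and B are binary matrices that share the same user rows. The query and video labels, the matrices, and the function names are invented for illustration.

```python
queries = ["flamenco", "metal"]
videos = ["guitar_solo", "dance", "concert"]

# A: users x queries, B: users x videos (same users, in the same row order).
A = [
    [1, 0],  # user 0 searched "flamenco"
    [1, 0],  # user 1 searched "flamenco"
    [0, 1],  # user 2 searched "metal"
]
B = [
    [1, 1, 0],  # user 0 watched guitar_solo and dance
    [1, 0, 0],  # user 1 watched guitar_solo
    [0, 0, 1],  # user 2 watched concert
]


def cross_cooccurrence(B, A):
    """Compute B'A: entry [v][q] counts users who watched v and searched q."""
    n_v, n_q = len(B[0]), len(A[0])
    return [
        [sum(b[v] * a[q] for b, a in zip(B, A)) for q in range(n_q)]
        for v in range(n_v)
    ]


BtA = cross_cooccurrence(B, A)


def search(query):
    """Rank videos for a single query term using only behavior, no content."""
    col = queries.index(query)
    scored = [(BtA[v][col], videos[v]) for v in range(len(videos))]
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


print(search("flamenco"))  # ['guitar_solo', 'dance']
```

No title or description is consulted: the ranking comes entirely from what users who typed the query went on to watch.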

Page 37: Intelligent Search

Real-life example
● Query: “Paco de Lucia”
● Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
● Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Page 38: Intelligent Search

Real-life example

Page 39: Intelligent Search

Real-life example

Page 40: Intelligent Search

System Diagram

[Diagram, labeled Hadoop. Inputs: viewing logs (t, user, video) and search logs (t, user, query-term). Boxes shown: selective sampler (x2), count (x3), join on user, llr + sparsify. Outputs: related videos (v => v1 v2...) and related terms (v => t1 t2...).]

Page 41: Intelligent Search

Indexing

[Diagram, labeled Hadoop and Lucene (+Katta?). Inputs: related videos (v => v1 v2...), video meta (v => url title...), related terms (v => t1 t2...). Boxes shown: join on video, Lucene Index.]
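
One plausible reading of the indexing diagram is sketched below: the three inputs are joined on video id into one document per video, and the related terms become a searchable field. This is an assumption about the wiring, and a plain Python dictionary stands in for the Lucene index; the ids, URLs, titles, and terms are made up.

```python
from collections import defaultdict

# Hypothetical inputs keyed by video id.
video_meta = {
    "v1": {"url": "http://example.com/v1", "title": "Flamenco guitar"},
    "v2": {"url": "http://example.com/v2", "title": "Classical guitar"},
}
related_videos = {"v1": ["v2"], "v2": ["v1"]}
related_terms = {"v1": ["paco", "flamenco"], "v2": ["guitar", "flamenco"]}

# Join on video: one document per video with meta-data plus related fields.
docs = {
    v: {
        **meta,
        "related_videos": related_videos.get(v, []),
        "related_terms": related_terms.get(v, []),
    }
    for v, meta in video_meta.items()
}

# Stand-in for the text index: an inverted index over the related-terms field.
index = defaultdict(set)
for v, doc in docs.items():
    for term in doc["related_terms"]:
        index[term].add(v)


def search(query):
    """Rank videos by how many query terms match their related-terms field."""
    hits = defaultdict(int)
    for term in query.lower().split():
        for v in index.get(term, ()):
            hits[v] += 1
    return sorted(hits, key=lambda v: (-hits[v], v))


print(search("paco flamenco"))  # ['v1', 'v2']
```

Storing the behavioral terms as an ordinary text field means a conventional search engine can serve recommendation-based results without any special query-time logic.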

Page 42: Intelligent Search

Hypothetical Example
● Want a navigational ontology?
● Just put labels on a web page with traffic
– This gives A = users x label clicks
● Remember viewing history
– This gives B = users x items
● Cross recommend
– BʼA = click to item mapping
● After several users click, results are whatever users think they should be

Page 43: Intelligent Search

Resources
● My blog
– http://tdunning.blogspot.com/
● The original LLR in NLP paper
– Accurate Methods for the Statistics of Surprise and Coincidence (check on citeseer)
● Source code
– Mahout project
– contact me ([email protected])