Personalizing Search
Jaime Teevan (MIT), Susan T. Dumais (MSR), and Eric Horvitz (MSR)
Jan 05, 2016
Query: "pia workshop"
[Figure: a relevant result for the query]
Outline

Approaches to personalization
The PS algorithm
Evaluation
Results
Future work
Approaches to Personalization

Content of user profile:
  Long-term interests — Liu, et al. [14], Compass Filter [13]
  Short-term interests — query refinement [2,12,15], Watson [4]
How the user profile is developed:
  Explicit — relevance feedback [19], query refinement [2,12,15]
  Implicit — query history [20,22], browsing history [16,23]
Very rich user profile
PS Search Engine

[Diagram: the query is sent to the PS search engine; a rich user profile — clusters of related terms (e.g., "baby infant child boy girl", "csail mit artificial research", "web search retrieval ir hunt") with associated weights (e.g., 1.6, 0.2, 1.3) — is used to re-rank the search results page]
Calculating a Document’s Score

Based on standard tf.idf:

  Score = Σᵢ tfᵢ · wᵢ

World (corpus) term weight:

  wᵢ = log(N / nᵢ)

where N is the number of documents in the world (corpus) and nᵢ is the number of documents containing term i.

Relevance-weighted term weight†:

  wᵢ = log [ (rᵢ + 0.5)(N − nᵢ − R + rᵢ + 0.5) / ((nᵢ − rᵢ + 0.5)(R − rᵢ + 0.5)) ]

where R is the number of relevant documents (the user’s documents) and rᵢ is the number of relevant documents containing term i.

† From Sparck Jones, Walker and Robertson, 1998 [21].

On the client, corrected counts are used:

  wᵢ′ = log [ (rᵢ + 0.5)(N′ − nᵢ′ − R + rᵢ + 0.5) / ((nᵢ′ − rᵢ + 0.5)(R − rᵢ + 0.5)) ]

where N′ = N + R and nᵢ′ = nᵢ + rᵢ.
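A minimal sketch of the scoring formulas above, with the client-side count correction folded in (function and variable names are ours, not from the talk):

```python
import math

def term_weight(N, n_i, R, r_i):
    """Relevance-weighted term weight (Sparck Jones, Walker and
    Robertson [21]), using the client-side corrected counts
    N' = N + R and n_i' = n_i + r_i."""
    N_p = N + R        # corrected corpus size
    n_p = n_i + r_i    # corrected document frequency for term i
    return math.log((r_i + 0.5) * (N_p - n_p - R + r_i + 0.5) /
                    ((n_p - r_i + 0.5) * (R - r_i + 0.5)))

def score(doc_terms, tf, stats, R):
    """Score = sum over the document's terms of tf_i * w_i.
    `tf` maps term -> frequency in the document (or snippet);
    `stats` maps term -> (N, n_i, r_i); R is the number of
    relevant (user) documents."""
    total = 0.0
    for t in doc_terms:
        N, n_i, r_i = stats[t]
        total += tf[t] * term_weight(N, n_i, R, r_i)
    return total
```

Whether `tf` and `stats` come from the full document or just the snippet is exactly the document-representation parameter discussed next.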
Finding the Parameter Values

Corpus representation (N, nᵢ): how common is the term in general? Web vs. result set.
User representation (R, rᵢ): how well does it represent the user’s interest? All vs. recent vs. Web vs. queries vs. none.
Document representation: what terms to sum over? Full document vs. snippet.
Building a Test Bed

15 evaluators × ~10 queries (131 queries total).
Personally meaningful queries: selected from a list, or queries issued earlier (evaluators kept a diary).
Evaluate 50 results for each query: highly relevant / relevant / irrelevant.
Index of personal information.
Evaluating Personalized Search

Measure algorithm quality with DCG:

  DCG(i) = Gain(i),                      if i = 1
  DCG(i) = DCG(i−1) + Gain(i)/log(i),    otherwise

Look at one parameter at a time: 67 different parameter combinations! Hold the other parameters constant and vary one.
Look at the best parameter combination and compare with various baselines.
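The recursive DCG definition above unrolls into a simple loop; a minimal sketch (the function name and list-based interface are ours):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: DCG(1) = Gain(1), and
    DCG(i) = DCG(i-1) + Gain(i)/log(i) for i > 1.
    `gains` lists the gain of each result in rank order."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # log(1) = 0, so the first position contributes its raw gain
        total += gain if i == 1 else gain / math.log(i)
    return total
```

For example, with graded judgments one might assign a gain of 2 to highly relevant, 1 to relevant, and 0 to irrelevant results before calling `dcg`.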
Analysis of Parameters

[Chart: DCG (roughly 0.27–0.35) for each parameter setting, grouped by representation — Corpus: full text, Web, snippet; User: none, query, Web, recent, all; Document: snippet, full text]
PS Improves Text Retrieval

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46]
Text Features Not Enough

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46; Web: 0.56]
Take Advantage of Web Ranking

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46; Web: 0.56; PS+Web combination: 0.58]
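One plausible way to combine the Web ranking with personalized scores is a weighted merge of the two rank positions; this is our own illustrative sketch, not the combination method from the talk:

```python
def merge_rankings(web_ranked, personal_scores, alpha=0.5):
    """Blend the Web ranking with a personalized ranking.
    `web_ranked`: result ids in Web rank order.
    `personal_scores`: result id -> personalized score (higher = better).
    `alpha`: weight on the Web rank (1.0 = pure Web, 0.0 = pure personal).
    Returns result ids sorted by the combined rank (lower = better)."""
    personal_order = sorted(web_ranked, key=lambda d: -personal_scores[d])
    personal_rank = {d: i for i, d in enumerate(personal_order)}
    web_rank = {d: i for i, d in enumerate(web_ranked)}
    return sorted(web_ranked,
                  key=lambda d: alpha * web_rank[d]
                              + (1 - alpha) * personal_rank[d])
```

The `alpha` parameter plays the same role as the Web-vs-personalized slider mentioned under user interface issues below.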
Summary

Personalization of Web search: result re-ranking, using the user’s documents as relevance feedback.
Rich representations are important; a rich user profile is particularly important.
Efficiency hacks are possible.
Need to incorporate features beyond text.
Further Exploration

Improved non-text components: usage data, personalized PageRank.
Learn parameters: based on the individual, the query, or the results.
UIs for user control.
User Interface Issues

Make personalization transparent; give the user control over personalization.
  A slider between Web and personalized results allows for background computation.
Personalization exacerbates the problem of re-finding: results change as the user model changes (thesis research: the Re:Search Engine).
Much Room for Improvement

Group ranking: best improves on Web by 23%; with more people, less improvement.
Personal ranking: best improves on Web by 38%; remains constant.

[Chart: DCG (0.8–1.05) vs. number of people (1–6) for personalized and group ranking, showing the potential for personalization]
Evaluating Personalized Search

Query selection: chose from 10 pre-selected queries, or a previously issued query.
Pre-selected queries included: cancer, Microsoft, traffic, …; bison frise, Red Sox, airlines, …; Las Vegas, rice, McDonalds, …
53 pre-selected (2–9 per query). Total: 137.

[Diagram: evaluators, e.g., Joe and Mary, each assigned a mix of pre-selected and self-issued queries]
Making PS Practical

Learn most about personalization by deploying a system.
The best algorithm is reasonably efficient; merge server and client computation.
Query expansion: get more relevant results into the set to be re-ranked.
Design snippets for personalization.