Personalizing Search
Jaime Teevan (MIT), Susan T. Dumais (MSR), and Eric Horvitz (MSR)
Jan 05, 2016
Query: "pia workshop"
[Figure: a relevant result for the query]
Outline

Approaches to personalization
The PS algorithm
Evaluation
Results
Future work
Approaches to Personalization

Content of user profile:
  Long-term interests — Liu, et al. [14], Compass Filter [13]
  Short-term interests — query refinement [2,12,15], Watson [4]
How the user profile is developed:
  Explicit — relevance feedback [19], query refinement [2,12,15]
  Implicit — query history [20,22], browsing history [16,23]
Very rich user profile
PS Search Engine

[Diagram: the query is sent to the PS search engine; a rich user profile — clusters of related terms (e.g., "baby infant child boy girl", "csail mit artificial research", "web search retrieval ir hunt") with associated weights (e.g., 1.6, 0.2, 1.3) — is used to re-rank the search results page]
Calculating a Document’s Score

Based on standard tf.idf:

  Score = Σᵢ tfᵢ · wᵢ

World (corpus) term weight:

  wᵢ = log(N / nᵢ)

where N is the number of documents in the world (corpus) and nᵢ is the number of documents containing term i.

Relevance-weighted term weight†:

  wᵢ = log [ (rᵢ + 0.5)(N − nᵢ − R + rᵢ + 0.5) / ((nᵢ − rᵢ + 0.5)(R − rᵢ + 0.5)) ]

where R is the number of relevant documents (the user’s documents) and rᵢ is the number of relevant documents containing term i.

† From Sparck Jones, Walker and Robertson, 1998 [21].

On the client, corrected counts are used:

  wᵢ′ = log [ (rᵢ + 0.5)(N′ − nᵢ′ − R + rᵢ + 0.5) / ((nᵢ′ − rᵢ + 0.5)(R − rᵢ + 0.5)) ]

where N′ = N + R and nᵢ′ = nᵢ + rᵢ.
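A minimal sketch of the scoring formulas above, with the client-side count correction folded in (function and variable names are ours, not from the talk):

```python
import math

def term_weight(N, n_i, R, r_i):
    """Relevance-weighted term weight (Sparck Jones, Walker and
    Robertson [21]), using the client-side corrected counts
    N' = N + R and n_i' = n_i + r_i."""
    N_p = N + R        # corrected corpus size
    n_p = n_i + r_i    # corrected document frequency for term i
    return math.log((r_i + 0.5) * (N_p - n_p - R + r_i + 0.5) /
                    ((n_p - r_i + 0.5) * (R - r_i + 0.5)))

def score(doc_terms, tf, stats, R):
    """Score = sum over the document's terms of tf_i * w_i.
    `tf` maps term -> frequency in the document (or snippet);
    `stats` maps term -> (N, n_i, r_i); R is the number of
    relevant (user) documents."""
    total = 0.0
    for t in doc_terms:
        N, n_i, r_i = stats[t]
        total += tf[t] * term_weight(N, n_i, R, r_i)
    return total
```

Whether `tf` and `stats` come from the full document or just the snippet is exactly the document-representation parameter discussed next.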
Finding the Parameter Values

Corpus representation (N, nᵢ): how common is the term in general? Web vs. result set.
User representation (R, rᵢ): how well does it represent the user’s interest? All vs. recent vs. Web vs. queries vs. none.
Document representation: what terms to sum over? Full document vs. snippet.
Building a Test Bed

15 evaluators × ~10 queries (131 queries total).
Personally meaningful queries: selected from a list, or queries issued earlier (evaluators kept a diary).
Evaluate 50 results for each query: highly relevant / relevant / irrelevant.
Index of personal information.
Evaluating Personalized Search

Measure algorithm quality with DCG:

  DCG(i) = Gain(i),                      if i = 1
  DCG(i) = DCG(i−1) + Gain(i)/log(i),    otherwise

Look at one parameter at a time: 67 different parameter combinations! Hold the other parameters constant and vary one.
Look at the best parameter combination and compare with various baselines.
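The recursive DCG definition above unrolls into a simple loop; a minimal sketch (the function name and list-based interface are ours):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: DCG(1) = Gain(1), and
    DCG(i) = DCG(i-1) + Gain(i)/log(i) for i > 1.
    `gains` lists the gain of each result in rank order."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        # log(1) = 0, so the first position contributes its raw gain
        total += gain if i == 1 else gain / math.log(i)
    return total
```

For example, with graded judgments one might assign a gain of 2 to highly relevant, 1 to relevant, and 0 to irrelevant results before calling `dcg`.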
Analysis of Parameters

[Chart: DCG (roughly 0.27–0.35) for each parameter setting, grouped by representation — Corpus: full text, Web, snippet; User: none, query, Web, recent, all; Document: snippet, full text]
PS Improves Text Retrieval

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46]
Text Features Not Enough

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46; Web: 0.56]
Take Advantage of Web Ranking

[Chart: DCG by algorithm — No model: 0.37; Relevance Feedback: 0.41; Personalized Search: 0.46; Web: 0.56; PS+Web combination: 0.58]
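One plausible way to combine the Web ranking with personalized scores is a weighted merge of the two rank positions; this is our own illustrative sketch, not the combination method from the talk:

```python
def merge_rankings(web_ranked, personal_scores, alpha=0.5):
    """Blend the Web ranking with a personalized ranking.
    `web_ranked`: result ids in Web rank order.
    `personal_scores`: result id -> personalized score (higher = better).
    `alpha`: weight on the Web rank (1.0 = pure Web, 0.0 = pure personal).
    Returns result ids sorted by the combined rank (lower = better)."""
    personal_order = sorted(web_ranked, key=lambda d: -personal_scores[d])
    personal_rank = {d: i for i, d in enumerate(personal_order)}
    web_rank = {d: i for i, d in enumerate(web_ranked)}
    return sorted(web_ranked,
                  key=lambda d: alpha * web_rank[d]
                              + (1 - alpha) * personal_rank[d])
```

The `alpha` parameter plays the same role as the Web-vs-personalized slider mentioned under user interface issues below.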
Summary

Personalization of Web search: result re-ranking, using the user’s documents as relevance feedback.
Rich representations are important; a rich user profile is particularly important.
Efficiency hacks are possible.
Need to incorporate features beyond text.
Further Exploration

Improved non-text components: usage data, personalized PageRank.
Learn parameters: based on the individual, the query, or the results.
UIs for user control.
User Interface Issues

Make personalization transparent; give the user control over personalization.
  A slider between Web and personalized results allows for background computation.
Personalization exacerbates the problem of re-finding: results change as the user model changes (thesis research: the Re:Search Engine).
Much Room for Improvement

Group ranking: best improves on Web by 23%; with more people, less improvement.
Personal ranking: best improves on Web by 38%; remains constant.

[Chart: DCG (0.8–1.05) vs. number of people (1–6) for personalized and group ranking, showing the potential for personalization]
Evaluating Personalized Search

Query selection: chose from 10 pre-selected queries, or a previously issued query.
Pre-selected queries included: cancer, Microsoft, traffic, …; bison frise, Red Sox, airlines, …; Las Vegas, rice, McDonalds, …
53 pre-selected (2–9 per query). Total: 137.

[Diagram: evaluators, e.g., Joe and Mary, each assigned a mix of pre-selected and self-issued queries]
Making PS Practical

Learn most about personalization by deploying a system.
The best algorithm is reasonably efficient; merge server and client computation.
Query expansion: get more relevant results into the set to be re-ranked.
Design snippets for personalization.