Dynamic Information Retrieval Modeling
WSDM Tutorial, February 2nd 2015
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Charlie Clarke
Dynamic Information Retrieval
Dynamic Information Retrieval Modeling Tutorial 2015
[Diagram: a user with an information need; observed documents and documents still to explore]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
Evolving IR
Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document
Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine
Outline
Introduction & Theory
Static IR
Interactive IR
Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conceptual Model – Static IR
Static IR → Interactive IR → Dynamic IR
No feedback
Characteristics of Static IR
Does not learn directly from the user
Parameters updated periodically
Commonly Used Static IR Models
BM25
PageRank
Language Model
Learning to Rank
Feedback in IR
Outline
Introduction & Theory
Static IR
Interactive IR
Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conceptual Model – Interactive IR
Static IR → Interactive IR → Dynamic IR
Exploit Feedback
Learn the user's taste interactively!
At the same time, provide good recommendations!
Interactive Recommender Systems
Toy Example
Multi-Page search scenario
User image searches for “jaguar”
Rank two of the four results over two pages
The four results have estimated relevance r = 0.9, r = 0.51, r = 0.5, r = 0.49
Toy Example – Static Ranking
Ranked according to PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49
Toy Example – Relevance Feedback
Interactive Search
Improve the 2nd page based on feedback from the 1st page
Use clicks as relevance feedback
Rocchio¹ algorithm on terms in the image webpage
w_q' = α w_q + (β / |D_r|) Σ_{d ∈ D_r} w_d − (γ / |D_n|) Σ_{d ∈ D_n} w_d
The new query is closer to relevant documents and different from non-relevant documents
¹Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto '99
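The Rocchio update above can be sketched directly over term-weight dictionaries. A minimal sketch (not the tutorial's code); the default α, β, γ values are common textbook choices, purely illustrative:

```python
def rocchio(w_q, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query vector toward the centroid of
    relevant docs and away from the centroid of non-relevant docs.
    Vectors are dicts mapping term -> weight."""
    terms = set(w_q)
    for d in relevant + non_relevant:
        terms |= set(d)
    w_new = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        non = sum(d.get(t, 0.0) for d in non_relevant) / len(non_relevant) if non_relevant else 0.0
        w_new[t] = alpha * w_q.get(t, 0.0) + beta * rel - gamma * non
    return w_new
```

Clicked image webpages would supply the relevant vectors; unclicked ones the non-relevant vectors.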
Toy Example – Relevance Feedback
Ranked according to PRP and Rocchio
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49
(* = clicked result on page 1)
Toy Example – Relevance Feedback
No click when searching for animals
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. ?, 2. ?
Toy Example – Value Function
Optimize both pages using dynamic IR
Bellman equation for the value function, simplified example:
V_t(θ_t, Σ_t) = max_{s_t} [ θ_{s_t} + E( V_{t+1}(θ_{t+1}, Σ_{t+1}) | C_t ) ]
θ_t, Σ_t = relevance and covariance of documents for page t
C_t = clicks on page t
V_t = 'value' of ranking on page t
Maximize value over all pages based on estimating feedback
[X. Jin, M. Sloan and J. Wang '13]
Toy Example - Covariance
Covariance matrix represents similarity between images:
1    0.8  0.1  0
0.8  1    0.1  0
0.1  0.1  1    0.95
0    0    0.95 1
[X. Jin, M. Sloan and J. Wang '13]
Toy Example – Myopic Value
For the myopic ranking, V2 = 16.380
[X. Jin, M. Sloan and J. Wang '13]
Toy Example – Myopic Ranking
The page-2 ranking stays the same regardless of clicks
[X. Jin, M. Sloan and J. Wang '13]
Toy Example – Optimal Value
For the optimal ranking, V2 = 16.528
[X. Jin, M. Sloan and J. Wang '13]
Toy Example – Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page
[X. Jin, M. Sloan and J. Wang '13]
Toy Example – Optimal Ranking
In all other scenarios, rank the animal first on the next page
[X. Jin, M. Sloan and J. Wang '13]
Static IR Visualization
[Scatter plot: X = docs about apple fruit, O = docs about apple iphone, x = doc about apple ceo]
Documents exist in vector space
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Static IR Visualization
[Scatter plot: query Q placed among the documents]
t = 1: Static IR considers Relevancy
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Interactive IR Update
[Scatter plot: relevance feedback (+1 / −1) moves the query from Q to Q']
t = 1: Static IR considers Relevancy
t = 2: Interactive IR considers local gains
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Dynamic Ranking Principle
[Scatter plot: query Q among the documents]
t = 1: Relevancy + Variance
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Dynamic Ranking Principle
[Scatter plot with feedback labels +1 / −1]
t = 1: Relevancy + Variance + |Correlations|
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Dynamic Ranking Principle
[Scatter plot: query Q among the documents]
t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Dynamic Ranking Principle
[Scatter plot: feedback (+1 / −1) moves the query from Q to Q']
t = 1: Relevancy + Variance + |Correlations|, a diversified, exploratory relevance ranking
t = 2: Personalized Re-ranking
[Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015]
Interactive vs Dynamic IR
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback is received
Dynamic:
• Optimizes over all interactions
• Long-term gains
• Models future user feedback
• Also used at the beginning of the interaction
Interactive & Dynamic Techniques
Interactive:
• Rocchio equation in relevance feedback
• Collaborative filtering in recommender systems
• Active learning in interactive retrieval
Dynamic:
• POMDP in multi-page search and ad recommendation
• Multi-armed bandits in online evaluation
• MDP in session search
Outline
Introduction & Theory
Static IR
Interactive IR
Dynamic IR
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conceptual Model – Dynamic IR
Static IR → Interactive IR → Dynamic IR
Explore and exploit Feedback
Characteristics of Dynamic IR
Rich interactions:
Query formulation
Document clicks
Document examination
Eye movement
Mouse movements
etc.
[Luo et al., IRJ under revision 2014]
Characteristics of Dynamic IR
Temporal dependency
[Diagram: the information need I underlies iterations 1 … n; at iteration i, query q_i yields ranked documents D_i and clicked documents C_i]
[Luo et al., IRJ under revision 2014]
Characteristics of Dynamic IR
Overall goal
Optimize over all iterations for goal
IR metric or user satisfaction
Optimal policy
[Luo et al., IRJ under revision 2014]
Dynamic Information Retrieval: the next-generation search engine
Dynamic Relevance: user-perceived relevance changes
Dynamic Users: users change behavior over time; user history
Dynamic Queries: changing query definitions, e.g. 'Twitter'
Dynamic Documents: topic trends, filtering, document content change
Dynamic Information Needs: information needs evolve over time
Why Not Existing Supervised Learning for Dynamic IR Modeling?
Lack of enough training data
Dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session
It is rare to find repeated sequences (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex)
The chance of finding repeated adjacent query pairs is also low:

Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%
Our Solution
Try to find an optimal solution through a sequence of dynamic interactions
Trial and error: learn from repeated, varied attempts, continued until success
No (or less) supervised learning
Trial and Error
q1 – "dulles hotels"
q2 – "dulles airport"
q3 – "dulles airport location"
q4 – "dulles metrostop"
What is a Desirable Model for Dynamic IR?
Model interactions, i.e. have placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the search algorithm in adjusting its retrieval strategies;
Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do. A Markov model will do!
Markov Decision Process
An MDP extends a Markov chain with actions and rewards¹
s_i – state; a_i – action; r_i – reward; p_i – transition probability
[Diagram: s0 →(a0, r0, p0) s1 →(a1, r1, p1) s2 →(a2, r2, p2) s3 → ……]
¹R. Bellman, '57
(S, M, A, R, γ)
Definition of MDP
A tuple (S, M, A, R, γ)
S : state space
M: transition matrix
Ma(s, s') = P(s'|s, a)
A: action space
R: reward function
R(s,a) = immediate reward taking action a at state s
γ: discount factor, 0< γ ≤1
policy π
π(s) = the action taken at state s
Goal is to find an optimal policy π* maximizing the expected total rewards.
Optimality — Bellman Equation
The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):
V*(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]
Optimal policy:
π*(s) = argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]
¹R. Bellman, '57
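The Bellman equation above can be solved by value iteration. A minimal, self-contained sketch for a finite MDP; the dictionary encoding of S, A, M, R is an illustrative assumption, not code from the tutorial:

```python
def value_iteration(S, A, M, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman backup V(s) <- max_a [R(s,a) + γ Σ_s' M_a(s,s') V(s')]
    to convergence, then read off the greedy optimal policy.
    M[a][s][s2] = P(s2 | s, a); R[s][a] = immediate reward."""
    V = {s: 0.0 for s in S}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(M[a][s][s2] * V[s2] for s2 in S)
                        for a in A)
                 for s in S}
        converged = max(abs(V_new[s] - V[s]) for s in S) < tol
        V = V_new
        if converged:
            break
    policy = {s: max(A, key=lambda a: R[s][a] +
                     gamma * sum(M[a][s][s2] * V[s2] for s2 in S))
              for s in S}
    return V, policy
```

With a small two-state MDP this recovers both V* and π* in a few dozen iterations.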
MDP algorithms
Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning
All solve the Bellman equation for the optimal value V*(s) and the optimal policy π*(s)
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Sutton '88; Watkins '92]
[Slide altered from Carlos Guestrin's ML lecture]
Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process:
Is there a temporal component?
States – what changes with each time step?
Actions – how does your system change the state?
Rewards – how do you measure feedback or effectiveness in your problem at each time step?
Transition probability – can you determine this? If not, a model-free approach is more suitable.
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
TREC Session Tracks (2010-now)
Given a series of queries {q1,q2,…,qn}, top 10
retrieval results {D1, … Di-1 } for q1 to qi-1, and
click information
The task is to retrieve a list of documents for the
current/last query, qn
Relevance judgment is made based on how
relevant the documents are for qn, and how relevant
they are for information needs for the entire session
(in topic description)
no need to segment the sessions
1.pocono mountains pennsylvania
2.pocono mountains pennsylvania hotels
3.pocono mountains pennsylvania things to do
4.pocono mountains pennsylvania hotels
5.pocono mountains camelbeach
6.pocono mountains camelbeach hotel
7.pocono mountains chateau resort
8.pocono mountains chateau resort attractions
9.pocono mountains chateau resort getting to
10.chateau resort getting to
11.pocono mountains chateau resort directions
TREC 2012 Session 6
Information needs:
You are planning a winter vacation
to the Pocono Mountains region in
Pennsylvania in the US. Where will
you stay? What will you do while
there? How will you get there?
In a session, queries change constantly
Markov Decision Process
We propose to model session search as a
Markov decision process (MDP)
Two agents: the User and the Search Engine
[Guan, Zhang and Yang SIGIR 2013]
Settings of the Session MDP
States: Queries
Environments: Search results
Actions:
User actions:
Add/remove/ unchange the query terms
Nicely correspond to our definition of query change
Search Engine actions:
Increase/ decrease /remain term weights
[Guan, Zhang and Yang SIGIR 2013]
Search Engine Agent's Actions
term    | ∈ D_{i-1}? | action    | example
q_theme | Y          | increase  | "pocono mountain" in s6
q_theme | N          | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq     | Y          | decrease  | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq     | N          | increase  | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq     | Y          | decrease  | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq     | N          | no change | 'legislation' in s32: bollywood legislation → bollywood law
[Guan, Zhang and Yang SIGIR 2013]
Bellman Equation
In an MDP, a future reward is believed to be worth less than a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
Bellman Equation gives the optimal value
(expected long term reward starting from state
s and continuing with policy π from then on)
for an MDP:
V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ]
Our Tweak
In an MDP, a future reward is believed to be worth less than a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
In session search, a past reward is not worth
quite as much as a current reward and thus a
discount factor γ should be applied to past
rewards
We model the MDP for session search in a reverse
order
Query Change Retrieval Model (QCM)
Bellman Equation gives the optimal value for an MDP:
V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ]
The reward function is used as the document relevance score function, tweaked backwards from the Bellman equation:
Score(q_i, d) = P(q_i | d) + γ P(q_i | q_{i-1}, D_{i-1}, a) max_{D_{i-1}} P(q_{i-1} | D_{i-1})
P(q_i | d): current reward / relevance score
P(q_i | q_{i-1}, D_{i-1}, a): query transition model
max_{D_{i-1}} P(q_{i-1} | D_{i-1}): maximum past relevance
[Guan, Zhang and Yang SIGIR 2013]
Calculating the Transition Model
According to the query change and the search engine actions, the current reward / relevance score becomes:

Score(q_i, d) = log P(q_i | d)
  + α Σ_{t ∈ q_theme} [1 − P(t | d*_{i-1})] log P(t | d)       (increase weights for theme terms)
  − β Σ_{t ∈ +Δq, t ∈ d*_{i-1}} P(t | d*_{i-1}) log P(t | d)   (decrease weights for old added terms)
  + ε Σ_{t ∈ +Δq, t ∉ d*_{i-1}} idf(t) log P(t | d)            (increase weights for novel added terms)
  − δ Σ_{t ∈ −Δq} P(t | d*_{i-1}) log P(t | d)                 (decrease weights for removed terms)

[Guan, Zhang and Yang SIGIR 2013]
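A rough sketch of this per-term scoring; the parameter values (α, β, ε, δ), the unsmoothed P(t|d) estimate, and the log clamp are illustrative assumptions, not the trained settings from the paper:

```python
import math

def qcm_score(doc, query, theme, added, removed, d_star, idf,
              alpha=2.2, beta=1.8, epsilon=0.07, delta=0.4):
    """Sketch of QCM scoring: start from the query-likelihood score
    log P(q_i|d), then adjust each term's weight according to the query
    change (theme / added / removed terms) relative to the previous top
    document d*. Documents are token lists."""
    def p(t, d):
        return d.count(t) / len(d)              # maximum-likelihood P(t|d)

    def logp(t, d):
        return math.log(max(p(t, d), 1e-12))    # clamp to avoid log(0)

    score = sum(logp(t, doc) for t in query)            # log P(q_i|d)
    for t in theme:                                     # theme terms
        score += alpha * (1 - p(t, d_star)) * logp(t, doc)
    for t in added:
        if t in d_star:                                 # old added terms
            score -= beta * p(t, d_star) * logp(t, doc)
        else:                                           # novel added terms
            score += epsilon * idf.get(t, 1.0) * logp(t, doc)
    for t in removed:                                   # removed terms
        score -= delta * p(t, d_star) * logp(t, doc)
    return score
```

A document containing the theme term should then outscore one that lacks it.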
Maximizing the Reward Function
Generate a maximum-rewarded document, denoted d*_{i-1}, from D_{i-1}: the document(s) most relevant to q_{i-1}
The relevance score can be calculated as
P(q_{i-1} | d_{i-1}) = 1 − Π_{t ∈ q_{i-1}} [1 − P(t | d_{i-1})]
P(t | d_{i-1}) = #(t, d_{i-1}) / |d_{i-1}|
From several options, we choose to use only the document with top relevance: max_{D_{i-1}} P(q_{i-1} | D_{i-1})
[Guan, Zhang and Yang SIGIR 2013]
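The two probabilities above can be sketched directly; representing documents as token lists is an assumption of this sketch:

```python
from collections import Counter

def p_term(t, doc):
    """Maximum-likelihood P(t|d) = #(t, d) / |d| over a tokenized document."""
    return Counter(doc)[t] / len(doc)

def p_query(query, doc):
    """P(q|d) = 1 - prod over query terms of (1 - P(t|d)), as on the slide."""
    prod = 1.0
    for t in query:
        prod *= 1.0 - p_term(t, doc)
    return 1.0 - prod
```

d*_{i-1} can then be chosen as the document in D_{i-1} maximizing p_query(q_{i-1}, d).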
Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:
Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n-1}, d)
  = Score(q_n, d) + γ [Score(q_{n-1}, d) + γ Score_session(q_{n-2}, d)]
  = Σ_{i=1}^{n} γ^{n-i} Score(q_i, d)
[Guan, Zhang and Yang SIGIR 2013]
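A minimal sketch of this aggregation; the default γ value is illustrative, not the paper's tuned discount:

```python
def session_score(per_query_scores, gamma=0.92):
    """Score_session(q_n, d) = sum_{i=1..n} gamma^(n-i) * Score(q_i, d):
    the latest query counts fully, earlier queries are discounted."""
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * s
               for i, s in enumerate(per_query_scores, start=1))
```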
Experiments
TREC 2011-2012 query sets; dataset: ClueWeb09 Category B
Search Accuracy (TREC 2012)
nDCG@10 (official metric used in TREC)
Approach nDCG@10 %chg MAP %chg
Lemur 0.2474 -21.54% 0.1274 -18.28%
TREC’12 median 0.2608 -17.29% 0.1440 -7.63%
Our TREC'12 submission 0.3021 −4.19% 0.1490 -4.43%
TREC’12 best 0.3221 0.00% 0.1559 0.00%
QCM 0.3353 4.10%† 0.1529 -1.92%
QCM+Dup 0.3368 4.56%† 0.1537 -1.41%
Search Accuracy (TREC 2011)
nDCG@10 (official metric used in TREC)
Approach nDCG@10 %chg MAP %chg
Lemur 0.3378 -23.38% 0.1118 -25.86%
TREC’11 median 0.3544 -19.62% 0.1143 -24.20%
TREC’11 best 0.4409 0.00% 0.1508 0.00%
QCM 0.4728 7.24%† 0.1713 13.59%†
QCM+Dup 0.4821 9.34%† 0.1714 13.66%†
Our TREC'12 submission 0.4836 9.68%† 0.1724 14.32%†
Search Accuracy for Different Session Types
TREC 2012 sessions are classified into:
Product: Factual / Intellectual
Goal quality: Specific / Amorphous

Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg
TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
Nugget    | 0.3305       | -1.90% | 0.3397    | -2.80% | 0.2736   | -9.01% | 0.2871  | -8.51%
QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | -2.29%
QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | -2.10%
QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying changes across query transitions and modeling the dynamics.
POMDP Model
[Diagram: hidden states s0 →(a0, r0) s1 →(a1, r1) s2 →(a2, r2) s3 → ……; observations o1, o2, o3 are emitted from the hidden states, over which the agent maintains a belief]
¹R. D. Smallwood et al., '73
POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set; an observation is a symbol emitted according to a hidden state
Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a)
B: belief space; a belief is a probability distribution over hidden states
A Markov Chain of Decision Making
[Diagram: a Markov chain of decision-making states S1, S2, S3, …, Sn with actions A1, A2, A3, A4]
q1 = "old US coins" → q2 = "collecting old US coins" → q3 = "selling old US coins"
D1: "D1 is relevant and I stay to find out more about collecting…"
D2: "D2 is relevant and I now move to the next topic…"
D3: "D3 is irrelevant; I slightly edit the query and stay here a little longer…"
[Luo, Zhang and Yang SIGIR 2014]
Hidden Decision Making States
S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): collecting old US coins → selling old US coins
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): Boston tourism → NYC tourism
(initial state q0)
[Luo, Zhang and Yang SIGIR 2014]
Dual Agent Stochastic Game
[Diagram: Markov chain of hidden states s0, s1, s2, s3, … with actions a_i and rewards r_i]
Dual-agent game: a User agent and a Search Engine agent
Cooperative game with joint optimization
[Luo, Zhang and Yang SIGIR 2014]
Actions
User actions (A_u):
add query terms (+Δq)
remove query terms (−Δq)
keep query terms (q_theme)
Search engine actions (A_se):
increase / decrease / keep term weights
switch on or off a search technique, e.g. to use or not to use query expansion
adjust parameters in search techniques, e.g. select the best k for the top-k docs used in PRF
Messages from the user (Σ_u): clicked documents, SAT-clicked documents
Messages from the search engine (Σ_se): top k returned documents
Messages are essentially documents that an agent thinks are relevant.
[Luo, Zhang and Yang SIGIR 2014]
Dual-agent Stochastic Game
[Diagram: the user agent and the search engine agent interact through the documents (the world); a belief updater maintains the belief state; Σse = D_top_returned]
[Luo, Zhang and Yang SIGIR 2014]
Observation function (O)
O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t): the probability of making observation ω_t after taking action a_t and landing in state s_{t+1}
Two types of observations: relevance-related, and exploration-exploitation-related
[Luo, Zhang and Yang SIGIR 2014]
Relevance-related Observation
It happens after the user sends out the message Σ_u^t (clicks).
Intuition: s_t is likely to be Relevant if ∃ d ∈ D_{t-1} that is SAT-clicked, and Non-Relevant otherwise.
O(s_t = Rel, Σ_u, ω_t = Rel) ≝ P(ω_t = Rel | s_t = Rel, Σ_u)
  ∝ P(s_t = Rel | ω_t = Rel) P(ω_t = Rel | Σ_u)
Similarly, O(s_t = NonRel, Σ_u, ω_t = NonRel) ∝ P(s_t = NonRel | ω_t = NonRel) P(ω_t = NonRel | Σ_u),
as well as the cross terms O(s_t = NonRel, Σ_u, ω_t = Rel) and O(s_t = Rel, Σ_u, ω_t = NonRel).
[Luo, Zhang and Yang SIGIR 2014]
Exploration-related Observation
It is a combined observation; it happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t-1}.
Intuition: s_t is likely to be
Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t-1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅)
Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t-1}) or (+Δq_t = ∅ and −Δq_t = ∅)
O(s_t = Exploitation, a_u = Δq_t, Σ_se = D_{t-1}, ω_t = Exploitation) ∝ P(s_t = Exploitation | ω_t = Exploitation) × P(ω_t = Exploitation | Δq_t, D_{t-1})
O(s_t = Exploration, a_u = Δq_t, Σ_se = D_{t-1}, ω_t = Exploration) ∝ P(s_t = Exploration | ω_t = Exploration) × P(ω_t = Exploration | Δq_t, D_{t-1})
[Luo, Zhang and Yang SIGIR 2014]
BELIEF UPDATES (B)
The belief state b is updated when a new observation is obtained:
b_{t+1}(s_j) = P(s_j | ω_t, a_t, b_t)
  = P(ω_t | s_j, a_t, b_t) Σ_{s_i ∈ S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
  = O(s_j, a_t, ω_t) Σ_{s_i ∈ S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
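This update is one weighted sum plus a normalization. A minimal sketch; the nested-dictionary encodings of the transition matrix M and the observation function O are assumptions of the sketch:

```python
def belief_update(b, a, omega, M, O, states):
    """POMDP belief update: b'(s_j) ∝ O(s_j, a, ω) * Σ_i P(s_j | s_i, a) b(s_i),
    normalized so the new belief sums to 1 (the normalizer z = P(ω | a, b))."""
    unnorm = {sj: O[sj][a][omega] * sum(M[a][si][sj] * b[si] for si in states)
              for sj in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}
```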
JOINT OPTIMIZATION — WIN-WIN
The long-term reward for the search engine agent:
Q_se(b, a) = Σ_{s ∈ S} b(s) R(s, a) + γ Σ_{ω ∈ Ω} P(ω | b, a_u, Σ_se) P(ω | b, Σ_u) max_a Q_se(b', a)
The long-term reward for the user agent:
Q_u(b, a_u) = R(s, a_u) + γ Σ_{a_u} T(s_t | s_{t-1}, D_{t-1}) max_{s_{t-1}} Q_u(s_{t-1}, a_u)
  = P(q_t | d) + γ Σ_{a_u} P(q_t | q_{t-1}, D_{t-1}, a) max_{D_{t-1}} P(q_{t-1} | D_{t-1})
Joint optimization:
a_se* = argmax_a ( Q_se(b, a) + Q_u(b, a_u) )
[Luo, Zhang and Yang SIGIR 2014]
Dynamic Search Engine Demo
http://dumplingproject.org
EXPERIMENTS
Evaluate on the TREC 2012 and 2013 Session Tracks
The session logs contain: session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc.
Task: retrieve 2,000 documents for the last query in each session
The evaluation is based on the whole session: a document related to any query in the session is a good document
Datasets: ClueWeb09, ClueWeb12 (spam and duplicates removed)
ACTIONS
increasing weights of the added terms by a factor x ∈ {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2};
decreasing weights of the added terms by a factor y ∈ {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95};
the Query Change Model (QCM) proposed in Guan et al. SIGIR'13:
Score(q_i, d) = P(q_i | d) + γ P(q_i | q_{i-1}, D_{i-1}, a) max_{D_{i-1}} P(q_{i-1} | D_{i-1})
Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
directly using the query in the current iteration to perform retrieval;
combining all queries in a session, weighting them equally.
SEARCH ACCURACY
Search accuracy on TREC 2012 Session Track
Win-win outperforms most retrieval algorithms on TREC 2012.
Win-win outperforms all retrieval algorithms on TREC 2013. It is highly effective in session search.
Search accuracy on TREC 2013 Session Track
SEARCH ACCURACY
IMMEDIATE SEARCH ACCURACY
Original run: top returned documents provided by the TREC log data
Win-win's immediate search accuracy is better than the Original at every iteration
Win-win's immediate search accuracy increases as the number of search iterations increases
[Plots: TREC 2012 Session Track and TREC 2013 Session Track]
BELIEF UPDATES (B)
TREC'13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
q1 = "best US destinations", observation = NRR
Belief: S_RT (Relevant & Exploitation) 0.1784, S_RR (Relevant & Exploration) 0.1135, S_NRT (Non-Relevant & Exploitation) 0.2838, S_NRR (Non-Relevant & Exploration) 0.4243
q1 = "best US destinations", observation = NRR
q2 = "distance New York Boston", observation = RT
Belief: S_RT 0.0005, S_RR 0.0068, S_NRT 0.0715, S_NRR 0.9212
q3 = "maps.bing.com", observation = NRT
Belief: S_RT 0.0151, S_RR 0.4347, S_NRT 0.0276, S_NRR 0.5226
…… (iterations continue)
q20 = "Philadelphia NYC train", observation = NRT
Belief: S_RT 0.0291, S_RR 0.7837, S_NRT 0.0081, S_NRR 0.1790
q21 = "Philadelphia NYC bus", observation = NRT
Belief: S_RT 0.0304, S_RR 0.8126, S_NRT 0.0066, S_NRR 0.1505
Coffee Break
Apply an MDP to an IR Problem - Example
User agent in session search
States – user’s relevance judgement
Action – new query
Reward – information gained
[Luo, Zhang, Yang SIGIR’14]
POMDP → Belief Update
The agent uses a state estimator to update its belief about the hidden states:
b' = SE(b, a, o')
b'(s') = P(s' | o', a, b) = P(s', o' | a, b) / P(o' | a, b) = Θ(s', a, o') Σ_s M(s, a, s') b(s) / P(o' | a, b)
POMDP → Bellman Equation
The Bellman equation for a POMDP:
V(b) = max_a [ r(b, a) + γ Σ_{o'} P(o' | a, b) V(b') ]
A POMDP can be transformed into a continuous belief MDP (B, M', A, r, γ):
B: the continuous belief space
M': transition function M'_a(b, b') = Σ_{o' ∈ O} 1_{a,o'}(b', b) P(o' | a, b), where 1_{a,o'}(b', b) = 1 if SE(b, a, o') = b', and 0 otherwise
A: action space
r: reward function r(b, a) = Σ_{s ∈ S} b(s) R(s, a)
Applying POMDP to Dynamic IR
POMDP                | Dynamic IR
Environment          | Documents
Agents               | User, search engine
States               | Queries, user's decision-making status, relevance of documents, etc.
Actions              | Provide a ranking of documents; weigh terms in the query; add/remove/unchange the query terms; switch a search technology on or off; adjust parameters for a search technology
Observations         | Queries, clicks, document lists, snippets, terms, etc.
Rewards              | Evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix    | Given in advance or estimated from training data
Observation function | Problem dependent; estimated based on sample datasets
Session Search Example - States
Four hidden states, combining relevance and exploration:
S_RT: Relevant & Exploitation
S_RR: Relevant & Exploration
S_NRT: Non-Relevant & Exploitation
S_NRR: Non-Relevant & Exploration
Example query transitions from q0: scooter price → scooter stores; Hartford visitors → Hartford Connecticut tourism; Philadelphia NYC travel → Philadelphia NYC train; distance New York Boston → maps.bing.com
[J. Luo et al. ’14]
Session Search Example - Actions (Au, Ase)
User Action (Au)
Add query terms (+Δq)
Remove query terms (−Δq)
Keep query terms (q_theme)
Clicked documents
SAT-clicked documents
Search Engine Action (Ase)
Increase/decrease/keep term weights
Switch query expansion on or off
Adjust the number of top documents used in PRF (pseudo-relevance feedback)
etc.
[J. Luo et al. ’14]
TREC Session Tracks (2010-2012)
Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information
The task is to retrieve a list of documents for the current/last query, qn
Relevance judgments are based on how relevant the documents are for qn, and how relevant they are for the information need of the entire session (given in the topic description)
No need to segment the sessions
Query change is an important form of feedback
We define query change as the syntactic editing change between two adjacent queries:
Δq_i = q_i − q_{i−1}
It includes +Δq_i, the added terms, and −Δq_i, the removed terms.
The unchanged/shared terms are called q_theme, the theme terms.
Example:
q1 = “bollywood legislation”
q2 = “bollywood law”
Theme term = “bollywood”
Added (+Δq) = “law”
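As a sketch, the decomposition into theme, added and removed terms is a set operation over query terms; a real system would also handle stemming and phrase structure:

```python
# Sketch of the query-change decomposition as set operations on query
# terms; a real system would also consider stemming and phrases.

def query_change(prev_query, curr_query):
    prev, curr = set(prev_query.split()), set(curr_query.split())
    return {
        "theme": prev & curr,    # q_theme: unchanged/shared terms
        "added": curr - prev,    # +Δq
        "removed": prev - curr,  # -Δq
    }
```

On the slide’s example, `query_change("bollywood legislation", "bollywood law")` yields theme {"bollywood"}, added {"law"}, removed {"legislation"}.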
Where do these query changes come from?
Given TREC Session settings, we consider two
sources of query change:
the previous search results that a user
viewed/read/examined
the information need
Example:
Kurosawa → Kurosawa wife
‘wife’ is not in any previous results, but it is in the topic description
However, knowing information needs before
search is difficult to achieve
Previous search results could influence query change in quite complex ways
Merck lobbyists → Merck lobbying US policy
D1 contains several mentions of ‘policy’, such as “A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …”
These mentions are about Canadian policies, while the user adds US policy in q2
Our guess is that the user might be inspired by ‘policy’ but prefers a sub-concept other than ‘Canadian policy’
Therefore, among the added terms ‘US policy’, ‘US’ is the novel term and ‘policy’ is not, since it appeared in D1; the two terms should be treated differently
POMDP (Partially Observable Markov Decision Process) fits the characteristics of dynamic IR:
Rich interactions → actions
Hidden, evolving information needs → hidden states
A long-term goal → rewards
Temporal dependency → Markov property
Multi-agent collaboration → SG (Stochastic Games)
Recap – Characteristics of Dynamic IR
Rich interactions
Query formulation, document clicks, document examination, eye movements, mouse movements, etc.
Temporal dependency
Overall goal
Modeling Query Change
A framework that is inspired by Reinforcement
Learning
Reinforcement Learning for Markov Decision Processes:
models a state space S and an action space A, with a transition model T = P(s_{i+1} | s_i, a_i)
a policy π(s) = a indicates which action a the agent takes at state s
each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may result in
Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Multi Armed Bandits
Portfolio Ranking
Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Markov Process
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-Armed Bandit
Family of Markov Models
Multi Armed Bandits (MAB)
Which slot machine should I select in this round?
I won! Is this the best slot machine?
MAB Definition
A tuple (S, A, R, B)
S : hidden reward distribution of each
bandit
A: choose which bandit to play
R: reward for playing bandit
B: belief space, our estimate of each
bandit’s distribution
Comparison with Markov Models
Single state Markov Decision Process
No transition probability
Similar to POMDP in that we maintain a
belief state
Action = choose a bandit, does not
affect state
Does not ‘plan ahead’ but intelligently
adapts
Somewhere between interactive and
dynamic IR
MAB Policy Reward
MAB algorithm describes a policy 𝜋 for
choosing bandits
Maximise rewards from chosen bandits
over all time steps
Minimise regret:
Regret = Σ_{t=1}^{T} [ Reward(a*) − Reward(a_π(t)) ]
the cumulative difference between optimal reward and actual reward
Exploration vs Exploitation
Exploration
Try out bandits to find which has the highest average reward
Too much exploration leads to poor performance
Exploitation
Play bandits that are known to pay out higher reward on average
MAB algorithms balance exploration and exploitation
Start by exploring more to find the best bandits
Exploit more as the best bandits become known
MAB – Index Algorithms
Gittins index¹
Play the bandit with the highest ‘Dynamic Allocation Index’
Modelled using an MDP, but suffers the ‘curse of dimensionality’
ε-greedy²
Play the highest-reward bandit with probability 1 − ε
Play a random bandit with probability ε
UCB (Upper Confidence Bound)³
¹J. C. Gittins ’89  ²Nicolò Cesa-Bianchi et al. ’98  ³P. Auer et al. ’02
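A minimal ε-greedy simulation on Bernoulli bandits, tracking the expected cumulative regret defined earlier; the arm probabilities and parameters below are made up for illustration:

```python
import random

# Sketch: ε-greedy on Bernoulli bandits, tracking the expected
# cumulative regret Σ_t [Reward(a*) − Reward(a_π(t))].
# Arm probabilities and parameters are illustrative.

def epsilon_greedy(probs, T=5000, eps=0.1, seed=0):
    rng = random.Random(seed)
    n = len(probs)
    counts, sums = [0] * n, [0.0] * n
    best = max(probs)
    regret = 0.0
    for _ in range(T):
        if rng.random() < eps or 0 in counts:
            a = rng.randrange(n)  # explore: play a random bandit
        else:
            a = max(range(n), key=lambda i: sums[i] / counts[i])  # exploit
        reward = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        regret += best - probs[a]  # expected per-step regret
    return regret
```

With eps=1.0 the policy never exploits and its regret grows linearly; a small ε does far better.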
Comparison of Markov Models
Markov Process – a fully observable stochastic
process
Hidden Markov Model – a partially observable
stochastic process
MDP – a fully observable decision process
MAB – a decision process, either fully or partially
observable
POMDP – a partially observable decision process
Model                 Actions   Rewards   States
Markov Process        No        No        Observable
Hidden Markov Model   No        No        Unobservable
MDP                   Yes       Yes       Observable
POMDP                 Yes       Yes       Unobservable
MAB                   Yes       Yes       Fixed
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Multi Armed Bandits
Portfolio Ranking
Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
UCB Algorithm
Calculate for all i and select the highest:
x̄_i + √(2 ln t / T_i)
where x̄_i is the average reward of bandit i, t is the time step, and T_i is the number of times bandit i has been played.
The chance of playing infrequently played bandits increases over time.
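The UCB1 index fits in a few lines; the function below is an illustrative sketch, with unplayed bandits tried first:

```python
import math

# Sketch of the UCB1 index: select argmax_i  x̄_i + sqrt(2 ln t / T_i),
# where x̄_i is bandit i's average reward, t the time step, and T_i the
# number of times bandit i has been played.

def ucb1_choice(avg_reward, play_counts, t):
    def index(i):
        if play_counts[i] == 0:
            return math.inf  # unplayed bandits are tried first
        return avg_reward[i] + math.sqrt(2 * math.log(t) / play_counts[i])
    return max(range(len(avg_reward)), key=index)
```

Note how a bandit with a lower average but fewer plays can still win the argmax, which is exactly the exploration bonus.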
Iterative Expectation
Adapts the UCB index to ranking documents i:
r̄_i + λ √(2 ln t / γ_i(t))
where:
r̄_i is the average probability of relevance of document i
γ_i(t) = Σ_{k=1}^{t} α^{C_k} β^{1−C_k} is the ‘effective’ number of impressions, with α and β rewarding clicks and non-clicks depending on rank
λ is an exploration parameter
[M. Sloan and J. Wang ’13]
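A sketch of the ‘effective impressions’ term, assuming γ_i(t) = Σ_k α^{C_k} β^{1−C_k} with constant α and β (in the paper they depend on rank):

```python
# Sketch of the 'effective impressions' count:
#   γ_i(t) = Σ_k α^{C_k} β^{1−C_k}
# where C_k ∈ {0, 1} marks a click at impression k. Constant α and β
# are a simplification; in the paper they depend on rank.

def effective_impressions(clicks, alpha=1.0, beta=0.5):
    return sum(alpha ** c * beta ** (1 - c) for c in clicks)
```

Here a click counts as a full impression (α = 1) and a skip as half (β = 0.5), so the exploration bonus of rarely clicked documents decays more slowly.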
Portfolio Theory of IR
Portfolio Theory maximises expected return for a given amount of risk¹
Diversity of a portfolio increases the likely return
We can consider documents as ‘shares’
Documents are dependent on one another, unlike in the PRP
Portfolio Theory of IR² allows us to introduce diversity
¹H. Markowitz ’52  ²J. Wang et al. ’09
Portfolio Ranking
Documents are dependent on each other
Co-click matrix from users and logs¹
Portfolio Armed Bandit Ranking²:
Exploratively rank using Iterative Expectation
Diversify using portfolio optimisation over the co-click matrix
Update relevance and dependence with each click
Both explorative and diverse
¹W. Wu et al. ’11  ²M. Sloan and J. Wang ’12
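A greedy mean-variance ranking sketch in the spirit of portfolio ranking; the equal-weight assumption and the risk parameter `b` are simplifications for illustration, not the paper’s exact objective:

```python
import numpy as np

# Sketch of a mean-variance (portfolio) ranking score: trade expected
# relevance (mu) against risk/redundancy measured by a covariance
# matrix Sigma (e.g. built from co-clicks). Equal weights and the
# risk-aversion parameter b are illustrative simplifications.

def portfolio_score(selected, candidate, mu, Sigma, b=2.0):
    idx = selected + [candidate]
    w = np.ones(len(idx)) / len(idx)  # equal weights for the sketch
    mean = w @ mu[idx]
    var = w @ Sigma[np.ix_(idx, idx)] @ w
    return mean - b * var

def greedy_portfolio_rank(mu, Sigma, k, b=2.0):
    selected = []
    while len(selected) < k:
        rest = [i for i in range(len(mu)) if i not in selected]
        selected.append(max(rest, key=lambda c: portfolio_score(selected, c, mu, Sigma, b)))
    return selected
```

With two highly correlated top documents, a sufficiently risk-averse b prefers a slightly less relevant but diverse document in second place.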
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Multi Armed Bandits
Portfolio Ranking
Multi-Page Search
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Multi Page Search
Page 1 and Page 2 ranked lists
[X. Jin, M. Sloan and J. Wang ’13]
Multi Page Search Example - States & Actions
State: relevance of documents
Action: ranking of documents
Observation: clicks
Belief: multivariate Gaussian
Reward: DCG over 2 pages
[X. Jin, M. Sloan and J. Wang ’13]
Model
The prior belief over document relevance is N(θ¹, Σ¹):
θ¹: prior estimate of relevance
Σ¹: prior estimate of covariance (from document similarity / topic clustering)
Rank action for page 1
Feedback from page 1:
r ~ N(θ_{s¹}, Σ_{s¹})
Update estimates using r¹, partitioning over the shown documents s′ and the rest:
θ¹ = [θ_{\s′}; θ_{s′}],   Σ¹ = [Σ_{\s′}  Σ_{\s′s′}; Σ_{s′\s′}  Σ_{s′}]
θ² = θ_{\s′} + Σ_{\s′s′} Σ_{s′}⁻¹ (r¹ − θ_{s′})
Σ² = Σ_{\s′} − Σ_{\s′s′} Σ_{s′}⁻¹ Σ_{s′\s′}
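Because the belief is multivariate Gaussian, the update above is the standard conditional-Gaussian formula. A NumPy sketch (the function name and argument layout are assumptions):

```python
import numpy as np

# Sketch of the multivariate-Gaussian feedback update: after observing
# relevance feedback r1 on the shown documents (indices in `shown`),
# the remaining documents' estimates are conditioned in closed form:
#   θ2 = θ_{\s'} + Σ_{\s's'} Σ_{s'}^{-1} (r1 − θ_{s'})
#   Σ2 = Σ_{\s'} − Σ_{\s's'} Σ_{s'}^{-1} Σ_{s'\s'}

def condition_on_feedback(theta, Sigma, shown, r1):
    rest = [i for i in range(len(theta)) if i not in shown]
    S_ss = Sigma[np.ix_(shown, shown)]
    S_rs = Sigma[np.ix_(rest, shown)]
    gain = S_rs @ np.linalg.inv(S_ss)
    theta2 = theta[rest] + gain @ (r1 - theta[shown])
    Sigma2 = Sigma[np.ix_(rest, rest)] - gain @ S_rs.T
    return theta2, Sigma2
```

Documents correlated with a positively judged page-1 document get their relevance estimate pulled up, and their variance shrinks.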
Rank using PRP
Utility of ranking (DCG over both pages):
U = λ Σ_{j=1}^{M} θ¹_{s_j} / log₂(j+1) + (1 − λ) Σ_{j=M+1}^{2M} θ²_{s_j} / log₂(j+1)
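The two-page utility can be sketched directly from the formula; `theta1` and `theta2` hold the relevance estimates of the M documents ranked on each page:

```python
import math

# Sketch of the two-page utility: a DCG-style sum over both pages with
# λ trading off page-1 gain against page-2 gain. theta1/theta2 hold
# the relevance estimates of the M documents ranked on each page.

def two_page_utility(theta1, theta2, lam):
    M = len(theta1)
    page1 = sum(t / math.log2(j + 1) for j, t in enumerate(theta1, start=1))
    page2 = sum(t / math.log2(j + 1) for j, t in enumerate(theta2, start=M + 1))
    return lam * page1 + (1 - lam) * page2
```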
Model – Bellman Equation
Optimise s¹ to improve U_{s²}:
V(θ¹, Σ¹, 1) = max_{s¹} [ λ θ_{s¹}·W¹ + (1 − λ) Σ_r P(r) max_{s²} θ_{s²}·W² ]
(W¹, W² denote the DCG discount weights for pages 1 and 2)
λ balances exploration and exploitation in page 1
Tuned for different queries
Navigational
Informational
𝜆 = 1 for non-ambiguous search
Approximation
Monte Carlo sampling over feedback r:
≈ max_{s¹} [ λ θ_{s¹}·W¹ + max_{s²} (1 − λ) (1/S) Σ_{r∈O} θ_{s²}·W² P(r) ]
Sequential Ranking Decision
Experiment Data
Difficult to evaluate without access to live users
Simulated using 3 TREC collections and
relevance judgements
WT10G – Explicit Ratings
TREC8 – Clickthroughs
Robust – Difficult (ambiguous) search
User Simulation
Rank M documents
Simulated user clicks according to relevance
judgements
Update page 2 ranking
Measure at page 1 and 2
Recall
Precision
nDCG
MRR
BM25 – prior ranking model
Investigating λ
Baselines
𝜆 determined experimentally
BM25
BM25 with conditional update (𝜆 = 1)
Maximum Marginal Relevance (MMR)
Diversification
MMR with conditional update
Rocchio
Relevance Feedback
Results
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Cold-start problem in recommender systems
Interactive Recommender Systems
Possible Solutions
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. “Interactive collaborative filtering.” CIKM 2013.
Objective
An interactive mechanism for collaborative filtering that addresses the cold-start problem
Proposed EE algorithms
Thompson Sampling
Linear-UCB
General Linear-UCB
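As an illustration of one of the listed algorithms, a Bernoulli Thompson Sampling sketch with Beta posteriors; the rating model and parameters are simplifications of the paper’s setting:

```python
import random

# Sketch of Thompson Sampling for Bernoulli feedback: keep a Beta
# posterior per item, sample one value from each posterior, and
# recommend the argmax. The Bernoulli rating model is a simplification.

def thompson_step(alpha, beta, rng):
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)

def thompson_run(probs, T=3000, seed=0):
    rng = random.Random(seed)
    n = len(probs)
    alpha, beta = [1.0] * n, [1.0] * n  # uniform priors
    plays = [0] * n
    for _ in range(T):
        i = thompson_step(alpha, beta, rng)
        plays[i] += 1
        if rng.random() < probs[i]:
            alpha[i] += 1  # observed a positive rating
        else:
            beta[i] += 1
    return plays
```

Sampling from the posterior naturally explores uncertain items for a cold-start user, then concentrates on the items they like.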
Cold-start users
Ad selection problem
How can online publishers optimally select ads to maximise their ad income over time?
[Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012]
Selling in multiple channels with non-fixed prices
Problem formulation
Objective function
Belief update
Results
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling
Charlie Clarke
(with much much input from Mark Smucker)
University of Waterloo, Canada
Moving from static ranking to dynamic domains
• How to extend IR evaluation methodologies to
dynamic domains?
• Three key ideas:
1. Realistic models of searcher interactions
2. Measure costs to searcher in meaningful units
(e.g., time, money, …)
3. Measure benefits to searcher in meaningful units
(e.g., time, nuggets, …)
Charles Clarke, University of Waterloo
This talk strongly reflects my opinions (not trying to be neutral).
But I am the guest speaker
Evaluating Information Access Systems
searching, browsing, summarization,
visualization, desktop, mobile, web,
books, images, questions, etc., and
combinations of these
Does the system work for its users?
Will this change make the system better or worse?
How do we quantify performance?
Performance 101: Is this a good search result?
How to evaluate?
Study users
Users in the wild:
• A/B Testing
• Result interleaving
• Clicks and dwell time
• Mouse movements
• Other implicit feedback
• …
Users in the lab:
• Time to task completion
• Think aloud protocols
• Questionnaires
• Eye tracking
• …
Unfortunately user studies are
• Slow
• Expensive
• Conditions can never be exactly duplicated
(e.g., learning to rank)
Alternative: User performance prediction
Can we predict the impact of a proposed change to an
information access system (while respecting and reflecting
differences between users)?
Can we quantify performance improvements in meaningful
units so that effect sizes can be considered in statistical
testing? Are improvements practically significant, as well as
statistically significant?
Want to predict the impact of a proposed change
automatically, based on existing user performance data,
rather than gathering new performance data.
The BIG goal
Traditional Evaluation of Rankers
• Test collection:
– Documents
– Queries
– Relevance judgments
• Each ranker generates a ranked list of
documents for each query
• Score ranked lists using relevance judgments
and standard metrics (recall, mean average
precision, nDCG, ERR, RBP, ….).
Example of a good-old-fashioned IR Metric
Ranked list of documents, with precision at rank N (the fraction of the first N documents that are relevant):

Rank  Judgement      P@N
1.    Non-relevant   0.00
2.    Relevant       0.50
3.    Non-relevant   0.33
4.    Non-relevant   0.25
5.    Relevant       0.40
6.    Non-relevant   0.33
7.    Non-relevant   0.29
…

Average Precision is the average of the precision at N for each relevant document:
AP = (1/R) Σ_i Prec(R_i)
Mean average precision (MAP) is AP averaged over the set of queries.
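The walk-through above can be reproduced with a few lines; here R is the total number of relevant documents for the query:

```python
# Reproduces the precision-at-N walk-through. rels[k] is 1 if the
# document at rank k+1 is relevant; R is the total number of relevant
# documents for the query.

def precision_at(rels, n):
    # Fraction of the first n documents that are relevant.
    return sum(rels[:n]) / n

def average_precision(rels, R):
    # AP = (1/R) Σ_i Prec(R_i): precision at each relevant rank.
    return sum(precision_at(rels, k + 1) for k, r in enumerate(rels) if r) / R
```

For the example list (relevant at ranks 2 and 5), precision at those ranks is 0.50 and 0.40, so AP = (0.50 + 0.40)/2 = 0.45 when R = 2.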
General form of effectiveness measures
Nearly all standard effectiveness measures
have the same basic form (including nDCG,
RBP, ERR, average precision,…):
Score = (1 / Normalization) Σ_{rank k} Gain(k) × Discount(k)
(a normalization, a gain at rank k, and a discount factor)
Implicit user model…
• User works down the ranked list spending
equal time on each document. Captions,
navigation, etc., have no impact.
• If they make it to rank i, they receive some
benefit (i.e., gain).
• Eventually they stop, which is reflected in the
discount (i.e., they are less likely to reach
lower ranks).
• Normalization typically maps the score into
the range [0:1]. Units may not be meaningful.
Traditional Evaluation of Rankers
• Many effectiveness measures: precision,
recall, average precision, rank-biased
precision, discounted cumulative gain, etc.
• Widely used and accepted as standard
practice.
• But…
• What does an improvement in average precision from 0.28 to 0.31 mean to users?
• Does an increase in the measure really translate to an improved user experience?
• How will an improvement in the performance of a single component impact overall system performance?
How to better reflect user variation and system performance?
Example: What’s the simplest possible user interface for search?
1) User issues a query
2) System returns material to read
i.e., system returns stuff to read, in order
(not a list of documents; more like a newspaper article)
A correspondingly simple user model has two parameters:
1) Reading speed
2) Time spent reading
Reading speed distribution (from users in the lab)
Empirical distribution of reading speed during an information access task,
and its fit to a log-normal distribution.
Stopping time distribution (from users in the wild)
Empirical distribution of time spent searching during an information access
task, and its fit to a log-normal distribution.
Evaluating a search result
1) Generate a reading speed from the distribution
2) Generate a stopping time from the distribution
3) How much useful material did the user read?
4) Repeat for many (simulated) users
As an example, we use passage retrieval runs from TREC 2006
Hard Track, which essentially assume our simple user interface.
We measure costs to searcher in terms of time spent searching.
We measure benefits to searcher in terms of “time well spent”.
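A sketch of this simulation; the log-normal parameters below are placeholders, not the calibrated values from the user studies:

```python
import random

# Sketch of the simulated-user evaluation. A document is a list of
# (n_chars, is_useful) spans read in order; reading speed and stopping
# time are drawn from log-normal distributions. The log-normal
# parameters are placeholders, not the calibrated values.

def time_well_spent(doc, speed, stop):
    # Accumulate reading time until the stopping time is reached.
    t, useful_t = 0.0, 0.0
    for n_chars, useful in doc:
        take = min(n_chars / speed, stop - t)
        if take <= 0:
            break
        if useful:
            useful_t += take
        t += take
    return useful_t

def mean_time_well_spent(doc, n_users=10000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_users):
        speed = rng.lognormvariate(3.0, 0.5)  # chars/second (assumed)
        stop = rng.lognormvariate(4.0, 1.0)   # seconds (assumed)
        total += time_well_spent(doc, speed, stop)
    return total / n_users
```

Averaging over many simulated users turns the per-user curves into the expected "time well spent" for a run.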
Useful characters read vs. Characters read
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Useful characters read vs. Time spent reading
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Time well spent vs. Time spent reading
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Distribution of time well spent
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Temporal precision vs. Time spent Reading
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
Distribution of temporal precision
Performance of run york04ha1 on TREC 2004 HARD Track topic 424
(“Bollywood”) with 10,000 simulated users.
General Framework (Part I): Cumulative Gain
• Consider the performance of a system in terms
of a cost-benefit (cumulative gain) curve G(t).
– Measure costs (e.g., in terms of time spent).
– Measure benefits (e.g., in terms of time well
spent).
• A particular instance of G(t) represents a
single user (described by a set of parameters)
interacting with a system (not just a list!).
• G(t) captures factors intrinsic to the system.
We don’t know how much time the user has to
invest, but for different levels of investment,
G(t) indicates the benefit.
General Framework (Part II): Decay
• Consider the user’s willingness to invest time in
terms of a decay curve D(t), which provides a
survival probability.
• We assume that G(t) and D(t) are independent.
(System dependent stopping probabilities are
accommodated in G(t). Details on request.)
• D(t) captures factors extrinsic to the system.
The user only has so much time they could
invest. They cannot invest more, even if they
would receive substantial additional benefit
from further interaction.
General form of effectiveness measures (REMINDER)
Nearly all standard effectiveness measures
have the same basic form (including nDCG,
RBP, ERR, average precision,…):
Score = (1 / Normalization) Σ_{rank k} Gain(k) × Discount(k)
(a normalization, a gain at rank k, and a discount factor)
General Framework (Part III): Time-biased gain
Overall system performance may be expressed
as expected cumulative gain (which also
incorporates standard effectiveness measures):
Score = (1 / Normalization (== 1?)) Σ_{time t} Gain(t) × Decay(t)
(a normalization, a gain at time t, and a decay factor)
General Framework (Part IV): Multiple users
• Cumulative gain may be computed by
– Simulation (drawing a set of parameters from a
population of users).
– Measuring actual interaction on live systems.
– Combinations of measurement and simulation.
• Simulating and/or measuring multiple users
allows us to consider performance difference
across the population of users.
• Simulation provides matching pairs (the same
user on both systems) increasing our ability to
detect differences.
General Framework
Most of the evaluation proposals in the
references can be reformulated in terms of this
general framework, including those that
address issues of:
– Novelty and diversity
– Filtering, summarization, question answering
– Session search, etc.
One more example from our current research…
Session search example
• Two (or more) result lists, e.g., from query
reformulation, query suggestion, or switching
search engines.
• Modeling searcher interaction requires a
switch from one result to another.
• The optimal time to switch depends on the
total time available to search.
For example (with many details omitted…):
Simulation of searchers switching between lists: A vs. B
User starts on list A.
If the user has less
than five minutes to
search, they should
stay on list A.
If the user has more
than five minutes to
search, they should
leave list A after 90
seconds.
But can we assume
optimal behavior when
modeling users?
Simulation of searchers switching between lists: A vs. B
[Figure: Average Gain (relevant documents) vs. Switch Time (minutes), for session durations of 2, 4, 6, 8 and 10 minutes.]
Topic = 389, List A = sab05ror1, List B = uic0501
Different view of the
same simulation, with
thousands of simulated
users.
Here, benefits are
measured by number of
relevant documents
seen.
Optimal switching time
depends on session
duration.
Summary
• Primary goal of IR evaluation: Predict how changes
to an IR system will impact the user experience.
• Evaluation in dynamic domains requires us to
explicitly model the system interface and the user’s
search behavior. Costs and benefits must be
measured in meaningful units (e.g., time).
• Successful IR evaluation requires measurement of
users, both “in the wild” and in the lab. These
measurements calibrate models, which make
predictions, which improve systems.
A few key papers
• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application
performance in information retrieval. In Proceedings of the 18th ACM conference on
Information and knowledge management (CIKM '09).
• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search
behavior. In Proceedings of the 36th international ACM SIGIR conference on Research and
development in information retrieval (SIGIR '13).
• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction:
simulating sessions in diverse searching environments. In Proceedings of the 35th
international ACM SIGIR conference on research and development in information retrieval
(SIGIR '12).
• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual
framework for investigation. In Proceedings of the 34th international ACM SIGIR
conference on research and development in Information Retrieval (SIGIR '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user
behavior for system effectiveness evaluation. In Proceedings of the 20th ACM international
conference on information and knowledge management (CIKM '11).
• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in
user behavior into systems based evaluation. In Proceedings of the 21st ACM international
conference on information and knowledge management (CIKM '12).
A few more key papers
• Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected
reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on
information and knowledge management (CIKM '09).
• Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative
analysis of cascade measures for novelty and diversity. In Proceedings of the fourth ACM
international conference on web search and data mining (WSDM '11).
• Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the
5th information interaction in context symposium (IIiX '14).
• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement:
evaluating ranking functions. In Proceedings of the sixth ACM international conference on
web search and data mining (WSDM '13).
• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008.
Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings
of the IR research, 30th European conference on Advances in information retrieval
(ECIR'08).
• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and
the cube test: multi-dimensional evaluation for professional search. In Proceedings of the
22nd ACM international conference on information & knowledge management (CIKM '13).
And yet more key papers
• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a
unified framework for information access evaluation. In Proceedings of the 36th
international ACM SIGIR conference on Research and development in information retrieval
(SIGIR '13).
• Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness
measures. In Proceedings of the 35th international ACM SIGIR conference on Research
and development in information retrieval (SIGIR '12).
• Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased
gain. In Proceedings of the Symposium on Human-Computer Interaction and Information
Retrieval (HCIR '12).
• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected
browsing utility for web search evaluation. In Proceedings of the 19th ACM international
conference on Information and knowledge management (CIKM '10).
• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session
information distillation. In Proceedings of the 2nd international conference on the theory of
information retrieval (ICTIR ’09).
• Plus many others (ask me).
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling
Charlie Clarke
University of Waterloo, Canada
Thank you!
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Apply an MDP to an IR Problem
We can model IR systems using a Markov
Decision Process
Is there a temporal component?
States – What changes with each time step?
Actions – How does your system change the
state?
Rewards – How do you measure feedback or
effectiveness in your problem at each time
step?
Transition Probability – Can you determine
this?
If not, then a model-free approach is more suitable.
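When the transition probabilities are unknown, a model-free method such as tabular Q-learning learns directly from observed interactions; a minimal sketch:

```python
from collections import defaultdict

# Sketch of a model-free alternative: tabular Q-learning. No transition
# model is needed; Q(s, a) is learned from observed
# (state, action, reward, next state) tuples alone.

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    # Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

Each observed interaction nudges the action-value estimate toward the observed reward plus the discounted best next value.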
Apply an MDP to an IR Problem - Example
User agent in session search
States – user’s relevance judgement
Action – new query
Reward – information gained
[Luo, Zhang, Yang SIGIR’14]
Apply an MDP to an IR Problem - Example
Search engine’s perspective
What if we can’t directly observe user’s
relevance judgement?
Click ≠ relevance
Applying POMDP to Dynamic IR
POMDP component → Dynamic IR counterpart
Environment: documents
Agents: user, search engine
States: queries, user’s decision-making status, relevance of documents, etc.
Actions: provide a ranking of documents; weigh terms in the query; add, remove or keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations: queries, clicks, document lists, snippets, terms, etc.
Rewards: evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix: given in advance or estimated from training data
Observation function: problem dependent; estimated from sample datasets
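Because the engine cannot observe the user's relevance judgement directly (click ≠ relevance), it maintains a belief b(s) over hidden states and updates it from observations via b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s). A minimal sketch of this standard POMDP belief update; the states, action names and probability tables are toy assumptions:

```python
def belief_update(belief, action, observation, T, O):
    """One Bayesian belief update.

    belief: {state: prob}
    T[(s, a)] -> {next_state: prob}   (transition function)
    O[(s2, a)] -> {observation: prob} (observation function)
    """
    new_belief = {}
    for s2 in belief:
        # Predict: probability of landing in s2 given the action taken
        predicted = sum(T[(s, action)].get(s2, 0.0) * p
                        for s, p in belief.items())
        # Correct: weight by how likely the observation is in s2
        new_belief[s2] = O[(s2, action)].get(observation, 0.0) * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}
```

For instance, if a click is more likely when a result is relevant, observing a click shifts the belief toward the "relevant" state without ever observing relevance itself.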
Panel Discussion
Outline
Introduction & Theory
Session Search
Dynamic Ranking
Recommendation and Advertising
Guest Talk: Charlie Clarke
Discussion Panel
Conclusion
Conclusions
Dynamic IR describes a new class of interactive models
Incorporates rich feedback and temporal dependency, and is goal oriented
The family of Markov models and multi-armed bandit theory are useful in building DIR models
Applicable to a range of IR problems
Useful in applications such as session search and evaluation
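As a concrete taste of the bandit side, here is a minimal epsilon-greedy sketch; the reward function and arm semantics are toy assumptions (in a DIR setting an "arm" might be a ranking strategy and the reward a click):

```python
import random

def epsilon_greedy(reward_fn, n_arms, n_rounds=1000, eps=0.1):
    """Balance exploring all arms with exploiting the best empirical mean."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < eps:
            arm = random.randrange(n_arms)                    # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return means, total
```

The exploration/exploitation trade-off this implements is exactly the tension between learning which results a user finds relevant and serving the results believed best so far.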
Dynamic IR Book
Published by Morgan & Claypool
‘Synthesis Lectures on Information Concepts,
Retrieval, and Services’
Due April / May 2015 (in time for SIGIR 2015)
TREC 2015
Dynamic Domain Track
Co-organized by Grace Hui Yang, John Frank, Ian Soboroff
Underexplored subsets of Web content
Limited scope and richness of indexed content, which may not include relevant components of the deep web (temporary pages, pages behind forms, etc.)
Basic search interfaces, where there is little collaboration or history beyond independent keyword search
Complex, task-based, dynamic search: temporal dependency, rich interactions, complex and evolving information needs, professional users, a wide range of search strategies
Task
An interactive search task with multiple runs of retrieval
Starting point: the system is given a search query
Iterate:
the system returns a ranked list of 5 documents
the API returns relevance judgments
go to the next iteration of retrieval
until done (the system decides when to stop)
The goal of the system is to find relevant information for each topic as soon as possible
One-shot ad-hoc search is included, if the system decides to stop after iteration one
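The iteration above can be sketched as a simple loop; the `retrieve` and `judge` callables are hypothetical placeholders for a participant's system and the track's judgment API, and the stopping rule is a toy one:

```python
def dynamic_domain_run(query, retrieve, judge, max_iterations=10):
    """Iterate: return 5 documents, receive judgments, decide whether to stop."""
    feedback = {}  # doc id -> relevance judgment received so far
    for _ in range(max_iterations):
        ranked = retrieve(query, feedback, k=5)  # system returns 5 docs
        judgments = judge(ranked)                # API returns relevance
        feedback.update(judgments)
        # Toy stopping rule: stop once a batch yields no relevant documents
        # (a one-shot system would simply stop after the first iteration).
        if not any(judgments.values()):
            break
    return feedback
```

The accumulated `feedback` is what makes the task dynamic: each retrieval run can condition on the judgments from earlier iterations.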
Domains
Illicit goods: 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?
Ebola: one million tweets, plus 300k docs from in-country web sites (mostly official sites). Who is doing what and where?
Local Politics: 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what and why?
Timeline
TREC Call for Participation: January 2015
Data available: March
Detailed guidelines: April/May
Topics, tasks available: June
Systems do their thing: June-July
Evaluation: August
Results to participants: September
Conference: November 2015
TREC 2015
Total Recall Track
Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke
Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search)
A participating system starts with a topic and proposes a relevant document
The system gets immediate feedback on relevance
It continues to propose additional documents and receive feedback until a stopping condition is reached
Shared online infrastructure and collections with Dynamic Domain: it is easy to participate in both if you participate in one
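The propose-then-receive-feedback protocol can be sketched as follows; the `score` and `judge` callables stand in for a participant's classifier and the track's feedback API, and the patience-based stopping condition is a toy assumption:

```python
def total_recall_run(docs, score, judge, patience=3):
    """Propose one doc at a time; stop after `patience` misses in a row."""
    judged, found, misses = {}, [], 0
    while misses < patience and len(judged) < len(docs):
        # Active selection: propose the highest-scoring unjudged document
        doc = max((d for d in docs if d not in judged),
                  key=lambda d: score(d, judged))
        rel = judge(doc)   # immediate relevance feedback
        judged[doc] = rel
        if rel:
            found.append(doc)
            misses = 0     # reset: keep mining this vein
        else:
            misses += 1
    return found
```

In a real run the scorer would be retrained on `judged` after each piece of feedback, which is what makes the process active learning rather than a fixed ranking.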
Acknowledgment
We thank Prof. Charlie Clarke for his guest lecture
We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial
We also thank the following colleagues for their comments and suggestions:
Dr Filip Radlinski
Prof. Maarten de Rijke
References
Static IR
Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page , Sergey Brin , Rajeev Motwani , Terry Winograd. 1999
Implicit User Modeling for Personalized Search, Xuehua Shen et al., CIKM, 2005
A Short Introduction to Learning to Rank. Hang Li, IEICE Transactions 94-D(10): 1854-1862, 2011.
Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009
References
Interactive IR
Relevance Feedback in Information Retrieval, J. J. Rocchio, The SMART Retrieval System (pp. 313-23), 1971
A study in interface support mechanisms for interactive information retrieval, Ryen W. White et al., JASIST, 2006
Visualizing stages during an exploratory search session, Bill Kules et al., HCIR, 2011
Dynamic Ranked Retrieval, Cristina Brandt et al., WSDM, 2011
Structured Learning of Two-level Dynamic Rankings, Karthik Raman et al., CIKM, 2011
References
Dynamic IR
A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR’99, pages 214-221.
Threshold setting and performance optimization in adaptive filtering, Stephen Robertson, JIR 2002
A large-scale study of the evolution of web pages, Dennis Fetterly et al., WWW 2003
Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, Yisong Yue et al., ICML 2009
Meme-tracking and the dynamics of the news cycle, Jure Leskovec et al., KDD 2009
References
Dynamic IR
Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009
A Novel Click Model and Its Applications to Online Advertising, Zeyuan Allen Zhu et al., WSDM 2010
A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010
Inferring search behaviors using partially observable Markov model with duration (POMD), Yin He et al., WSDM, 2011
No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search, Jeff Huang et al., CHI 2011
Balancing Exploration and Exploitation in Learning to Rank Online, Katja Hofmann et al., ECIR, 2011
Large-Scale Validation and Analysis of Interleaved Search Evaluation, Olivier Chapelle et al., TOIS 2012
References
Dynamic IR
Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.
Sequential selection of correlated ads by POMDPs, Shuai Yuan et al., CIKM 2012
Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.
Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.
Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.
Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM'2013, pages 1411-1420.
References
Dynamic IR
Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR ’14.
Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.
Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.
Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. Designing States, Actions, and Rewards for Using POMDP in Session Search. In ECIR 2015.
Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP Model for Content-Free Document Re-ranking. In SIGIR 2014.
References
Markov Processes
A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.
Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum, Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
References
Markov Processes
Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988
Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.
Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992
Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.
References
Markov Processes
Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis Carnegie Mellon. 2003
VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.
Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.
Probabilistic robotics. S. Thrun, W. Burgard, D. Fox. Cambridge. MIT Press. 2005
Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006
References
Markov Processes
The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E.J. Sondik. Operations Research. 1973
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and Shin M. C. Management Science 24, 1978.
An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591–600, 2006.
Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media. 2011
Finite-Time Regret Bounds for the Multiarmed Bandit Problem, Nicolò Cesa-Bianchi, Paul Fischer. ICML 100-108, 1998
Multi-armed bandit allocation indices, Wiley, J. C. Gittins. 1989
Finite-time Analysis of the Multiarmed Bandit Problem, Peter Auer et al., Machine Learning 47, Issue 2-3, 2002.