SIGIR Tutorial July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Age of Empires
Dynamic Information Retrieval
[Diagram: a user with an information need explores a space of documents, some of which have already been observed.]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
Evolving IR
Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document
Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine
Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Conceptual Model – Static IR
[Diagram: Static IR → Interactive IR → Dynamic IR, with Static IR highlighted. No feedback from the user.]
Characteristics of Static IR
Does not learn directly from user
Parameters updated periodically
Static Information Retrieval Model
Learning to Rank
Commonly Used Static IR Models
BM25
PageRank
Language Model
Feedback in IR
Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Conceptual Model – Interactive IR
[Diagram: Static IR → Interactive IR → Dynamic IR, with Interactive IR highlighted. Exploits user feedback.]
Interactive User Feedback
Like, dislike, pause, skip
Learn the user's taste interactively!
At the same time, provide good recommendations!
Interactive Recommender Systems
Example - Multi Page Search
Ambiguous Query
Example - Multi Page Search
Topic: Car
Example - Multi Page Search
Topic: Animal
Example – Interactive Search
Click on 'car' webpage
Example – Interactive Search
Click on 'Next Page'
Example – Interactive Search
Page 2 results: Cars
Example – Interactive Search
Click on 'animal' webpage
Example – Interactive Search
Page 2 results: Animals
Example – Dynamic Search
Topic: Guitar
Example – Dynamic Search
Diversified page 1
Topics: cars, animals, guitars
Toy Example
Multi-Page search scenario
User image searches for “jaguar”
Rank two of the four results over two pages:
(four candidate images)  r = 0.5   r = 0.51   r = 0.9   r = 0.49
Toy Example – Static Ranking
Ranked according to the PRP (Probability Ranking Principle):
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49
Toy Example – Relevance Feedback
Interactive Search
Improve 2nd page based on feedback from 1st page
Use clicks as relevance feedback
Rocchio¹ algorithm on terms in the image webpage
$$w_{q'} = \alpha w_q + \frac{\beta}{|D_r|} \sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|} \sum_{d \in D_n} w_d$$
The new query moves closer to relevant documents and away from non-relevant documents
¹ Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto, '99
Toy Example – Relevance Feedback
Ranked according to PRP and Rocchio:
Page 1: 1. r = 0.9, 2. r = 0.51 (* marks a click)
Page 2: 1. r = 0.5, 2. r = 0.49, reordered using the click as relevance feedback
Toy Example – Relevance Feedback
No click when searching for animals:
Page 1: 1. r = 0.9, 2. r = 0.51 (no clicks)
Page 2: 1. ?, 2. ? (no feedback to learn from)
Toy Example – Value Function
Optimize both pages using dynamic IR
Bellman equation for value function
Simplified example:
$$V_t(\theta_t, \Sigma_t) = \max_{s_t}\left[\theta_{s_t} + E\left(V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t\right)\right]$$
𝜃𝑡, Σ𝑡 = relevance and covariance of documents for page 𝑡
𝐶𝑡 = clicks on page 𝑡
𝑉𝑡 = ‘value’ of ranking on page 𝑡
Maximize value over all pages based on estimating feedback
Toy Example – Covariance
The covariance matrix represents similarity between the images:
$$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$$
Toy Example – Myopic Value
For myopic ranking, $V_2 = 16.380$
Toy Example – Myopic Ranking
Page 2 ranking stays the same regardless of clicks
Toy Example – Optimal Value
For optimal ranking, $V_2 = 16.528$
Toy Example – Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page
Toy Example – Optimal Ranking
In all other scenarios, rank the animal first on the next page
Interactive vs Dynamic IR
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback received
Dynamic:
• Optimizes over all interactions
• Long-term gains
• Models future user feedback
• Also used at the beginning of the interaction
Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Conceptual Model – Dynamic IR
[Diagram: Static IR → Interactive IR → Dynamic IR, with Dynamic IR highlighted. Explores and exploits feedback.]
Characteristics of Dynamic IR
Rich interactions
Query formulation
Document clicks
Document examination
eye movement
mouse movements
etc.
Characteristics of Dynamic IR
Temporal dependency
[Diagram: an information need I drives a sequence of search iterations; at iteration i the user issues query qᵢ, the system returns ranked documents Dᵢ, and the user produces clicked documents Cᵢ, which feed into iteration i+1, …, up to iteration n.]
Characteristics of Dynamic IR
Overall goal
Optimize over all iterations for goal
IR metric or user satisfaction
Optimal policy
Dynamic IR
Dynamic IR explores actions
Dynamic IR learns from the user and adjusts its actions
May hurt performance in a single stage, but improves over all stages
Applications to IR
Dynamics found in lots of different aspects of IR
Dynamic Users
Users change behaviour over time, user history
Dynamic Documents
Information Filtering, document content change
Dynamic Queries
Changing query meaning, e.g. 'Twitter'
Dynamic Information Needs
Topic ontologies evolve over time
Dynamic Relevance
Seasonal/time of day change in relevance
User Interactivity in DIR
Modern IR interfaces
Facets
Verticals
Personalization
Responsive to particular user
Complex log data
Mobile
Richer user interactions
Ads
Adaptive targeting
Big Data
Data set sizes are always increasing
Computational footprint of learning to rank
Rich, sequential data
Example: complex user behaviour models found in log data take into account reading, skipping and re-reading behaviours, using a POMDP¹
¹ Yin He et al., '11
Online Learning to Rank
Learning to rank iteratively on sequential data
Clicks as implicit user feedback/preference
Often uses multi-armed bandit techniques
Examples: using click models to interpret clicks and a contextual bandit to improve learning¹; pairwise comparison of rankings using a duelling bandits formulation²
¹ Katja Hofmann et al., '11  ² Yisong Yue et al., '09
Evaluation
Use complex user interaction data to assess rankings
Compare ranking techniques in online testing
Minimise user dissatisfaction
Examples: modelling cursor activity and correlating it with eye tracking to validate good or bad abandonment¹; interleaving search results from two ranking algorithms to determine which is better²
¹ Jeff Huang et al., '11  ² Olivier Chapelle et al., '12
Filtering and News
Adaptive techniques to personalize information filtering or news recommendation
Understand the complex dynamics of real-world events in search logs
Examples: capturing temporal document change¹; using relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overall utility²; detecting patterns and memes in news cycles and modeling how information spreads³
¹ Dennis Fetterly et al., '03  ² Stephen Robertson, '02  ³ Jure Leskovec et al., '09
Advertising
Behavioural targeting and personalized ads
Learn when to display new ads
Maximise profit from available ads
Examples: using a POMDP and ad correlation to find the optimal ad to display to a user¹; a dynamic click model that can interpret complex user behaviour in logs and apply the results to tail queries and unseen ads²
¹ Shuai Yuan et al., '12  ² Zeyuan Allen Zhu et al., '10
Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Outline
Introduction
Theory and Models
Why not use supervised learning
Markov Models
Session Search
Reranking
Evaluation
Why not use Supervised Learning for Dynamic IR Modeling?
Lack of enough training data
Dynamic IR problems contain a sequence of dynamic interactions
E.g. a series of queries in a session
Rare to find repeated sequences (close to zero)
Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)
The chance of finding repeated adjacent query pairs is also low:

Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%
Our Solution
Try to find an optimal solution through a sequence of dynamic interactions
Trial and error: learn from repeated, varied attempts which are continued until success
No supervised learning
Trial and Error
q1 – "dulles hotels"
q2 – "dulles airport"
q3 – "dulles airport location"
q4 – "dulles metrostop"
Recap – Characteristics of Dynamic IR
Rich interactions
Query formulation, document clicks, document examination, eye movement, mouse movements, etc.
Temporal dependency
Overall goal
What is a Desirable Model for Dynamic IR?
Model interactions, which means it needs to have placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do!
A Markov model will do!
Outline
Introduction
Theory and Models
Why not use supervised learning
Markov Models
Session Search
Reranking
Evaluation
Markov Process
Markov Property¹ (the "memoryless" property): for a system, its next state depends only on its current state:
$$\Pr(S_{i+1} \mid S_i, \ldots, S_0) = \Pr(S_{i+1} \mid S_i)$$
Markov Process: a stochastic process with the Markov property.
e.g. $s_0 \to s_1 \to \cdots \to s_i \to s_{i+1} \to \cdots$
¹ A. A. Markov, '06
Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-armed Bandit
Markov Chain
Example: Google PageRank¹, a discrete-time Markov process (S, M)
State S – web page
Transition probability M
PageRank: how likely a random web surfer will land on a page
$$\mathrm{PageRank}(S) = \frac{1-\alpha}{N} + \alpha \sum_{Y \in \Pi_S} \frac{\mathrm{PageRank}(Y)}{L(Y)}$$
where N is the number of pages, L(Y) is the number of outlinks of Y, Π_S is the set of pages linked to S, and (1−α)/N is the random jump term.
[Figure: a small web graph over pages A–E with their PageRank values.]
The stable state distribution of such an MC is PageRank.
¹ L. Page et al., '99
Hidden Markov Model
A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the states¹. (S, M, O, e)
[Diagram: hidden states s₀ → s₁ → s₂ → … with transition probabilities pᵢ; each state sᵢ emits observation oᵢ with emission probability eᵢ.]
sᵢ – hidden state; pᵢ – transition probability; oᵢ – observation; eᵢ – observation probability (emission probability)
¹ Leonard E. Baum et al., '66
An HMM example for IR
Construct an HMM for each document¹
[Diagram: hidden states alternate between "Document" and "General English"; each emits query terms tᵢ.]
sᵢ – "Document" or "General English"; pᵢ – a₀ or a₁; tᵢ – query term; eᵢ – P(t|D) or P(t|GE)
Document-to-query relevance:
$$P(D \mid q) \propto \prod_{t \in q} \left( a_0 P(t \mid GE) + a_1 P(t \mid D) \right)$$
¹ Miller et al., '99
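A small sketch of this per-document HMM scoring (the two-state mixture above); the tokenized inputs and the value of a₁ are assumptions:

```python
import math
from collections import Counter

def hmm_score(query, doc_tokens, collection_tokens, a1=0.8):
    """log P(D|q) up to a constant: each query term is emitted either by
    the 'Document' state (weight a1) or 'General English' (weight a0).
    Assumes every query term occurs somewhere in the collection."""
    a0 = 1.0 - a1
    d, c = Counter(doc_tokens), Counter(collection_tokens)
    score = 0.0
    for t in query:
        p_d = d[t] / len(doc_tokens)           # P(t | D)
        p_ge = c[t] / len(collection_tokens)   # P(t | GE)
        score += math.log(a0 * p_ge + a1 * p_d)
    return score
```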
Markov Decision Process
MDP extends MC with actions and rewards¹: (S, M, A, R, γ)
[Diagram: states s₀ → s₁ → s₂ → s₃ → …; at state sᵢ the agent takes action aᵢ, receives reward rᵢ, and transitions with probability pᵢ.]
sᵢ – state; aᵢ – action; rᵢ – reward; pᵢ – transition probability
¹ R. Bellman, '57
Definition of MDP
A tuple (S, M, A, R, γ)
S : state space
M: transition matrix
Ma(s, s') = P(s'|s, a)
A: action space
R: reward function
R(s,a) = immediate reward taking action a at state s
γ: discount factor, 0< γ ≤1
policy π
π(s) = the action taken at state s
Goal is to find an optimal policy π* maximizing the expected total rewards.
Policy
Policy: π(s) = a — according to which, select an action a at state s.
Example: π(s₀) = move right and up; π(s₁) = move right and up; π(s₂) = move right
[Slide adapted from Carlos Guestrin's ML lecture]
Value of Policy
Value: V^π(s) — the expected long-term reward starting from s.
Start from s₀ and follow π:
$$V^\pi(s_0) = E_\pi\left[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \gamma^3 R(s_3) + \gamma^4 R(s_4) + \cdots\right]$$
Future rewards are discounted by γ ∈ [0,1).
[Diagram, built up over three slides: taking action π(s₀) from s₀ may lead to several next states s₁, s₁′, s₁″, each with its own reward; from each of these, π leads on to s₂, s₂′, s₂″, and so on, so the value is an expectation over all such trajectories.]
[Slides adapted from Carlos Guestrin's ML lecture]
Computing the value of a policy
$$\begin{aligned}
V^\pi(s_0) &= E_\pi\left[R(s_0,a) + \gamma R(s_1,a) + \gamma^2 R(s_2,a) + \gamma^3 R(s_3,a) + \cdots\right] \\
&= E_\pi\left[R(s_0,a) + \gamma \sum_{t=1}^{\infty} \gamma^{t-1} R(s_t,a)\right] \\
&= R(s_0,a) + \gamma\, E_\pi\left[\sum_{t=1}^{\infty} \gamma^{t-1} R(s_t,a)\right] \\
&= R(s_0,a) + \gamma \sum_{s'} M_{\pi(s)}(s,s')\, V^\pi(s')
\end{aligned}$$
(the value function; s is the current state, s′ a possible next state)
Optimality — Bellman Equation
The Bellman equation¹ for an MDP is a recursive definition of the optimal (state-)value function V*(·):
$$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s') \right]$$
Optimal policy:
$$\pi^*(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s') \right]$$
¹ R. Bellman, '57
Optimality — Bellman Equation
The Bellman equation can be rewritten in terms of the action-value function Q (the relationship between V and Q):
$$V^*(s) = \max_a Q(s,a)$$
$$Q(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^*(s')$$
Optimal policy:
$$\pi^*(s) = \arg\max_a Q(s,a)$$
MDP algorithms
Model-based approaches:
Value Iteration
Policy Iteration
Modified Policy Iteration
Prioritized Sweeping
Model-free approaches:
Temporal Difference (TD) Learning
Q-Learning
All solve the Bellman equation for the optimal value V*(s) and the optimal policy π*(s).
[Bellman, '57; Howard, '60; Puterman and Shin, '78; Singh & Sutton, '96; Sutton & Barto, '98; Richard Sutton, '88; Watkins, '92]
[Slide adapted from Carlos Guestrin's ML lecture]
Value Iteration
Initialization: initialize V₀(s) arbitrarily
Loop (iteration i):
$$V_{i+1}(s) \leftarrow \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s') \right]$$
$$\pi(s) \leftarrow \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s') \right]$$
Stopping criterion: π(s) is good enough
¹ Bellman, '57
Greedy Value Iteration
Initialization: initialize V₀(s) arbitrarily
Iteration:
$$V_{i+1}(s) \leftarrow \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s') \right]$$
Stopping criterion: $\forall s\;\; |V_{i+1}(s) - V_i(s)| < \varepsilon$
Optimal policy:
$$\pi(s) \leftarrow \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s') \right]$$
¹ Bellman, '57
Greedy Value Iteration
Algorithm:
1. For each state s ∈ S: initialize V₀(s) arbitrarily
2. i ← 0
3. Repeat
   3.1 i ← i + 1
   3.2 For each s ∈ S: $V_i(s) \leftarrow \max_a [R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_{i-1}(s')]$
   until $\forall s\;\; |V_i(s) - V_{i-1}(s)| < \varepsilon$
4. For each s ∈ S: $\pi(s) \leftarrow \arg\max_a [R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V_i(s')]$
Greedy Value Iteration — worked example
$$V(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s') \right]$$
$$M_{a_1} = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 1.0 & 0 & 0 \\ 0.8 & 0.2 & 0 \end{pmatrix} \qquad M_{a_2} = \begin{pmatrix} 0 & 0 & 1.0 \\ 0 & 0.2 & 0.8 \\ 0 & 1.0 & 0 \end{pmatrix}$$
V⁽⁰⁾(S1) = max{R(S1,a1), R(S1,a2)} = 6
V⁽⁰⁾(S2) = max{R(S2,a1), R(S2,a2)} = 4
V⁽⁰⁾(S3) = max{R(S3,a1), R(S3,a2)} = 8
V⁽¹⁾(S1) = max{3 + 0.96·(0.3·6 + 0.7·4), 6 + 0.96·(1.0·8)} = max{3 + 0.96·4.6, 6 + 0.96·8.0} = max{7.416, 13.68} = 13.68
Greedy Value Iteration — worked example (continued)
$$V(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s') \right]$$

i   | V⁽ⁱ⁾(S1) | V⁽ⁱ⁾(S2) | V⁽ⁱ⁾(S3)
0   | 6        | 4        | 8
1   | 13.680   | 9.760    | 13.376
2   | 18.841   | 17.133   | 20.380
3   | 25.565   | 22.087   | 25.759
…   | …        | …        | …
200 | 168.039  | 165.316  | 168.793

Resulting policy: π(S1) = a2, π(S2) = a1, π(S3) = a1
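A runnable sketch of greedy value iteration on this example. R(S1,a1) = 3, R(S1,a2) = 6, γ = 0.96 and the per-state reward maxima (6, 4, 8) come from the slides; the individual rewards R(S2,a2) and R(S3,a2) are not given, so the values used below are assumptions (any sufficiently small values reproduce the table):

```python
import numpy as np

# Transition matrices from the slides: M[a][s, s'] = P(s' | s, a)
M = {"a1": np.array([[0.3, 0.7, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.8, 0.2, 0.0]]),
     "a2": np.array([[0.0, 0.0, 1.0],
                     [0.0, 0.2, 0.8],
                     [0.0, 1.0, 0.0]])}
# R[s, k] = R(s, a_k); R(S2,a2) and R(S3,a2) are assumed values
R = np.array([[3.0, 6.0],
              [4.0, 1.0],
              [8.0, 1.0]])
gamma, actions = 0.96, ["a1", "a2"]

V = R.max(axis=1)  # V0(s) = max_a R(s, a), as on the slides
for _ in range(200):
    Q = np.stack([R[:, k] + gamma * M[a] @ V
                  for k, a in enumerate(actions)], axis=1)
    V = Q.max(axis=1)

Q = np.stack([R[:, k] + gamma * M[a] @ V for k, a in enumerate(actions)], axis=1)
print(V)                                        # ≈ [168.04, 165.32, 168.79]
print([actions[k] for k in Q.argmax(axis=1)])   # ['a2', 'a1', 'a1']
```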
Policy Iteration
Initialization:
$V^{\pi_0}(s) \leftarrow 0$; π₀(s) ← arbitrary policy
Iteration (over i):
Policy evaluation:
$$V^{\pi_i}(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V^{\pi_i}(s')$$
Policy improvement:
$$\pi_{i+1}(s) \leftarrow \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V^{\pi_i}(s') \right]$$
Stopping criterion: the policy stops changing
¹ Howard, '60
Policy Iteration
Algorithm:
1. For each state s ∈ S: V(s) ← 0, π₀(s) ← arbitrary policy; i ← 0
2. Repeat
   2.1 Repeat (policy evaluation)
       For each s ∈ S:
           V′(s) ← V(s)
           $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$
       until $\forall s\;\; |V(s) - V'(s)| < \varepsilon$
   2.2 For each s ∈ S (policy improvement):
       $\pi_{i+1}(s) \leftarrow \arg\max_a [R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')]$
   2.3 i ← i + 1
   until π_i = π_{i−1}
Modified Policy Iteration
The "policy evaluation" step in Policy Iteration is time-consuming, especially when the state space is large.
Modified Policy Iteration calculates an approximate policy evaluation by running just a few (k) iterations.
[Spectrum: Greedy Value Iteration (k = 1) — Modified Policy Iteration — Policy Iteration (k = ∞)]
Modified Policy Iteration
Algorithm:
1. For each state s ∈ S: V(s) ← 0, π₀(s) ← arbitrary policy; i ← 0
2. Repeat
   2.1 Repeat k times (approximate policy evaluation)
       For each s ∈ S: $V(s) \leftarrow R(s, \pi_i(s)) + \gamma \sum_{s'} M_{\pi_i(s)}(s,s')\, V(s')$
   2.2 For each s ∈ S:
       $\pi_{i+1}(s) \leftarrow \arg\max_a [R(s,a) + \gamma \sum_{s'} M_a(s,s')\, V(s')]$
   2.3 i ← i + 1
   until π_i = π_{i−1}
MDP algorithms (recap)
Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning
[Bellman, '57; Howard, '60; Puterman and Shin, '78; Singh & Sutton, '96; Sutton & Barto, '98; Richard Sutton, '88; Watkins, '92]
[Slide adapted from Carlos Guestrin's ML lecture]
Temporal Difference Learning
Monte Carlo sampling can be used for model-free policy evaluation: estimate V^π(s) in the "policy evaluation" step by the average reward of trajectories from s. However, parts of the trajectories can be reused, so we estimate via an expectation over the next state:
$$V^\pi(s) \leftarrow r + \gamma\, E[V^\pi(s') \mid s, a]$$
The simplest estimate: $V^\pi(s) \leftarrow r + \gamma V^\pi(s')$
A smoothed version: $V^\pi(s) \leftarrow \alpha\left(r + \gamma V^\pi(s')\right) + (1-\alpha)\, V^\pi(s)$
TD-learning rule:
$$V^\pi(s) \leftarrow V^\pi(s) + \alpha\left(r + \gamma V^\pi(s') - V^\pi(s)\right)$$
where r is the immediate reward, α is the learning rate, and $r + \gamma V^\pi(s') - V^\pi(s)$ is the temporal difference.
[Richard Sutton, '88; Singh & Sutton, '96; Sutton & Barto, '98]
Algorithm (Temporal Difference Learning):
1. For each state s ∈ S: initialize V^π(s) arbitrarily
2. For each episode:
   2.1 Initialize s
   2.2 Repeat
       2.2.1 take action a at state s according to π
       2.2.2 observe the immediate reward r and the next state s′
       2.2.3 $V^\pi(s) \leftarrow V^\pi(s) + \alpha\left(r + \gamma V^\pi(s') - V^\pi(s)\right)$
       2.2.4 s ← s′
       until s is a terminal state
Q-Learning
TD-learning rule:
$$V^\pi(s) \leftarrow V^\pi(s) + \alpha\left(r + \gamma V^\pi(s') - V^\pi(s)\right)$$
Q-learning rule:
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)$$
with $V(s) = \max_a Q(s,a)$ and $\pi^*(s) = \arg\max_a Q^*(s,a)$, where
$$Q^*(s,a) = R(s,a) + \gamma \sum_{s'} M_a(s,s') \max_{a'} Q^*(s',a')$$
Q-Learning
Algorithm:
1. For each s ∈ S and a ∈ A: initialize Q₀(s,a) arbitrarily
2. i ← 0
3. For each episode:
   3.1 Initialize s
   3.2 Repeat
       3.2.1 i ← i + 1
       3.2.2 select an action a at state s according to Q_{i−1}
       3.2.3 take action a, observe the immediate reward r and the next state s′
       3.2.4 $Q_i(s,a) \leftarrow Q_{i-1}(s,a) + \alpha\left(r + \gamma \max_{a'} Q_{i-1}(s',a') - Q_{i-1}(s,a)\right)$
       3.2.5 s ← s′
       until s is a terminal state
4. For each s ∈ S: $\pi(s) \leftarrow \arg\max_a Q_i(s,a)$
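A generic tabular Q-learning sketch with an ε-greedy behaviour policy; the environment interface (reset/step/actions) is an assumption made for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning. `env` is assumed to expose reset() -> s,
    step(s, a) -> (r, s', done), and a list env.actions."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                        # explore
                a = random.choice(env.actions)
            else:                                            # exploit
                a = max(env.actions, key=lambda x: Q[(s, x)])
            r, s2, done = env.step(s, a)
            best_next = max(Q[(s2, a2)] for a2 in env.actions)
            # move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q, lambda s: max(env.actions, key=lambda x: Q[(s, x)])
```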
Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process:
Is there a temporal component?
States – what changes with each time step?
Actions – how does your system change the state?
Rewards – how do you measure feedback or effectiveness in your problem at each time step?
Transition probability – can you determine this? If not, then a model-free approach is more suitable.
Apply an MDP to an IR Problem – Example
User agent in session search
States – user’s relevance judgement
Action – new query
Reward – information gained
Apply an MDP to an IR Problem – Example
The search engine's perspective:
What if we can't directly observe the user's relevance judgement?
Click ≠ relevance
Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-armed Bandit
POMDP Model
[Diagram: as in an MDP, hidden states s₀ → s₁ → s₂ → s₃ → … with actions a₀, a₁, a₂ and rewards r₀, r₁, r₂, but the agent only receives observations o₁, o₂, o₃ and maintains a belief over the hidden states.]
¹ R. D. Smallwood et al., '73
POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set — an observation is a symbol emitted according to a hidden state
Θ: observation function — Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a)
B: belief space — a belief is a probability distribution over the hidden states
POMDP → Belief Update
The agent uses a state estimator to update its belief about the hidden states:
$$b' = SE(b, a, o')$$
$$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_{s} M(s, a, s')\, b(s)}{P(o' \mid a, b)}$$
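A direct sketch of this state estimator; the two-state transition and observation numbers below are hypothetical:

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """b' = SE(b, a, o) for a POMDP.
    M[a][s, s'] = P(s' | s, a); Theta[a][s', o] = P(o | s', a)."""
    # numerator: Theta(s', a, o) * sum_s M(s, a, s') * b(s)
    b_new = Theta[a][:, o] * (b @ M[a])
    return b_new / b_new.sum()   # normalize by P(o | a, b)

M = {0: np.array([[0.7, 0.3], [0.2, 0.8]])}       # hypothetical
Theta = {0: np.array([[0.9, 0.1], [0.3, 0.7]])}   # hypothetical
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, M=M, Theta=Theta))
```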
POMDP → Bellman Equation
The Bellman equation for a POMDP:
$$V(b) = \max_a \left[ r(b,a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b') \right]$$
A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):
B: the continuous belief space
M′: transition function $M'_a(b,b') = \sum_{o' \in O} \mathbf{1}_{a,o'}(b',b)\, \Pr(o' \mid a, b)$, where $\mathbf{1}_{a,o'}(b',b) = 1$ if $SE(b,a,o') = b'$ and 0 otherwise
A: action space
r: reward function $r(b,a) = \sum_{s \in S} b(s)\, R(s,a)$
The optimal policy of a POMDP = the optimal policy of its belief MDP¹
¹ L. Kaelbling et al., '98
Solving POMDPs – The Witness Algorithm
A variation of the value iteration algorithm
Policy Tree
• A policy tree of depth i is an i-step non-stationary policy
• As if we run value iteration until the ith iteration
[Diagram: the root of the tree is an action a(h) with i steps to go; each observation o₁ … o_k … o_l branches to a subtree rooted at an action with i−1 steps to go, and so on down to a single action with 1 step to go.]
Value of a Policy Tree
We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state:
$$V_h(b) = \sum_{s \in S} b(s)\, V_h(s)$$
$$V_h(s) = R(s, a(h)) + \gamma \sum_{s' \in S} M_{a(h)}(s,s') \sum_{o_k \in O} \Theta(s', a(h), o_k)\, V_{o_k(h)}(s')$$
where a(h) is the action at the root node of h, and o_k(h) is the (i−1)-step subtree associated with o_k under the root node of h.
Idea of the Witness Algorithm
For each action a, compute Γᵢᵃ, the set of candidate i-step policy trees with action a at their roots.
The optimal value function at the ith step, Vᵢ*(b), is the upper surface of the value functions of all i-step policy trees.
Optimal value function
Geometrically, Vᵢ*(b) is piecewise linear and convex:
$$V_i^*(b) = \max_{h \in H} V_h(b)$$
An example for a two-state POMDP: the simplex constraint b(s₁) + b(s₂) = 1 makes the belief space one-dimensional.
[Figure: the value functions V_{h1}(b), …, V_{h5}(b) are lines over the belief interval; Vᵢ*(b) is their upper surface, and dominated policy trees can be pruned from the set of policy trees.]
Outline of the Witness Algorithm
Algorithm:
1. H₁ ← {}
2. i ← 1
3. Repeat
   3.1 i ← i + 1
   3.2 For each a in A: Γᵢᵃ ← witness(H_{i−1}, a)   (the inner loop)
   3.3 Prune ∪ₐ Γᵢᵃ to get Hᵢ
   until $\sup_b |V_i(b) - V_{i-1}(b)| < \varepsilon$
Inner Loop of the Witness Algorithm
1. Select a belief b arbitrarily. Generate a best i-step policy tree hᵢ. Add hᵢ to an agenda.
2. In each iteration:
   2.1 Select a policy tree h_new from the agenda.
   2.2 Look for a witness point b using Zₐ and h_new.
   2.3 If such a witness point b is found:
       2.3.1 Calculate the best policy tree h_best for b.
       2.3.2 Add h_best to Zₐ.
       2.3.3 Add all the alternative trees of h_best to the agenda.
   2.4 Else remove h_new from the agenda.
3. Repeat the above iteration until the agenda is empty.
Other Solutions
QMDP1
MC-POMDP (Monte Carlo POMDP)2
Grid Based Approximation3
Belief Compression4
……
¹ Thrun et al., '06  ² Thrun et al., '05  ³ Lovejoy, '91  ⁴ Roy, '03
Applying POMDP to Dynamic IR

POMDP | Dynamic IR
Environment | Documents
Agents | User, search engine
States | Queries, user's decision-making status, relevance of documents, etc.
Actions | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations | Queries, clicks, document lists, snippets, terms, etc.
Rewards | Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix | Given in advance or estimated from training data
Observation function | Problem dependent; estimated from sample datasets
Session Search Example - States
Four states, combining the relevance decision with exploitation vs. exploration:
S_RT – Relevant & Exploitation
S_RR – Relevant & Exploration
S_NRT – Non-Relevant & Exploitation
S_NRR – Non-Relevant & Exploration
Example query transitions from q₀: scooter price → scooter stores; Hartford visitors → Hartford Connecticut tourism; Philadelphia NYC travel → Philadelphia NYC train; distance New York Boston → maps.bing.com
[J. Luo et al., '14]
Session Search Example – Actions (A_u, A_se)
User actions (A_u):
Add query terms (+Δq)
Remove query terms (−Δq)
Keep query terms (q_theme)
Clicked documents / SAT-clicked documents
Search engine actions (A_se):
Increase/decrease/keep term weights
Switch query expansion on or off
Adjust the number of top documents used in PRF
etc.
[J. Luo et al., '14]
Multi Page Search Example – States & Actions
State: relevance of documents
Action: ranking of documents
Observation: clicks
Belief: multivariate Gaussian
Reward: DCG over 2 pages
[Xiaoran Jin et al., '13]
Exercise
Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-Armed Bandit
Multi Armed Bandits (MAB)
[Figure: a row of slot machines. "Which slot machine should I select in this round?" Each play yields a reward.]
Multi Armed Bandits (MAB)
[Figure: after a winning play — "I won! Is this the best slot machine?"]
MAB Definition
A tuple (S, A, R, B)
S: hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing a bandit
B: belief space, our estimate of each bandit's distribution
Comparison with Markov Models
A single-state Markov Decision Process
No transition probability
Similar to a POMDP in that we maintain a belief state
Action = choose a bandit; it does not affect the state
Does not 'plan ahead' but intelligently adapts
Somewhere between interactive and dynamic IR
Markov Multi Armed Bandits
[Figure: each slot machine is its own Markov process (Markov Process 1 … Markov Process k). "Which slot machine should I select in this round?" Playing one yields a reward.]
Markov Multi Armed Bandits
[Figure: the same slot machines; selecting a machine is the action, which determines which Markov process advances and yields the reward.]
MAB Policy Reward
An MAB algorithm describes a policy π for choosing bandits
Maximise rewards from the chosen bandits over all time steps
Minimize regret:
$$\sum_{t=1}^{T} \left[ \mathrm{Reward}(a^*) - \mathrm{Reward}(a_{\pi(t)}) \right]$$
the cumulative difference between the optimal reward and the actual reward
Exploration vs Exploitation
Exploration: try out bandits to find which has the highest average reward (too much exploration leads to poor performance)
Exploitation: play bandits that are known to pay out higher reward on average
MAB algorithms balance exploration and exploitation:
Start by exploring more to find the best bandits
Exploit more as the best bandits become known
MAB – Index Algorithms
Gittins index¹
Play the bandit with the highest 'dynamic allocation index'
Modelled using an MDP, but suffers the 'curse of dimensionality'
ε-greedy²
Play the highest-reward bandit with probability 1 − ε; play a random bandit with probability ε
UCB (Upper Confidence Bound)³
Play the bandit i with the highest $\bar{x}_i + \sqrt{\dfrac{2 \ln t}{T_i}}$
The chance of playing infrequently played bandits increases over time
¹ J. C. Gittins, '89  ² Nicolò Cesa-Bianchi et al., '98  ³ P. Auer et al., '02
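A sketch of the UCB index policy just described; the Bernoulli payout probabilities are hypothetical:

```python
import math, random

def ucb1(pull, n_arms, horizon):
    """UCB: play each arm once, then pick the arm maximizing
    mean reward + sqrt(2 ln t / T_i). pull(i) returns a stochastic reward."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1                                   # initial round robin
        else:
            i = max(range(n_arms),
                    key=lambda j: sums[j] / counts[j]
                                  + math.sqrt(2 * math.log(t) / counts[j]))
        counts[i] += 1
        sums[i] += pull(i)
    return counts, sums

# Hypothetical Bernoulli bandits with payout probabilities 0.3, 0.5, 0.7
probs = [0.3, 0.5, 0.7]
counts, _ = ucb1(lambda i: float(random.random() < probs[i]), 3, 10000)
print(counts)   # the 0.7 arm should dominate as exploration tails off
```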
MAB use in IR
Choosing ads to display to users1
Each ad is a bandit
User click through rate is reward
Recommending news articles2
News article is a bandit
Similar to Information Filtering case
Diversifying search results3
Each rank position is an MAB dependent on higher ranks
Documents are bandits chosen by each rank
¹ Deepayan Chakrabarti et al., '09  ² Lihong Li et al., '10  ³ Radlinski et al., '08
MAB Variations
Contextual Bandits1
World has some context x ∈ X (e.g. user location)
Learn policy 𝜋: 𝑋 → 𝐴 that maps context to arms (online or offline)
Duelling Bandits2
Play two (or more) bandits at each time step
Observe relative reward rather than absolute
Learn order of bandits
Mortal Bandits3
Value of bandits decays over time
Exploitation > exploration
¹ Lihong Li et al., '10  ² Yisong Yue et al., '09  ³ Deepayan Chakrabarti et al., '09
Comparison of Markov Models
MC – a fully observable stochastic process
HMM – a partially observable stochastic process
MDP – a fully observable decision process
MAB – a decision process, either fully or partially observable
POMDP – a partially observable decision process

Model | Actions | Rewards | States
MC    | No      | No      | Observable
HMM   | No      | No      | Unobservable
MDP   | Yes     | Yes     | Observable
POMDP | Yes     | Yes     | Unobservable
MAB   | Yes     | Yes     | Fixed
Exercise
Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
TREC Session Tracks (2010-2012)
Given a series of queries {q₁, q₂, …, qₙ}, the top-10 retrieval results {D₁, …, D_{n−1}} for q₁ to q_{n−1}, and click information
The task is to retrieve a list of documents for the current/last query, qₙ
Relevance judgment is based on how relevant the documents are for qₙ, and how relevant they are for the information needs of the entire session (in the topic description)
No need to segment the sessions
TREC 2012 Session 6
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
In a session, queries change constantly
Query change is an important form of feedback
We define query change as the syntactic editing changes between two adjacent queries:
$$\Delta q_i = q_i - q_{i-1}$$
It includes added terms (+Δq) and removed terms (−Δq); the unchanged/shared terms are called theme terms (q_theme).
Example:
q₁ = "bollywood legislation"
q₂ = "bollywood law"
Theme term = "bollywood"
Added (+Δq) = "law"
Removed (−Δq) = "legislation"
Where do these query changes come from?
Given the TREC Session settings, we consider two sources of query change:
the previous search results that a user viewed/read/examined
the information need
Example: Kurosawa → Kurosawa wife
'wife' is not in any previous results, but is in the topic description
However, knowing information needs before search is difficult to achieve
Previous search results can influence query change in quite complex ways
Merck lobbyists → Merck lobbying US policy
D₁ contains several mentions of 'policy', such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …"
These mentions are about Canadian policies, while the user adds US policy in q₂
Our guess is that the user might be inspired by 'policy' but prefers a different sub-concept than 'Canadian policy'
Therefore, among the added terms 'US policy', 'US' is the novel term and 'policy' is not, since it appeared in D₁. The two terms should be treated differently.
Applying MDP to Session Search
We propose to model session search as a Markov decision process (MDP)
Two agents: the user and the search engine
Environment: search results
States: queries
Actions:
User actions: add/remove/keep query terms
Search engine actions: increase/decrease/keep term weights
Search Engine Agent's Actions

Term    | ∈ D_{i−1}? | Action    | Example
q_theme | Y          | increase  | "pocono mountain" in s6
q_theme | N          | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq     | Y          | decrease  | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq     | N          | increase  | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq     | Y          | decrease  | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq     | N          | no change | 'legislation' in s32: bollywood legislation → bollywood law
Query Change retrieval Model (QCM)
The Bellman equation gives the optimal value for an MDP:
$$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$$
The reward function is used as the document relevance score function, derived backwards from the Bellman equation:
$$\mathrm{Score}(q_i, d) = \underbrace{P(q_i \mid d)}_{\text{current reward/relevance}} + \gamma\, \underbrace{P(q_i \mid q_{i-1}, D_{i-1}, a)}_{\text{query transition model}}\; \underbrace{\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})}_{\text{maximum past relevance}}$$
Calculating the Transition Model
According to the query change and the search engine actions, the current reward/relevance score is:
$$\begin{aligned}
\mathrm{Score}(q_i, d) = \log P(q_i \mid d) &+ \alpha \sum_{t \in q_{theme}} \left[1 - P(t \mid d^*_{i-1})\right] \log P(t \mid d) \\
&- \beta \sum_{t \in +\Delta q,\; t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) \\
&+ \epsilon \sum_{t \in +\Delta q,\; t \notin d^*_{i-1}} idf(t) \log P(t \mid d) \\
&- \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d)
\end{aligned}$$
This increases weights for theme terms, decreases weights for old added terms, increases weights for novel added terms, and decreases weights for removed terms.
Maximizing the Reward Function
Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}
That is, the document(s) most relevant to q_{i−1}
The relevance score can be calculated as
$$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \left\{1 - P(t \mid d_{i-1})\right\}$$
$$P(t \mid d_{i-1}) = \frac{\#(t, d_{i-1})}{|d_{i-1}|}$$
From several options, we choose to use only the document with the top relevance:
$$\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$$
Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:
$$\begin{aligned}
\mathrm{Score}_{session}(q_n, d) &= \mathrm{Score}(q_n, d) + \gamma\, \mathrm{Score}_{session}(q_{n-1}, d) \\
&= \mathrm{Score}(q_n, d) + \gamma \left[\mathrm{Score}(q_{n-1}, d) + \gamma\, \mathrm{Score}_{session}(q_{n-2}, d)\right] \\
&= \sum_{i=1}^{n} \gamma^{\,n-i}\, \mathrm{Score}(q_i, d)
\end{aligned}$$
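The recursion unrolls into a simple discounted sum; a minimal sketch (the γ value and the per-query scores are illustrative):

```python
def session_score(scores, gamma=0.92):
    """Score_session(q_n, d) = sum_{i=1..n} gamma^(n-i) * Score(q_i, d),
    where scores = [Score(q_1, d), ..., Score(q_n, d)] for one document."""
    n = len(scores)
    return sum(gamma ** (n - i) * s for i, s in enumerate(scores, 1))

print(session_score([1.2, 0.7, 2.1]))  # hypothetical per-query QCM scores
```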
Experiments
TREC 2011-2012 query sets and datasets
ClueWeb09 Category B
Search Accuracy (TREC 2012)
nDCG@10 (official metric used in TREC)

Approach                | nDCG@10 | %chg    | MAP    | %chg
Lemur                   | 0.2474  | −21.54% | 0.1274 | −18.28%
TREC'12 median          | 0.2608  | −17.29% | 0.1440 | −7.63%
Our TREC'12 submission  | 0.3021  | −4.19%  | 0.1490 | −4.43%
TREC'12 best            | 0.3221  | 0.00%   | 0.1559 | 0.00%
QCM                     | 0.3353  | 4.10%†  | 0.1529 | −1.92%
QCM+Dup                 | 0.3368  | 4.56%†  | 0.1537 | −1.41%
Search Accuracy (TREC 2011)
nDCG@10 (official metric used in TREC)

Approach                | nDCG@10 | %chg    | MAP    | %chg
Lemur                   | 0.3378  | −23.38% | 0.1118 | −25.86%
TREC'11 median          | 0.3544  | −19.62% | 0.1143 | −24.20%
TREC'11 best            | 0.4409  | 0.00%   | 0.1508 | 0.00%
QCM                     | 0.4728  | 7.24%†  | 0.1713 | 13.59%†
QCM+Dup                 | 0.4821  | 9.34%†  | 0.1714 | 13.66%†
Our TREC'12 submission  | 0.4836  | 9.68%†  | 0.1724 | 14.32%†
Search Accuracy for Different Session Types
TREC 2012 sessions are classified by:
Product: Factual / Intellectual
Goal quality: Specific / Amorphous

Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg
TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
Nugget    | 0.3305       | −1.90% | 0.3397    | −2.80% | 0.2736   | −9.01% | 0.2871  | −8.51%
QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | −2.29%
QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | −2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying changes across query transitions and modeling the dynamics.
Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Multi Page Search
[Figure: a two-page search results interface, with ranked lists on page 1 and page 2.]
Relevance Feedback
No UI Changes
Interactivity is Hidden
Private, performed in browser
Relevance Feedback
Page 1:
• Diverse ranking
• Maximise learning potential
• Exploration vs exploitation
Page 2:
• Clickthroughs or explicit ratings
• Respond to feedback from page 1
• Personalized
Model
Prior: document relevances ~ $N(\theta_1, \Sigma_1)$
θ₁ – prior estimate of relevance
Σ₁ – prior estimate of covariance (document similarity, topic clustering)
Model
Rank action for page 1
Feedback from page 1:
$$\mathbf{r} \sim N(\theta_{s^1}, \Sigma_{s^1})$$
Model
Update estimates using r₁ (the standard conditional multivariate Gaussian), partitioning the prior over shown documents s′ and the rest:
$$\theta_1 = \begin{pmatrix} \theta_{\backslash s'} \\ \theta_{s'} \end{pmatrix}, \qquad \Sigma_1 = \begin{pmatrix} \Sigma_{\backslash s'} & \Sigma_{\backslash s', s'} \\ \Sigma_{s', \backslash s'} & \Sigma_{s'} \end{pmatrix}$$
$$\theta_2 = \theta_{\backslash s'} + \Sigma_{\backslash s', s'}\, \Sigma_{s'}^{-1} (r_1 - \theta_{s'})$$
$$\Sigma_2 = \Sigma_{\backslash s'} - \Sigma_{\backslash s', s'}\, \Sigma_{s'}^{-1}\, \Sigma_{s', \backslash s'}$$
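A sketch of this conditional-Gaussian belief update, reusing the toy example's prior relevances and covariance; the observed feedback vector r is hypothetical:

```python
import numpy as np

def gaussian_update(theta, Sigma, shown, r):
    """Condition N(theta, Sigma) over document relevances on observed
    feedback r for the documents in `shown` (an index list); returns the
    posterior mean and covariance over the remaining documents."""
    rest = [i for i in range(len(theta)) if i not in shown]
    S_rr = Sigma[np.ix_(rest, rest)]
    S_rs = Sigma[np.ix_(rest, shown)]
    S_ss = Sigma[np.ix_(shown, shown)]
    gain = S_rs @ np.linalg.inv(S_ss)
    theta2 = theta[rest] + gain @ (r - theta[shown])
    Sigma2 = S_rr - gain @ S_rs.T
    return theta2, Sigma2

theta = np.array([0.5, 0.51, 0.9, 0.49])
Sigma = np.array([[1, 0.8, 0.1, 0], [0.8, 1, 0.1, 0],
                  [0.1, 0.1, 1, 0.95], [0, 0, 0.95, 1]])
# Page 1 showed docs 2 and 1; clicked the first, skipped the second
print(gaussian_update(theta, Sigma, shown=[2, 1], r=np.array([1.0, 0.0])))
```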
Model
Rank using PRP
Model
Utility of a ranking (DCG over the two pages of M results each):
$$U = \lambda \sum_{j=1}^{M} \frac{\theta_{s_j^1}}{\log_2(j+1)} + (1-\lambda) \sum_{j=M+1}^{2M} \frac{\theta_{s_j^2}}{\log_2(j+1)}$$
Model – Bellman Equation
Optimize s¹ to improve U_{s²}:
$$V(\theta_1, \Sigma_1) = \max_{s^1}\left[ \lambda\, \theta_{s^1} \cdot W^1 + \int_{r} \max_{s^2} \left[(1-\lambda)\, \theta_{s^2} \cdot W^2\right] P(r)\, dr \right]$$
where W¹, W² are the DCG position-discount vectors.
𝜆
Balances exploration and exploitation in page 1
Tuned for different queries
Navigational
Informational
𝜆 = 1 for non-ambiguous search
Approximation
Monte Carlo sampling over S simulated feedback vectors r ∈ O:
$$V \approx \max_{s^1}\left[ \lambda\, \theta_{s^1} \cdot W^1 + \frac{1}{S} \sum_{r \in O} \max_{s^2} \left[(1-\lambda)\, \theta_{s^2} \cdot W^2\right] P(r) \right]$$
Sequential ranking decision
Experiment Data
Difficult to evaluate without access to live users
Simulated using 3 TREC collections and relevance judgements:
WT10G – explicit ratings
TREC8 – clickthroughs
Robust – difficult (ambiguous) search
User Simulation
Rank M documents
Simulated user clicks according to relevance judgements
Update page 2 ranking
Measure at page 1 and 2
Recall
Precision
nDCG
MRR
BM25 – prior ranking model
Investigating λ
Baselines
𝜆 determined experimentally
BM25
BM25 with conditional update (𝜆 = 1)
Maximum Marginal Relevance (MMR)
Diversification
MMR with conditional update
Rocchio
Relevance Feedback
Results
[Figures: result plots across the three collections and metrics.]
Similar results across data sets and metrics
2nd-page gain outweighs 1st-page losses
Outperformed Maximum Marginal Relevance when using MRR to measure diversity
BM25-U is simply the no-exploration case
Similar results when M = 5
Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Dynamic Information Retrieval Evaluation
Emine Yilmaz
University College London
Information Retrieval Systems
Match information seekers with the information they seek
Retrieval Evaluation: Traditional View
[Figure]
Retrieval Evaluation: Dynamic View
[Figures, built up over three slides]
Different Approaches to Evaluation
Online Evaluation
Design interactive experiments
Use users’ actions to evaluate the quality
Inherently dynamic in nature
Offline Evaluation
Controlled laboratory experiments
The users’ interaction with the engine is only simulated
Recent work focused on dynamic IR evaluation
Online Evaluation
Standard click metrics
Clickthrough rate
Probability user skips over results they have considered (pSkip)
Most recently: Result interleaving
Click/no-click signals are used to evaluate
What is result interleaving?
A way to compare rankers online: given the two rankings produced by two methods, present a combination of the rankings to users
Team Draft Interleaving (Radlinski et al., 2008)
Interleaving two rankings:
Input: two rankings ("can be seen as teams who pick players")
Repeat:
  o Toss a coin to see which team (ranking) picks next
  o The winner picks their best remaining player (document)
  o The loser picks their best remaining player (document)
Output: one ranking (2 teams of 5)
Credit assignment: the ranking providing more of the clicked results wins
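A sketch of the team-draft procedure just described; the document IDs are arbitrary:

```python
import random

def team_draft(ranking_a, ranking_b):
    """Team Draft Interleaving (Radlinski et al., 2008): build a combined
    list; each position remembers which ranker contributed it."""
    combined, teams, used = [], [], set()
    all_docs = set(ranking_a) | set(ranking_b)
    while len(used) < len(all_docs):
        order = ["A", "B"] if random.random() < 0.5 else ["B", "A"]
        for team in order:
            ranking = ranking_a if team == "A" else ranking_b
            for doc in ranking:              # best remaining "player"
                if doc not in used:
                    combined.append(doc)
                    teams.append(team)
                    used.add(doc)
                    break
    return combined, teams

def credit(teams, clicked_positions):
    """The ranker providing more of the clicked results wins."""
    a = sum(1 for i in clicked_positions if teams[i] == "A")
    b = sum(1 for i in clicked_positions if teams[i] == "B")
    return "A" if a > b else "B" if b > a else "tie"

combined, teams = team_draft(["d1", "d2", "d3"], ["d3", "d4", "d1"])
print(combined, teams, credit(teams, clicked_positions=[1]))
```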
Team Draft Interleaving
Ranking A:
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
3. Napa Valley College www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine www.napavintners.com
6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley

Ranking B:
1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging... www.napavalley.com
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
5. NapaValley.org www.napavalley.org
6. The Napa Valley Marathon www.napavalleymarathon.org

Presented Ranking:
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
6. Napa Valley College www.napavalley.edu/homex.asp
7. NapaValley.org www.napavalley.org

Each presented result is credited to the team (A or B) whose ranking contributed it; with the user's clicks landing on B's contributions, B wins!
Repeat over many different queries!
Offline Evaluation
Controlled laboratory experiments
The user's interaction with the engine is only simulated:
Ask experts to judge each query result
Predict how users behave when they search
Aggregate judgments to evaluate
Offline Evaluation
Until recently, metrics assumed that the user's information need was not affected by the documents read
E.g. Average Precision, NDCG, …
• Users are more likely to stop searching when they see a highly relevant document
• Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behavior
Based on devising more realistic user models
EBU, ERR [Yilmaz et al. CIKM10, Chapelle et al. CIKM09]
Modeling User Behavior
Cascade-based models
[Figure: a ranked result list for the query "black powder ammunition", ranks 1–10.]
• The user views search results from top to bottom
• At each rank i, the user has a certain probability of being satisfied
• The probability of satisfaction is proportional to the relevance grade of the document at rank i
• Once the user is satisfied with a document, he terminates the search
Rank Biased Precision
[Figure: user model — issue a query, view an item, then either stop or view the next item, stepping down ranks 1–10.]
Rank Biased Precision
[Example: a ranked list for "black powder ammunition".]
With persistence parameter p (the probability of continuing to the next rank):
$$\text{Total utility} = \sum_{i=1}^{\infty} rel_i\, p^{\,i-1}$$
$$\text{Expected num. docs examined} = \sum_{i=1}^{\infty} p^{\,i-1} = \frac{1}{1-p}$$
$$\mathrm{RBP} = \frac{\text{Total utility}}{\text{Num. docs examined}} = (1-p) \sum_{i=1}^{\infty} rel_i\, p^{\,i-1}$$
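A direct computation of RBP from a list of relevances; the persistence parameter p = 0.8 is illustrative:

```python
def rbp(rels, p=0.8):
    """RBP = (1 - p) * sum_{i>=1} rel_i * p^(i-1); `rels` are the
    (binary or graded) relevances of the ranked list, top first."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))

print(rbp([1, 0, 1, 1, 0]))
```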
Expected Reciprocal Rank [Chapelle et al CIKM09]
[Figure: user model — at each rank the user asks "Relevant?" (no / somewhat / highly) and either stops, satisfied, or views the next item.]
Expected Reciprocal Rank [Chapelle et al CIKM09]
[Example: a ranked list for "black powder ammunition".]
φ(r): the utility of finding "the perfect document" at rank r; φ(r) = 1/r
$$\mathrm{ERR} = \sum_{r=1}^{n} \frac{1}{r}\, \Pr(\text{user stops at position } r)$$
$$\mathrm{ERR} = \sum_{r=1}^{n} \frac{1}{r}\, R_r \prod_{i=1}^{r-1} (1 - R_i)$$
where $R_i = \dfrac{2^{g_i} - 1}{2^{g_{max}}}$ is the probability of relevance of document i (and of stopping at it), and $g_i$ is the relevance grade of the i-th document.
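ERR computes in one pass down the ranking; a minimal sketch, with grades assumed to be on a 0..g_max scale (g_max = 4 here):

```python
def err(grades, g_max=4):
    """Expected Reciprocal Rank (Chapelle et al., CIKM'09).
    grades are relevance grades g_i; R_i = (2^g_i - 1) / 2^g_max."""
    total, p_continue = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        R = (2 ** g - 1) / 2 ** g_max
        total += p_continue * R / r   # stop here with prob p_continue * R
        p_continue *= (1 - R)
    return total

print(err([4, 0, 2, 1]))
```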
Session Evaluation
[Example session: Paris luxurious hotels → Paris Hilton → J Lo]
What is a good system?
Measuring "goodness":
The user steps down a ranked list of documents, observing each one until a decision point, and either (a) abandons the search, or (b) reformulates
While stepping down or sideways, the user accumulates utility
Evaluation over a single ranked list
[Figure: a session of reformulations — kenya cooking traditional swahili; kenya cooking traditional; kenya swahili traditional food recipes — each with its own ranked list.]
Session DCG [Järvelin et al ECIR 2008]
Each reformulation's ranked list is scored by DCG,
$$\mathrm{DCG}(RL_j) = \sum_{r=1}^{k} \frac{2^{rel(r)} - 1}{\log_b(r + b - 1)}$$
and later reformulations are discounted:
$$\mathrm{sDCG} = \frac{1}{\log_c(1 + c - 1)}\, \mathrm{DCG}(RL_1) + \frac{1}{\log_c(2 + c - 1)}\, \mathrm{DCG}(RL_2) + \cdots$$
Model-based measures
A probabilistic space of users following different paths:
Ω is the space of all paths
P(ω) is the probability of a user following a path ω in Ω
M_ω is a measure over a path ω
[Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]
[Figure: for queries Q1, Q2, Q3 with ranked lists of relevant (R) and non-relevant (N) documents, the probability of a path = (1) the probability of reformulating at a given rank × (2) the probability of abandoning at a given reformulation.]
Expected Global Utility [Yang and Lad ICTIR 2009]
1. The user steps down ranked results one by one
2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks, and reformulates
3. Gains something from relevant documents, accumulating utility
[Figure: (1) the probability of abandoning the session at reformulation i is geometric with parameter p_reform; (2) the probability of reformulating at rank j is geometric with parameter p_down.]
Expected Global Utility [Yang and Lad ICTIR 2009]
The probability of a user following a path ω:
P(ω) = P(r₁, r₂, ..., r_K)
where rᵢ is the stopping and reformulation point in list i
Assumption: stopping positions in each list are independent:
P(r₁, r₂, ..., r_K) = P(r₁) P(r₂) ... P(r_K)
Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour:
$$P(r_i = r) = (1-p)\, p^{\,r-1}$$
Conclusions
Recent focus on evaluating the dynamic nature of the search
process
Interleaving
New offline evaluation metrics
ERR, EBU
Session evaluation metrics
Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Conclusion
Conclusions
Dynamic IR describes a new class of interactive model
Incorporates rich feedback and temporal dependency, and is goal oriented
The family of Markov models and multi-armed bandit theory are useful in building DIR models
Applicable to a range of IR problems
Useful in applications such as session search and evaluation
Dynamic IR Book
Published by Morgan & Claypool
'Synthesis Lectures on Information Concepts, Retrieval, and Services'
Due March/April 2015 (in time for SIGIR 2015)
Acknowledgment
We thank Dr. Emine Yilmaz for giving the guest talk.
We sincerely thank Dr. Xuchu Dong for his help in the preparation of this tutorial.
We also thank the following colleagues for their comments and suggestions:
Dr. Jamie Callan
Dr. Ophir Frieder
Dr. Fernando Diaz
Dr. Filip Radlinski
Thank You
References
Static IR
Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
References
Interactive IR
Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-23), 1971.
A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.
Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
References
Dynamic IR
A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. SIGIR '99, pages 214-221.
Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
References
Dynamic IR
Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
Inferring search behaviors using partially observable markov model with duration (POMD). Yin He et al. WSDM, 2011.
No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
References
Dynamic IR
Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. WWW '12, pages 11-20.
Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. SIGIR '13, pages 453-462.
Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. SIGIR, 2013.
Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. WWW '13.
Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. CIKM '13, pages 1411-1420.
Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. SIGIR '14.
References
Markov Processes
A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966.
References
Markov Processes
Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39:162-175, 1991.
Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.
Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
References
Markov Processes
Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.
VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. NIPS 2004, pages 1081-1088.
Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006.
Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2006.
References
Markov Processes
The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.
Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.
Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47(2-3), 2002.