Information-Based Models for Ad Hoc IR
Stéphane Clinchant 1,2, Eric Gaussier 2
1 Xerox Research Centre Europe
2 Laboratoire d’Informatique de Grenoble, Univ. Grenoble 1
SIGIR’10, 20 July 2010
Overview
Information Models
- Normalization
- Probability Distribution
- RSV
Heuristic Constraints
- Condition 1
- Condition 2
- Condition 3
- Condition 4
Burstiness
- Phenomenon
- Property of Prob. Distributions
Informative Content
Use Shannon’s information to weigh words in documents
[Figure: P(X) and −log P(X)]
$$\mathrm{Inf}(x) = -\log P(x \mid \Theta_C) = \text{Informative Content}$$
Deviation from an average behavior
- Observation by Harter (70): specialty words deviate from a Poisson distribution, which non-specialty words follow
- Informative Content, core to Divergence From Randomness models
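The informative content above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the collection model P(x | Θ_C) is a Poisson with a hypothetical mean `lam`; the function names are not from the talk.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) under a Poisson with mean lam (illustrative collection model)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def informative_content(x: int, lam: float) -> float:
    """Inf(x) = -log P(x | lam): the less probable x is, the larger the value."""
    return -math.log(poisson_pmf(x, lam))


if __name__ == "__main__":
    lam = 0.5  # hypothetical average frequency of the word in the collection
    for x in (0, 1, 5):
        print(x, round(informative_content(x, lam), 3))
```

Under this assumed model, a frequency far above the collection average is improbable, so it carries more information than a frequency close to the average.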
Information-based Model
Main idea:
1 Discrete term frequencies x are renormalized into continuous values t(x), to account for differences in document length
2 For each term w, the values t(x) are assumed to follow a distribution P with parameter λ_w on the corpus, i.e. $Tf_w \mid \lambda_w \sim P$
3 Queries and documents are compared with a surprise measure, a mean information:
$$RSV(q, d) = \sum_{w \in q} -x^q_w \log P\big(Tf_w > t(x^d_w) \mid \lambda_w\big)$$
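The three steps above can be sketched as follows. This is only a sketch under stated assumptions: the normalization `normalize_tf` and the log-logistic tail `loglogistic_survival` are illustrative choices for t(x) and P, which this part of the talk leaves unspecified, and all names are hypothetical.

```python
import math
from typing import Callable, Dict


def normalize_tf(x: float, doc_len: float, avg_len: float, c: float = 1.0) -> float:
    """Illustrative t(x): rescale the raw count by relative document length (assumption)."""
    return x * math.log(1.0 + c * avg_len / doc_len)


def loglogistic_survival(t: float, lam: float) -> float:
    """Illustrative P(Tf > t | lam): a log-logistic tail lam / (lam + t) (assumption)."""
    return lam / (lam + t)


def rsv(
    query_tf: Dict[str, int],
    doc_tf: Dict[str, int],
    doc_len: float,
    avg_len: float,
    lam: Dict[str, float],
    survival: Callable[[float, float], float] = loglogistic_survival,
) -> float:
    """RSV(q, d) = sum over query words of -x_w^q * log P(Tf_w > t(x_w^d) | lam_w)."""
    score = 0.0
    for w, xq in query_tf.items():
        t = normalize_tf(doc_tf.get(w, 0), doc_len, avg_len)
        # A word absent from the document gives t = 0, survival = 1, contribution 0.
        score += -xq * math.log(survival(t, lam[w]))
    return score


if __name__ == "__main__":
    query = {"ir": 1, "model": 1}
    lam = {"ir": 0.01, "model": 0.2}  # hypothetical corpus parameters
    print(rsv(query, {"ir": 3, "model": 1}, 100, 100, lam))
    print(rsv(query, {"model": 1}, 100, 100, lam))
```

Passing the survival function as a parameter keeps the scoring loop independent of the particular distribution chosen for P.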
Outline
1 Model Properties
- Retrieval Heuristics
- Burstiness Phenomenon
2 Two Power-Law Instances
- log-logistic model
- smoothed power-law model
3 Experiments
4 Extension to PRF
Notations
- $x^d_w$: frequency of word w in document d; $x^q_w$: frequency of word w in the query
- $t^d_w$: normalized term frequency
- $Tf_w$: random variable for the frequency of word w
- $l_d$: length of document d
- $idf_w$: corpus parameter for word w
- $\theta$: model parameter
Most (ad hoc) IR models can be written as:
$$RSV(q, d) = \sum_{w \in q} f(x^q_w)\, h(x^d_w, l_d, idf_w, \theta)$$
⇒ What do we know about h?
Condition 1
Docs with more occurrences of query terms get higher scores than docs with fewer occurrences
$$\forall (l, idf, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial x} > 0 \quad (h \text{ increases with } x)$$
[Figure: h(x) against x. A "good" h is increasing; a "bad" h is decreasing.]
Condition 2
The increase in the retrieval score should be smaller for larger term frequencies. Ex: going from 2 to 4 occurrences should add more to the score than going from 50 to 52
$$\forall (l, idf, \theta),\quad \frac{\partial^2 h(x, l, idf, \theta)}{\partial x^2} < 0 \quad (h \text{ concave})$$
[Figure: h(x) against x. A "good" h is concave (the difference of scores decreases); a "bad" h is convex (the difference of scores increases).]
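The 2→4 versus 50→52 example can be worked through numerically. The two functions below are illustrative stand-ins chosen for this sketch (log(1 + x) as a concave h, x² as a convex h); neither is a weighting function from the talk.

```python
import math


def gain(h, x_from: float, x_to: float) -> float:
    """Increase in score when the term frequency goes from x_from to x_to."""
    return h(x_to) - h(x_from)


def concave_h(x: float) -> float:
    """Illustrative concave weighting (assumption): log(1 + x)."""
    return math.log(1.0 + x)


def convex_h(x: float) -> float:
    """Illustrative convex weighting (assumption): x squared."""
    return x * x


if __name__ == "__main__":
    for name, h in (("concave", concave_h), ("convex", convex_h)):
        print(name, gain(h, 2, 4), gain(h, 50, 52))
```

With the concave function the two extra occurrences matter much more at low frequencies, which is the behavior Condition 2 asks for; with the convex function the opposite happens.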
Condition 3
Longer documents, when compared to shorter ones with exactly the same number of occurrences of query terms, should be penalized (they are likely to cover additional topics)
$$\forall (x, idf, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial l} < 0 \quad (h \text{ decreases with } l)$$
Condition 4: IDF Effect
It is important to downweight terms occurring in many documents
$$\forall (x, l, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial idf} > 0 \quad (\text{IDF Effect})$$
[Figure: h(x) against x for IDF=10 and IDF=5. IDF Effect: h(x, IDF=10) > h(x, IDF=5).]
Heuristic Constraints
Condition 1: h increases with x
Condition 2: h is concave
Condition 3: h decreases with l
Condition 4: h increases with idf (IDF Effect)
Additional conditions in the paper
⇒ Analytical Reformulation of TFC1, TFC2, LNC1 and TDC:
Fang et al, A Formal Study of Information Retrieval Heuristics, SIGIR’04
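The four conditions can be spot-checked numerically with finite differences. The sketch below assumes a hypothetical weighting function h(x, l, idf) = idf · log(1 + θx/l), chosen only because it is easy to differentiate; it is not one of the models of the talk, and a check at a single point is an illustration, not a proof.

```python
import math


def h(x: float, l: float, idf: float, theta: float = 100.0) -> float:
    """Illustrative weighting function (an assumption, not a model from the talk)."""
    return idf * math.log(1.0 + theta * x / l)


def check_conditions(h, x: float = 3.0, l: float = 120.0, idf: float = 2.0,
                     eps: float = 1e-3) -> dict:
    """Evaluate Conditions 1-4 at one point using central finite differences."""
    d_x = (h(x + eps, l, idf) - h(x - eps, l, idf)) / (2 * eps)
    d_xx = (h(x + eps, l, idf) - 2 * h(x, l, idf) + h(x - eps, l, idf)) / eps ** 2
    d_l = (h(x, l + eps, idf) - h(x, l - eps, idf)) / (2 * eps)
    d_idf = (h(x, l, idf + eps) - h(x, l, idf - eps)) / (2 * eps)
    return {
        "condition_1": d_x > 0,    # h increases with x
        "condition_2": d_xx < 0,   # h is concave in x
        "condition_3": d_l < 0,    # h decreases with l
        "condition_4": d_idf > 0,  # h increases with idf
    }


if __name__ == "__main__":
    print(check_conditions(h))
```

Any candidate h with the same three leading arguments can be passed to `check_conditions` to see which constraints it violates at the chosen point.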
Burstiness Phenomenon
We now turn to word frequency distributions:
- Church and Gale [1] showed that a 2-Poisson model yields a poor fit to word frequencies
- A possible explanation: the behavior of words which tend to appear in bursts, i.e. burstiness
- Once a word appears in a document, it is much more likely to appear again
- Recent work on the Dirichlet Compound Multinomial
⇒ Which distributions can account for burstiness?
[1] Poisson Mixtures
Burstiness Property of Probability Distributions
Definition
A distribution P is bursty iff, for all ε > 0, the function $g_\varepsilon$ defined by:
$$g_\varepsilon(x) = P(X \ge x + \varepsilon \mid X \ge x)$$
is a strictly increasing function of x.
Interpretation: the more occurrences have already been observed, the easier it becomes to generate further ones
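The definition can be illustrated on a grid of values. The sketch below uses two survival functions P(X ≥ x) picked for illustration: a log-logistic tail λ/(λ + x), for which g_ε(x) = (λ + x)/(λ + x + ε) increases with x, and an exponential tail, for which g_ε(x) = e^{−ε·rate} is constant. The parameter values and function names are assumptions, and a grid check is an illustration rather than a proof.

```python
import math


def g(survival, x: float, eps: float) -> float:
    """g_eps(x) = P(X >= x + eps | X >= x) = S(x + eps) / S(x)."""
    return survival(x + eps) / survival(x)


def is_bursty_on_grid(survival, xs, eps: float, tol: float = 1e-9) -> bool:
    """True if g_eps is strictly increasing (beyond rounding noise) along the grid xs."""
    values = [g(survival, x, eps) for x in xs]
    return all(b - a > tol for a, b in zip(values, values[1:]))


def loglogistic_survival(x: float, lam: float = 0.1) -> float:
    """P(X >= x) for a log-logistic tail with hypothetical parameter lam."""
    return lam / (lam + x)


def exponential_survival(x: float, rate: float = 1.0) -> float:
    """P(X >= x) for an exponential distribution (memoryless, hence not bursty)."""
    return math.exp(-rate * x)


if __name__ == "__main__":
    xs = list(range(10))
    print("log-logistic:", is_bursty_on_grid(loglogistic_survival, xs, eps=1.0))
    print("exponential:", is_bursty_on_grid(exponential_survival, xs, eps=1.0))
```

The exponential case shows why a memoryless distribution cannot be bursty: having already seen x occurrences does not change the probability of seeing ε more.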