Dependence Language Model for Information Retrieval
Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao. Dependence Language Model for Information Retrieval. SIGIR 2004.
Jan 19, 2016
References
• Ciprian Chelba, David Engle, et al. Structure and performance of a dependency language model. Eurospeech 1997.
• Daniel D. K. Sleator and Davy Temperley. Parsing English with a Link Grammar. Technical Report CMU-CS-91-196, 1991.
Why do we use the independence assumption?
• The independence assumption is one of the assumptions widely adopted in probabilistic retrieval theory.
• Why?
  – It makes retrieval models simpler.
  – It makes retrieval operations tractable.
• The shortcoming of the independence assumption
  – The independence assumption does not hold in textual data.
Recent ideas on modeling dependence
• Bigram
  – Some language-modeling approaches try to incorporate word dependence by using bigrams.
  – Shortcomings:
    • Word dependencies exist not only between adjacent words but also at greater distances.
    • Adjacent words are not always genuinely related.
  – The bigram language model showed only marginally better effectiveness than the unigram model.
• Bi-term
  – The bi-term language model is similar to the bigram model except that the constraint on word order is relaxed.
  – “information retrieval” and “retrieval of information” are assigned the same probability of generating the query.
Structure and performance of a dependency language model
Introduction
• This paper presents a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.
• Dependency grammar: expresses the relations between words by a directed graph, which can incorporate the predictive power of words that lie outside bigram or trigram range.
Introduction
• Why we use N-grams
  – Assume a sentence $S = w_0, w_1, w_2, \ldots, w_n$; then
    $P(S) = P(w_0)\,P(w_1 \mid w_0) \cdots P(w_n \mid w_0 \ldots w_{n-1})$
  – If we want to record $P(w_n \mid w_0 \ldots w_{n-1})$ exactly, we need to store $V^i(V-1)$ independent parameters for each history length $i$ (with vocabulary size $V$).
• The drawback of N-grams
  – N-grams blindly discard relevant words that lie $N$ or more positions in the past.
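The parameter-count argument above can be checked with a small sketch. The function names and the tiny vocabulary sizes are illustrative, not from the paper; only the formulas $V^i(V-1)$ and $V^{N-1}(V-1)$ come from the slide.

```python
# Sketch: parameter growth of a full-history model vs. an N-gram model.
# For vocabulary size V, conditioning on a history of i words needs
# V**i * (V - 1) independent parameters (V**i histories, V-1 free probabilities each).

def full_history_params(V: int, n: int) -> int:
    """Parameters to model P(w_i | w_0..w_{i-1}) exactly for i = 0..n."""
    return sum(V**i * (V - 1) for i in range(n + 1))

def ngram_params(V: int, N: int) -> int:
    """Parameters for an N-gram model, which truncates the history to N-1 words."""
    return V**(N - 1) * (V - 1)

# Even a bigram over a 10,000-word vocabulary needs ~10^8 parameters;
# the full-history model grows exponentially with sentence length.
```

This is why truncating the history is unavoidable, and why the paper looks for a way to keep *selected* long-distance dependencies instead of all of them.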
Structure of the model
• Develop an expression for the joint probability $P(S, K)$, where $K$ is the set of linkages in the sentence.
• Then we get
  $P(S) = \sum_K P(S, K)$
• Assume that the sum is dominated by a single term; then
  $P(S) \approx P(S, K^*)$, where $K^* = \arg\max_K P(S, K)$
A dependency language model of IR
• Given a query $Q = q_1 \ldots q_m$, we want to rank documents by $P(Q \mid D)$.
  – Previous work:
    • Assume independence between query terms:
      $P(Q \mid D) = \prod_{i=1 \ldots m} P(q_i \mid D)$
  – New work:
    • Assume that term dependencies in a query form a linkage $L$:
      $P(Q \mid D) = \sum_L P(Q, L \mid D) = \sum_L P(L \mid D)\, P(Q \mid L, D)$
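The "previous work" baseline above is the standard unigram query likelihood. A minimal sketch, assuming simple term-frequency dictionaries and a Jelinek-Mercer-style mixture weight `lam` (the mixture form matches the smoothing shown later in these slides; the exact weight is my assumption):

```python
from math import log

def unigram_query_loglik(query, doc_tf, doc_len, coll_tf, coll_len, lam=0.5):
    """log P(Q|D) under the term-independence assumption:
    log P(Q|D) = sum_i log P(q_i|D).
    Each P(q_i|D) is smoothed as (1-lam)*P_ML(q_i|D) + lam*P(q_i|C)
    so that unseen terms do not zero out the whole product.
    doc_tf/coll_tf: term -> count; doc_len/coll_len: total tokens."""
    score = 0.0
    for q in query:
        p_doc = doc_tf.get(q, 0) / doc_len
        p_coll = coll_tf.get(q, 0) / coll_len
        score += log((1 - lam) * p_doc + lam * p_coll)
    return score
```

The "new work" replaces this plain sum of term log-probabilities with a sum that also carries linkage terms, derived on the next slides.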
A dependency language model of IR
• Assume that the sum over all possible $L$'s is dominated by a single term $L^*$:
  $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$
• Assume that each term is dependent on exactly one related query term generated previously.
  (figure: a linkage over query terms $q_h$, $q_i$, $q_j$)
A dependency language model of IR
• Expanding $P(Q \mid L, D)$ over the links $(i, j) \in L$, starting from a head term $q_h$:
  $P(Q \mid L, D) = P(q_h \mid D) \prod_{(i,j) \in L} P(q_j \mid q_i, L, D)$
  $= P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)}$
  $= P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)\, P(q_j \mid L, D)}\, P(q_j \mid L, D)$
  (with, as before, $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$; figure: linkage over $q_h$, $q_i$, $q_j$)
A dependency language model of IR
• Assume
  – The generation of a single term is independent of $L$:
    $P(q_j \mid L, D) = P(q_j \mid D)$
• By this assumption, we would have arrived at the same result by starting from any term, so $L$ can be represented as an undirected graph:
  $P(Q \mid L, D) = P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}\, P(q_j \mid D) = \prod_{i=1 \ldots m} P(q_i \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
A dependency language model of IR
• Taking the log of
  $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$, with
  $P(Q \mid L, D) = \prod_{i=1 \ldots m} P(q_i \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
  gives the ranking formula
  $\log P(Q \mid D) = \log P(L \mid D) + \sum_{i=1 \ldots m} \log P(q_i \mid D) + \sum_{(i,j) \in L} MI(q_i, q_j \mid L, D)$
  where
  $MI(q_i, q_j \mid L, D) = \log \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
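The three-part ranking formula above is just a sum over precomputed quantities. A minimal sketch, where `p_term`, `p_link`, and `mi` are hypothetical lookup tables standing in for the estimators the later slides describe:

```python
from math import log

def dependence_score(query, links, p_term, p_link, mi):
    """Ranking formula from the slide:
    log P(Q|D) = log P(L|D) + sum_i log P(q_i|D) + sum_{(i,j) in L} MI(q_i,q_j|L,D).
    query:  list of query terms
    links:  list of (i, j) index pairs forming the linkage L
    p_term: term -> P(q_i|D);  p_link: (i,j) -> P(l|D);  mi: (i,j) -> MI value."""
    score = sum(log(p_link[(i, j)]) for (i, j) in links)   # log P(L|D)
    score += sum(log(p_term[q]) for q in query)            # unigram part
    score += sum(mi[(i, j)] for (i, j) in links)           # dependence part
    return score
```

Note that if every `mi` value is zero and `p_link` is ignored, this degenerates to the unigram score, which is how the model covers the unigram approach as a special case.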
Parameter Estimation
• Estimating $P(L \mid D)$
  – Assume that the links are independent:
    $P(L \mid D) = \prod_{l \in L} P(l \mid D)$
  – Then count the relative frequency of a link $l$ between $q_i$ and $q_j$ given that they appear in the same sentence:
    $RF(q_i, q_j \mid R) = \frac{C(q_i, q_j, R)}{C(q_i, q_j)}$
    where $C(q_i, q_j, R)$ counts the sentences in the training data in which $q_i$ and $q_j$ have a link, and $C(q_i, q_j)$ counts the sentences in which they co-occur.
  – This yields a score for each candidate link:
    $P(l \mid Q) = \frac{RF(q_i, q_j \mid R)}{\sum_l RF(q_i, q_j \mid R)}$,
    the normalized link frequency of query terms $q_i$ and $q_j$.
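The relative-frequency estimate above can be sketched directly from parsed training sentences. The input format (a list of `(terms, links)` pairs, with links as unordered term pairs from some parser such as the Link Grammar parser) is my assumption about how the counts would be gathered:

```python
from collections import Counter
from itertools import combinations

def link_relative_frequency(parsed_sentences):
    """Estimate RF(q_i, q_j | R): among training sentences where both terms
    occur, the fraction in which the parser put a link between them.
    parsed_sentences: list of (terms, links) pairs; links is a set of
    term pairs (hypothetical input format)."""
    cooccur = Counter()   # C(q_i, q_j): sentences where both terms appear
    linked = Counter()    # C(q_i, q_j, R): sentences where they are linked
    for terms, links in parsed_sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            cooccur[(a, b)] += 1
            if (a, b) in links or (b, a) in links:
                linked[(a, b)] += 1
    return {pair: linked[pair] / n for pair, n in cooccur.items()}
```

Sorting each pair makes the counts order-free, matching the undirected-graph view of the linkage.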
Parameter Estimation
• Since $P(l \mid Q) \propto RF(q_i, q_j \mid R)$, the most probable linkage is
  $L = \arg\max_L P(L \mid Q) = \arg\max_L \prod_{(i,j) \in L} RF(q_i, q_j \mid R)$
• Assumption: $P(L \mid Q) = P(L \mid D)$, so
  $P(L \mid D) = P(L \mid Q, D) = \prod_{l \in L} P(l \mid D) = \prod_{(i,j) \in L} RF(q_i, q_j \mid R)$
• The relative frequency is smoothed by interpolating document and collection estimates:
  $RF(q_i, q_j \mid R) = (1 - \lambda)\, RF_D(q_i, q_j \mid R) + \lambda\, RF_C(q_i, q_j \mid R)$
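One way to search for the $\arg\max_L$ above is a greedy maximum-spanning-tree pass over the candidate links, assuming (as a simplification of the linkage constraint) that $L$ must be an acyclic graph over the query terms. This search strategy is my sketch, not a procedure stated in the slides:

```python
def best_linkage(terms, rf):
    """Greedy search for L = argmax_L prod_{(i,j) in L} RF(q_i, q_j | R),
    assuming the linkage must be acyclic. Since all factors are in (0, 1],
    greedily taking the highest-RF links that do not close a cycle is the
    classic maximum-spanning-tree heuristic (Kruskal with a union-find).
    rf: dict mapping term pairs to RF scores."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    linkage = []
    for (a, b), w in sorted(rf.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb and w > 0:   # skip cycle-closing and zero-score links
            parent[ra] = rb
            linkage.append((a, b))
    return linkage
```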
Parameter Estimation
• Estimating $P(q_i \mid D)$
  – The document language model is smoothed with a Dirichlet prior:
    $P'(q_i \mid D) = (1 - \lambda)\, P(q_i \mid D) + \lambda\, P(q_i \mid C)$
    with maximum-likelihood estimates
    $P(q_i \mid D) = \frac{C_D(q_i)}{\sum_i C_D(q_i)}$ and $P(q_i \mid C) = \frac{C_C(q_i)}{\sum_i C_C(q_i)}$
    (Dirichlet distribution; $\lambda$ acts as a constant discount)
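The Dirichlet-prior form and the mixture form above are equivalent when $\lambda = \mu / (|D| + \mu)$. A minimal sketch; the default `mu=2000` is a conventional value for Dirichlet smoothing, not one reported in these slides:

```python
def dirichlet_p(term, doc_tf, doc_len, coll_tf, coll_len, mu=2000):
    """Dirichlet-prior smoothed document model:
    P(q|D) = (C_D(q) + mu * P(q|C)) / (|D| + mu),
    which equals (1-lam)*P_ML(q|D) + lam*P(q|C) with lam = mu/(|D|+mu).
    mu=2000 is a common default, not a value from the slides."""
    p_coll = coll_tf.get(term, 0) / coll_len
    return (doc_tf.get(term, 0) + mu * p_coll) / (doc_len + mu)
```

Because `p_coll` is nonzero for any term seen in the collection, unseen document terms no longer receive probability zero.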
Parameter Estimation
• Estimating $MI(q_i, q_j \mid L, D)$
  $MI(q_i, q_j \mid L, D) = \log \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)\, P(q_j \mid L, D)} = \log \frac{C_D(q_i, q_j, R)/N}{\left(C_D(q_i, *, R)/N\right)\left(C_D(*, q_j, R)/N\right)} = \log \frac{C_D(q_i, q_j, R) \cdot N}{C_D(q_i, *, R)\, C_D(*, q_j, R)}$
  where $N = C_D(*, *, R)$.
Experimental Setting
• Documents were stemmed and stop words were removed.
• Queries are TREC topics 202 to 250, run on TREC disks 2 and 3.
The flow of the experiments

(flowchart rendered as steps)
1. Query: find the linkage of the query, i.e. the max $L$ by $\max_l P(l \mid Q)$, and get $L$.
2. Training data (for weight computation): count the frequencies $RF_D(q_i, q_j \mid R)$ and $RF_C(q_i, q_j \mid R)$ to get $RF(q_i, q_j \mid R)$, and from it $P(L \mid D)$.
3. Document: count the frequencies $C_C(q_i)$ and $C_D(q_i)$ to get $P(q_i \mid D)$.
4. Document: count $C_D(q_i, q_j, R)$, $C_D(q_i, *, R)$, $C_D(*, q_j, R)$, and $C_D(*, *, R)$ to get $MI(q_i, q_j \mid L, D)$.
5. Combine these quantities to rank the documents.
Result-BM & UG
• BM: binary independence retrieval model
• UG: unigram language model approach
• UG achieves performance similar to, or worse than, that of BM.
Result- DM
• DM: dependency model
• The improvement of DM over UG is statistically significant.
Result- BG
• BG: bigram language model
• BG is slightly worse than DM in five out of six TREC collections but substantially outperforms UG in all collections.
Result- BT1 & BT2
• BT: bi-term language model
  $P_{BT1}(q_i \mid q_{i-1}, D) = \frac{1}{2}\left( P_{BG}(q_i \mid q_{i-1}, D) + P_{BG}(q_{i-1} \mid q_i, D) \right)$
  $P_{BT2}(q_i \mid q_{i-1}, D) = \frac{C_D(q_{i-1}, q_i) + C_D(q_i, q_{i-1})}{2 \min\{ C_D(q_{i-1}), C_D(q_i) \}}$
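The two bi-term variants can be sketched directly from the formulas. `p_bg` is a hypothetical bigram-model callable and `c_pair`/`c_term` are hypothetical count tables; only the formulas themselves come from the slide:

```python
def p_bt1(p_bg, qi_prev, qi):
    """BT1: average of the two bigram directions.
    p_bg(w2, w1) stands for P_BG(w2 | w1, D) from some smoothed bigram model."""
    return 0.5 * (p_bg(qi, qi_prev) + p_bg(qi_prev, qi))

def p_bt2(c_pair, c_term, qi_prev, qi):
    """BT2: order-free frequency ratio
    (C_D(q_{i-1}, q_i) + C_D(q_i, q_{i-1})) / (2 * min{C_D(q_{i-1}), C_D(q_i)}).
    c_pair: ordered pair -> count; c_term: term -> count."""
    num = c_pair.get((qi_prev, qi), 0) + c_pair.get((qi, qi_prev), 0)
    return num / (2 * min(c_term[qi_prev], c_term[qi]))
```

Both variants assign "information retrieval" and "retrieval of information" the same probability, which is exactly the order relaxation the bi-term model is designed for.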
Conclusion
• This paper introduces the linkage of a query as a hidden variable.
• Each term is generated in turn, depending on other related terms according to the linkage.
  – This approach covers several language model approaches as special cases.
• In the experiments, the proposed model substantially outperforms the unigram, bigram, and classical probabilistic retrieval models.