Dependence Language Model for Information Retrieval
Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao. Dependence Language Model for Information Retrieval. SIGIR 2004.
Jan 19, 2016
References
• Ciprian Chelba, David Engle, et al. Structure and performance of a dependency language model. Eurospeech 1997.
• Daniel D. K. Sleator and Davy Temperley. Parsing English with a Link Grammar. Technical Report CMU-CS-91-196, 1991.
Why do we use the independence assumption?
• The independence assumption is one of the assumptions widely adopted in probabilistic retrieval theory.
• Why?
  – It makes retrieval models simpler.
  – It makes retrieval operations tractable.
• The shortcoming of the independence assumption
  – The independence assumption does not hold in textual data.
Recent ideas on modeling dependence
• Bigram
  – Some language-modeling approaches try to incorporate word dependence by using bigrams.
  – Shortcomings:
    • Word dependencies exist not only between adjacent words but also at greater distances.
    • Adjacent words are not always genuinely related.
  – The bigram language model showed only marginally better effectiveness than the unigram model.
• Bi-term
  – The bi-term language model is similar to the bigram model except that the constraint on word order is relaxed.
  – “information retrieval” and “retrieval of information” are assigned the same probability of generating the query.
Structure and performance of a dependency language model
Introduction
• This paper presents a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.
• Dependency grammar: expresses the relations between words by a directed graph, which can incorporate the predictive power of words that lie outside bigram or trigram range.
Introduction
• Why we use N-grams
  – Assume a sentence $S = w_0, w_1, w_2, \ldots, w_n$; then
    $P(S) = P(w_0)\,P(w_1 \mid w_0) \cdots P(w_n \mid w_0 \ldots w_{n-1})$
  – If we want to record $P(w_n \mid w_0 \ldots w_{n-1})$ exactly, we need to store $V^i(V-1)$ independent parameters for each history length $i$ (with vocabulary size $V$).
• The drawback of N-grams
  – N-grams blindly discard relevant words that lie $N$ or more positions in the past.
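The parameter-count argument above can be checked with a small sketch. The function names and the tiny vocabulary sizes are illustrative, not from the paper; only the formulas $V^i(V-1)$ and $V^{N-1}(V-1)$ come from the slide.

```python
# Sketch: parameter growth of a full-history model vs. an N-gram model.
# For vocabulary size V, conditioning on a history of i words needs
# V**i * (V - 1) independent parameters (V**i histories, V-1 free probabilities each).

def full_history_params(V: int, n: int) -> int:
    """Parameters to model P(w_i | w_0..w_{i-1}) exactly for i = 0..n."""
    return sum(V**i * (V - 1) for i in range(n + 1))

def ngram_params(V: int, N: int) -> int:
    """Parameters for an N-gram model, which truncates the history to N-1 words."""
    return V**(N - 1) * (V - 1)

# Even a bigram over a 10,000-word vocabulary needs ~10^8 parameters;
# the full-history model grows exponentially with sentence length.
```

This is why truncating the history is unavoidable, and why the paper looks for a way to keep *selected* long-distance dependencies instead of all of them.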
Structure of the model
• Develop an expression for the joint probability $P(S, K)$, where $K$ is the set of linkages in the sentence.
• Then we get
  $P(S) = \sum_K P(S, K)$
• Assume that the sum is dominated by a single term; then
  $P(S) \approx P(S, K^*)$, where $K^* = \arg\max_K P(S, K)$
A dependency language model of IR
• Given a query $Q = q_1 \ldots q_m$, we want to rank documents by $P(Q \mid D)$.
  – Previous work:
    • Assume independence between query terms:
      $P(Q \mid D) = \prod_{i=1 \ldots m} P(q_i \mid D)$
  – New work:
    • Assume that term dependencies in a query form a linkage $L$:
      $P(Q \mid D) = \sum_L P(Q, L \mid D) = \sum_L P(L \mid D)\, P(Q \mid L, D)$
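The "previous work" baseline above is the standard unigram query likelihood. A minimal sketch, assuming simple term-frequency dictionaries and a Jelinek-Mercer-style mixture weight `lam` (the mixture form matches the smoothing shown later in these slides; the exact weight is my assumption):

```python
from math import log

def unigram_query_loglik(query, doc_tf, doc_len, coll_tf, coll_len, lam=0.5):
    """log P(Q|D) under the term-independence assumption:
    log P(Q|D) = sum_i log P(q_i|D).
    Each P(q_i|D) is smoothed as (1-lam)*P_ML(q_i|D) + lam*P(q_i|C)
    so that unseen terms do not zero out the whole product.
    doc_tf/coll_tf: term -> count; doc_len/coll_len: total tokens."""
    score = 0.0
    for q in query:
        p_doc = doc_tf.get(q, 0) / doc_len
        p_coll = coll_tf.get(q, 0) / coll_len
        score += log((1 - lam) * p_doc + lam * p_coll)
    return score
```

The "new work" replaces this plain sum of term log-probabilities with a sum that also carries linkage terms, derived on the next slides.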
A dependency language model of IR
• Assume that the sum over all possible $L$'s is dominated by a single term $L^*$:
  $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$
• Assume that each term is dependent on exactly one related query term generated previously.
  (figure: a linkage over query terms $q_h$, $q_i$, $q_j$)
A dependency language model of IR
• Expanding $P(Q \mid L, D)$ over the links $(i, j) \in L$, starting from a head term $q_h$:
  $P(Q \mid L, D) = P(q_h \mid D) \prod_{(i,j) \in L} P(q_j \mid q_i, L, D)$
  $= P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)}$
  $= P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)\, P(q_j \mid L, D)}\, P(q_j \mid L, D)$
  (with, as before, $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$; figure: linkage over $q_h$, $q_i$, $q_j$)
A dependency language model of IR
• Assume
  – The generation of a single term is independent of $L$:
    $P(q_j \mid L, D) = P(q_j \mid D)$
• By this assumption, we would have arrived at the same result by starting from any term, so $L$ can be represented as an undirected graph:
  $P(Q \mid L, D) = P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}\, P(q_j \mid D) = \prod_{i=1 \ldots m} P(q_i \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
A dependency language model of IR
• Taking the log of
  $P(Q \mid D) = P(L \mid D)\, P(Q \mid L, D)$, such that $L = \arg\max_L P(L \mid D)$, with
  $P(Q \mid L, D) = \prod_{i=1 \ldots m} P(q_i \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
  gives the ranking formula
  $\log P(Q \mid D) = \log P(L \mid D) + \sum_{i=1 \ldots m} \log P(q_i \mid D) + \sum_{(i,j) \in L} MI(q_i, q_j \mid L, D)$
  where
  $MI(q_i, q_j \mid L, D) = \log \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}$
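The three-part ranking formula above is just a sum over precomputed quantities. A minimal sketch, where `p_term`, `p_link`, and `mi` are hypothetical lookup tables standing in for the estimators the later slides describe:

```python
from math import log

def dependence_score(query, links, p_term, p_link, mi):
    """Ranking formula from the slide:
    log P(Q|D) = log P(L|D) + sum_i log P(q_i|D) + sum_{(i,j) in L} MI(q_i,q_j|L,D).
    query:  list of query terms
    links:  list of (i, j) index pairs forming the linkage L
    p_term: term -> P(q_i|D);  p_link: (i,j) -> P(l|D);  mi: (i,j) -> MI value."""
    score = sum(log(p_link[(i, j)]) for (i, j) in links)   # log P(L|D)
    score += sum(log(p_term[q]) for q in query)            # unigram part
    score += sum(mi[(i, j)] for (i, j) in links)           # dependence part
    return score
```

Note that if every `mi` value is zero and `p_link` is ignored, this degenerates to the unigram score, which is how the model covers the unigram approach as a special case.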
Parameter Estimation
• Estimating $P(L \mid D)$
  – Assume that the links are independent:
    $P(L \mid D) = \prod_{l \in L} P(l \mid D)$
  – Then count the relative frequency of a link $l$ between $q_i$ and $q_j$ given that they appear in the same sentence:
    $RF(q_i, q_j \mid R) = \frac{C(q_i, q_j, R)}{C(q_i, q_j)}$
    where $C(q_i, q_j, R)$ counts the sentences in the training data in which $q_i$ and $q_j$ have a link, and $C(q_i, q_j)$ counts the sentences in which they co-occur.
  – This yields a score for each candidate link:
    $P(l \mid Q) = \frac{RF(q_i, q_j \mid R)}{\sum_l RF(q_i, q_j \mid R)}$,
    the normalized link frequency of query terms $q_i$ and $q_j$.
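The relative-frequency estimate above can be sketched directly from parsed training sentences. The input format (a list of `(terms, links)` pairs, with links as unordered term pairs from some parser such as the Link Grammar parser) is my assumption about how the counts would be gathered:

```python
from collections import Counter
from itertools import combinations

def link_relative_frequency(parsed_sentences):
    """Estimate RF(q_i, q_j | R): among training sentences where both terms
    occur, the fraction in which the parser put a link between them.
    parsed_sentences: list of (terms, links) pairs; links is a set of
    term pairs (hypothetical input format)."""
    cooccur = Counter()   # C(q_i, q_j): sentences where both terms appear
    linked = Counter()    # C(q_i, q_j, R): sentences where they are linked
    for terms, links in parsed_sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            cooccur[(a, b)] += 1
            if (a, b) in links or (b, a) in links:
                linked[(a, b)] += 1
    return {pair: linked[pair] / n for pair, n in cooccur.items()}
```

Sorting each pair makes the counts order-free, matching the undirected-graph view of the linkage.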
Parameter Estimation
• Since $P(l \mid Q) \propto RF(q_i, q_j \mid R)$, the most probable linkage is
  $L = \arg\max_L P(L \mid Q) = \arg\max_L \prod_{(i,j) \in L} RF(q_i, q_j \mid R)$
• Assumption: $P(L \mid Q) = P(L \mid D)$, so
  $P(L \mid D) = P(L \mid Q, D) = \prod_{l \in L} P(l \mid D) = \prod_{(i,j) \in L} RF(q_i, q_j \mid R)$
• The relative frequency is smoothed by interpolating document and collection estimates:
  $RF(q_i, q_j \mid R) = (1 - \lambda)\, RF_D(q_i, q_j \mid R) + \lambda\, RF_C(q_i, q_j \mid R)$
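One way to search for the $\arg\max_L$ above is a greedy maximum-spanning-tree pass over the candidate links, assuming (as a simplification of the linkage constraint) that $L$ must be an acyclic graph over the query terms. This search strategy is my sketch, not a procedure stated in the slides:

```python
def best_linkage(terms, rf):
    """Greedy search for L = argmax_L prod_{(i,j) in L} RF(q_i, q_j | R),
    assuming the linkage must be acyclic. Since all factors are in (0, 1],
    greedily taking the highest-RF links that do not close a cycle is the
    classic maximum-spanning-tree heuristic (Kruskal with a union-find).
    rf: dict mapping term pairs to RF scores."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    linkage = []
    for (a, b), w in sorted(rf.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb and w > 0:   # skip cycle-closing and zero-score links
            parent[ra] = rb
            linkage.append((a, b))
    return linkage
```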
Parameter Estimation
• Estimating $P(q_i \mid D)$
  – The document language model is smoothed with a Dirichlet prior:
    $P'(q_i \mid D) = (1 - \lambda)\, P(q_i \mid D) + \lambda\, P(q_i \mid C)$
    with maximum-likelihood estimates
    $P(q_i \mid D) = \frac{C_D(q_i)}{\sum_i C_D(q_i)}$ and $P(q_i \mid C) = \frac{C_C(q_i)}{\sum_i C_C(q_i)}$
    (Dirichlet distribution; $\lambda$ acts as a constant discount)
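The Dirichlet-prior form and the mixture form above are equivalent when $\lambda = \mu / (|D| + \mu)$. A minimal sketch; the default `mu=2000` is a conventional value for Dirichlet smoothing, not one reported in these slides:

```python
def dirichlet_p(term, doc_tf, doc_len, coll_tf, coll_len, mu=2000):
    """Dirichlet-prior smoothed document model:
    P(q|D) = (C_D(q) + mu * P(q|C)) / (|D| + mu),
    which equals (1-lam)*P_ML(q|D) + lam*P(q|C) with lam = mu/(|D|+mu).
    mu=2000 is a common default, not a value from the slides."""
    p_coll = coll_tf.get(term, 0) / coll_len
    return (doc_tf.get(term, 0) + mu * p_coll) / (doc_len + mu)
```

Because `p_coll` is nonzero for any term seen in the collection, unseen document terms no longer receive probability zero.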
Parameter Estimation
• Estimating $MI(q_i, q_j \mid L, D)$
  $MI(q_i, q_j \mid L, D) = \log \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid L, D)\, P(q_j \mid L, D)} = \log \frac{C_D(q_i, q_j, R)/N}{\left(C_D(q_i, *, R)/N\right)\left(C_D(*, q_j, R)/N\right)} = \log \frac{C_D(q_i, q_j, R) \cdot N}{C_D(q_i, *, R)\, C_D(*, q_j, R)}$
  where $N = C_D(*, *, R)$.
Experimental Setting
• Documents were stemmed and stop words were removed.
• Queries are TREC topics 202 to 250, run on TREC disks 2 and 3.
The flow of the experiments

(flowchart rendered as steps)
1. Query: find the linkage of the query, i.e. the max $L$ by $\max_l P(l \mid Q)$, and get $L$.
2. Training data (for weight computation): count the frequencies $RF_D(q_i, q_j \mid R)$ and $RF_C(q_i, q_j \mid R)$ to get $RF(q_i, q_j \mid R)$, and from it $P(L \mid D)$.
3. Document: count the frequencies $C_C(q_i)$ and $C_D(q_i)$ to get $P(q_i \mid D)$.
4. Document: count $C_D(q_i, q_j, R)$, $C_D(q_i, *, R)$, $C_D(*, q_j, R)$, and $C_D(*, *, R)$ to get $MI(q_i, q_j \mid L, D)$.
5. Combine these quantities to rank the documents.
Result-BM & UG
• BM: binary independence retrieval model
• UG: unigram language model approach
• UG achieves performance similar to, or worse than, that of BM.
Result- DM
• DM: dependency model
• The improvement of DM over UG is statistically significant.
Result- BG
• BG: bigram language model
• BG is slightly worse than DM in five out of six TREC collections but substantially outperforms UG in all collections.
Result- BT1 & BT2
• BT: bi-term language model
  $P_{BT1}(q_i \mid q_{i-1}, D) = \frac{1}{2}\left( P_{BG}(q_i \mid q_{i-1}, D) + P_{BG}(q_{i-1} \mid q_i, D) \right)$
  $P_{BT2}(q_i \mid q_{i-1}, D) = \frac{C_D(q_{i-1}, q_i) + C_D(q_i, q_{i-1})}{2 \min\{ C_D(q_{i-1}), C_D(q_i) \}}$
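The two bi-term variants can be sketched directly from the formulas. `p_bg` is a hypothetical bigram-model callable and `c_pair`/`c_term` are hypothetical count tables; only the formulas themselves come from the slide:

```python
def p_bt1(p_bg, qi_prev, qi):
    """BT1: average of the two bigram directions.
    p_bg(w2, w1) stands for P_BG(w2 | w1, D) from some smoothed bigram model."""
    return 0.5 * (p_bg(qi, qi_prev) + p_bg(qi_prev, qi))

def p_bt2(c_pair, c_term, qi_prev, qi):
    """BT2: order-free frequency ratio
    (C_D(q_{i-1}, q_i) + C_D(q_i, q_{i-1})) / (2 * min{C_D(q_{i-1}), C_D(q_i)}).
    c_pair: ordered pair -> count; c_term: term -> count."""
    num = c_pair.get((qi_prev, qi), 0) + c_pair.get((qi, qi_prev), 0)
    return num / (2 * min(c_term[qi_prev], c_term[qi]))
```

Both variants assign "information retrieval" and "retrieval of information" the same probability, which is exactly the order relaxation the bi-term model is designed for.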
Conclusion
• This paper introduces the linkage of a query as a hidden variable.
• Each term is generated in turn, depending on other related terms according to the linkage.
  – This approach covers several language model approaches as special cases.
• In the experiments, the proposed model substantially outperforms the unigram, bigram, and classical probabilistic retrieval models.