A Markov Random Field Model for Term Dependencies
Donald Metzler, W. Bruce Croft
Presented by Chia-Hao Lee
Dec 30, 2015
Outline
• Introduction
• Model
– Overview
– Variants
– Potential Functions
– Training
• Experimental Results
• Conclusions
Introduction
• There is a rich history of statistical models for information retrieval, including the binary independence model (BIM), language modeling, the inference network model, and so on.
• It is well known that dependencies exist between terms in a collection of text.
• For example, within a SIGIR proceedings, occurrences of certain pairs of terms are correlated, such as information and retrieval.
Introduction
• Unfortunately, estimating statistical models for general term dependencies is infeasible, due to data sparsity.
• For this reason, most retrieval models assume some form of independence exists between terms.
• Most work on modeling term dependencies in the past has focused on phrases/proximity or term co-occurrences. Most of these models only consider dependencies between pairs of terms.
• Several recent studies have examined term dependence models for the language modeling framework.
Model
• Markov random fields (MRF), also called undirected graphical models, are commonly used in the statistical machine learning domain to succinctly model joint distributions.
• We use MRFs to model the joint distribution $P_\Lambda(Q, D)$ over queries Q and documents D, parameterized by $\Lambda$.
Model
• A Markov random field is constructed from a graph G.
• The nodes in the graph represent random variables, and the edges define the independence semantics between the random variables.
• In this model, we assume G consists of query nodes $Q = q_1, \ldots, q_n$ and a document node D, such as the graphs in the figure.

$$P_\Lambda(Q, D) = \frac{1}{Z_\Lambda} \prod_{c \in C(G)} \psi(c; \Lambda)$$

$C(G)$: the set of cliques in G
$\psi(\cdot\,; \Lambda)$: a non-negative potential function over clique configurations, parameterized by $\Lambda$
$Z_\Lambda = \sum_{Q,D} \prod_{c \in C(G)} \psi(c; \Lambda)$: normalizes the distribution
Model
• For ranking purposes we compute the posterior:

$$P_\Lambda(D \mid Q) = \frac{P_\Lambda(Q, D)}{P_\Lambda(Q)} \stackrel{rank}{=} \log P_\Lambda(Q, D) - \log P_\Lambda(Q) \stackrel{rank}{=} \sum_{c \in C(G)} \log \psi(c; \Lambda)$$

• As noted above, all potential functions must be non-negative, and are most commonly parameterized as:

$$\psi(c; \Lambda) = \exp[\lambda_c f(c)]$$

$f(c)$: a real-valued feature function over clique values
$\lambda_c$: the weight given to that particular feature function
Model
• Substituting this back into the ranking function, we end up with the following ranking function:

$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c) \qquad (1)$$

• To utilize the model, the following steps must be taken for each query Q:
– Construct a graph representing the query term dependencies to model
– Define a set of potential functions over the cliques of this graph
– Rank documents in descending order of $P_\Lambda(D \mid Q)$
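The three steps above reduce ranking to a weighted feature sum over cliques. A minimal sketch of Equation 1 in Python, assuming features have already been extracted per clique; the weights and feature values here are illustrative, not from the paper:

```python
# Sketch of ranking by Equation 1: score(D) = sum over cliques c of lambda_c * f(c).
# Feature extraction is stubbed out; each document maps to (clique type, f(c)) pairs.

def mrf_score(clique_features, weights):
    """Sum lambda_c * f(c) over a document's cliques, keyed by clique type."""
    return sum(weights[kind] * value for kind, value in clique_features)

def rank(docs, weights):
    """Return document ids in descending order of MRF score."""
    return sorted(docs, key=lambda d: mrf_score(docs[d], weights), reverse=True)

weights = {"T": 0.8, "O": 0.1, "U": 0.1}  # lambda_T, lambda_O, lambda_U (made up)
docs = {
    "d1": [("T", -1.2), ("T", -0.7), ("O", -2.0)],
    "d2": [("T", -1.5), ("T", -1.5), ("O", -1.0)],
}
print(rank(docs, weights))  # d1 scores -1.72, d2 scores -2.50
```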
Model
• We now describe and analyze three variants of the MRF model, each with different underlying dependence assumptions.– Full independence (FI)– Sequential dependence (SD)– Full dependence (FD)
Model
• The full independence variant makes the assumption that query terms are independent given some document D.
• The likelihood of a query term $q_i$ occurring is not affected by the occurrence of any other query term, or more succinctly, $P(q_i \mid D, q_{j \ne i}) = P(q_i \mid D)$.
• The sequential dependence variant assumes a dependence between neighboring query terms.
• Formally, this assumption states that $P(q_i \mid D, q_j) = P(q_i \mid D)$ only for nodes $q_j$ that are not adjacent to $q_i$.
Model
• In the full dependence variant, all query terms are in some way dependent on each other.
• Graphically, a query of length n translates into the complete graph $K_{n+1}$, which includes edges from each query node to the document node D.
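The three variants differ only in which term sets (joined with the document node D) form cliques. A simplified sketch that enumerates just the term sets; under full dependence the query graph is complete, so every subset of two or more terms qualifies:

```python
from itertools import combinations

def query_cliques(terms, variant):
    """Term sets that, together with the document node D, form cliques
    under each dependence assumption ("FI", "SD", or "FD")."""
    cliques = [(t,) for t in terms]  # single-term cliques appear in all variants
    if variant == "SD":
        # sequential dependence: edges only between neighboring query terms
        cliques += [tuple(terms[i:i + 2]) for i in range(len(terms) - 1)]
    elif variant == "FD":
        # full dependence: complete graph, so every subset of 2+ terms
        for k in range(2, len(terms) + 1):
            cliques += combinations(terms, k)
    return cliques

q = ["train", "station", "security"]
print(len(query_cliques(q, "FI")), len(query_cliques(q, "SD")),
      len(query_cliques(q, "FD")))  # 3 5 7
```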
Model
• The potential functions φ play a very important role in how accurate our approximation of the true joint distribution is.
• For example: consider a document D on the topic of information retrieval. Using the sequential dependence variant, we would expect $\psi(\textit{information}, \textit{retrieval}, D) > \psi(\textit{information}, \textit{assurance}, D)$, as the terms information and retrieval are much more "compatible" with the topicality of document D than the terms information and assurance.
Model
• Since documents are ranked by Equation 1, it is also important that the potential functions can be computed efficiently.
• Based on these criteria and previous research on phrases and term dependence, we focus on three types of potential functions.
• These potential functions attempt to abstract the idea of term co-occurrence.
Model
• Since potentials are over cliques in the graph, we now proceed to enumerate all of the possible ways graph cliques are formed in our model and how potential functions are defined for each.
• The simplest type of clique that can appear in our graph is a 2-clique consisting of an edge between a query term $q_i$ and the document D.
Model
• In keeping with simple-to-compute measures, we define this potential as:

$$\psi_T(c) = \lambda_T \log P(q_i \mid D) = \lambda_T \log\left[(1 - \alpha_D)\frac{tf_{q_i,D}}{|D|} + \alpha_D \frac{cf_{q_i}}{|C|}\right]$$

$P(q_i \mid D)$: a smoothed language modeling estimate
$tf_{w,D}$: the number of times term w occurs in document D
$cf_w$: the number of times term w occurs in the entire collection
$|D|$: the total number of terms in document D
$|C|$: the length of the collection
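A sketch of $\psi_T$ with a fixed mixing weight $\alpha$; the slide leaves the smoothing to the language modeling estimate, so treating $\alpha_D$ as a constant (Jelinek-Mercer-style interpolation) is an assumption here:

```python
import math

def term_potential(tf_qd, doc_len, cf_q, coll_len, lam_t=1.0, alpha=0.5):
    """psi_T(c) = lambda_T * log[(1 - alpha_D) * tf/|D| + alpha_D * cf/|C|].
    alpha is a fixed smoothing weight here, an illustrative simplification."""
    p_q_given_d = (1 - alpha) * tf_qd / doc_len + alpha * cf_q / coll_len
    return lam_t * math.log(p_q_given_d)

# A term occurring twice in a 100-term document, 50 times in a 10,000-term collection:
print(term_potential(2, 100, 50, 10_000))
```

Note the collection-frequency term keeps the potential finite (and non-zero inside the log) even when the term never occurs in the document.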
Model
• Next, we consider cliques that contain two or more query terms.
• For example: in the query train station security measures, if any of the sub-phrases train station, train station security, station security measures, or security measures appear in a document, then there is strong evidence in favor of relevance.
Model
• Therefore, for every clique that contains a contiguous set of two or more terms $q_i, \ldots, q_{i+k}$ and the document node D, we apply the following "ordered" potential function:

$$\psi_O(c) = \lambda_O \log P(\#1(q_i, \ldots, q_{i+k}) \mid D) = \lambda_O \log\left[(1 - \alpha_D)\frac{tf_{\#1(q_i, \ldots, q_{i+k}),D}}{|D|} + \alpha_D \frac{cf_{\#1(q_i, \ldots, q_{i+k})}}{|C|}\right]$$

$tf_{\#1(q_i, \ldots, q_{i+k}),D}$: the number of times the exact phrase $q_i, \ldots, q_{i+k}$ occurs in document D
$cf_{\#1(q_i, \ldots, q_{i+k})}$: the number of times the exact phrase occurs in the entire collection
$|D|$: the total number of terms in document D
$|C|$: the length of the collection
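The ordered potential needs the exact-phrase count $tf_{\#1(\cdot),D}$. A straightforward sketch over a tokenized document:

```python
def tf_exact_phrase(doc_terms, phrase):
    """Number of positions where the phrase occurs contiguously and in order."""
    n = len(phrase)
    return sum(doc_terms[i:i + n] == phrase
               for i in range(len(doc_terms) - n + 1))

doc = "train station security at the train station".split()
print(tf_exact_phrase(doc, ["train", "station"]))  # 2
```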
Model
• Although the occurrence of contiguous sets of query terms provides strong evidence of relevance, it is also the case that the occurrence of non-contiguous sets of query terms can provide valuable evidence.
• In the previous example, documents containing the terms train and security within some short proximity of one another also provide additional evidence towards relevance.
Model
• For our purposes, we construct an "unordered" potential function over cliques that consist of sets of two or more query terms $q_i, \ldots, q_j$ and the document node D. Such potential functions have the following form:

$$\psi_U(c) = \lambda_U \log P(\#uwN(q_i, \ldots, q_j) \mid D) = \lambda_U \log\left[(1 - \alpha_D)\frac{tf_{\#uwN(q_i, \ldots, q_j),D}}{|D|} + \alpha_D \frac{cf_{\#uwN(q_i, \ldots, q_j)}}{|C|}\right]$$

$tf_{\#uwN(q_i, \ldots, q_j),D}$: the number of times the terms $q_i, \ldots, q_j$ appear, ordered or unordered, within a window of N terms in document D
$cf_{\#uwN(q_i, \ldots, q_j)}$: the number of times the terms appear within such a window in the entire collection
$|D|$: the total number of terms in document D
$|C|$: the length of the collection
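For the unordered potential, $tf_{\#uwN(\cdot),D}$ counts co-occurrences within a window of N terms. A simplified sketch that counts every window position containing all the terms; real systems such as Indri use a slightly different matching convention for `#uwN`, so this is illustrative only:

```python
def tf_unordered_window(doc_terms, terms, window):
    """Number of length-`window` spans containing all query terms, in any order."""
    need = set(terms)
    return sum(need <= set(doc_terms[i:i + window])
               for i in range(len(doc_terms) - window + 1))

doc = "security checks at the train station".split()
print(tf_unordered_window(doc, ["train", "security"], 6))  # 1
print(tf_unordered_window(doc, ["train", "security"], 2))  # 0
```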
Model
• Using these potential functions, we derive the following specific ranking function:

$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in T} \lambda_T f_T(c) + \sum_{c \in O} \lambda_O f_O(c) + \sum_{c \in O \cup U} \lambda_U f_U(c)$$

where T is the set of 2-cliques (a query term and D), O the set of cliques of contiguous query terms with D, and U the set of cliques of non-contiguous query terms with D.
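Concretely, the specific ranking function is a linear combination of three feature sums, one per clique type. A sketch with made-up log-probability feature values; the model ties one weight to each clique type rather than learning a weight per clique:

```python
def full_score(f_t, f_o, f_u, lam_t, lam_o, lam_u):
    """sum_T lambda_T f_T(c) + sum_O lambda_O f_O(c) + sum_{O u U} lambda_U f_U(c).
    Each list holds the log-probability feature values for one clique type."""
    return lam_t * sum(f_t) + lam_o * sum(f_o) + lam_u * sum(f_u)

# Illustrative feature values from psi_T, psi_O, psi_U for one document:
print(full_score([-1.0, -2.0], [-3.0], [-4.0], 0.8, 0.1, 0.1))  # about -3.1
```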
Experimental Results
• We make use of the Associated Press and Wall Street Journal sub-collections of TREC, which are small homogeneous collections, and two web collections, WT10g and GOV2, which are considerably larger and less homogeneous.
Experimental Results
• Full independence
Experimental Results
• Sequential dependence
Experimental Results
• Full dependence
Conclusions
• In this paper, we develop a general term dependence model that can make use of arbitrary text features.
• Three variants of the model are described, each of which captures different dependencies between query terms.
Markov Random Fields
• Let $X_1, \ldots, X_n$ be random variables taking values in some finite set S, and let $G = (N, E)$ be a finite graph such that $N = \{1, \ldots, n\}$, whose elements will sometimes be called sites.
• For a set $A \subset N$, let $\partial A$ define its neighbor (or boundary) set: all elements in $N \setminus A$ that have a neighbor in A. For $i \in N$, let $\partial i = \partial\{i\}$.
• The random variables are said to define a Markov random field if, for any vector $x \in S^N$:

$$\Pr(X_i = x_i \mid X_j = x_j,\; j \in N \setminus \{i\}) = \Pr(X_i = x_i \mid X_j = x_j,\; j \in \partial i)$$
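The defining property can be checked numerically on a tiny example. A sketch for a 3-site chain 1–2–3 with pairwise agreement potentials (all choices here are illustrative): the conditional of $X_1$ given everything else should depend only on its neighbor $X_2$:

```python
import math
from itertools import product

# Joint over a chain 1-2-3: P(x) proportional to exp(-V(x1,x2) - V(x2,x3)),
# with a pairwise potential that favors agreement between neighbors.
V = lambda a, b: 0.0 if a == b else 1.0
states = [0, 1]
joint = {x: math.exp(-(V(x[0], x[1]) + V(x[1], x[2])))
         for x in product(states, repeat=3)}
Z = sum(joint.values())
joint = {x: p / Z for x, p in joint.items()}

def cond_x1(x1, x2, x3=None):
    """Pr(X1 = x1 | X2 = x2), or Pr(X1 = x1 | X2 = x2, X3 = x3) if x3 is given."""
    num = sum(p for x, p in joint.items()
              if x[0] == x1 and x[1] == x2 and (x3 is None or x[2] == x3))
    den = sum(p for x, p in joint.items()
              if x[1] == x2 and (x3 is None or x[2] == x3))
    return num / den

# Markov property: conditioning on the non-neighbor X3 changes nothing.
print(math.isclose(cond_x1(0, 0, x3=0), cond_x1(0, 0)))  # True
print(math.isclose(cond_x1(0, 0, x3=1), cond_x1(0, 0)))  # True
```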
Potentials
• A potential is a function indexed by subsets of N on the space $S^N$. We will write potentials as $V_A(w)$ for $A \subset N$, $w \in S^N$.
• Given a full set of potentials, the energy of a configuration w will be defined as:

$$U(w) = \sum_{A \subset N} V_A(w)$$

• Using the energy, we can define a probability measure P from a set of potentials by:

$$P(w) = \frac{\exp(-U(w))}{Z}, \qquad Z = \sum_{w \in S^N} \exp(-U(w))$$
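The energy-to-probability construction can be sketched by brute force over all configurations, which is feasible only when $S^N$ is tiny; the potentials below are illustrative:

```python
import math
from itertools import product

def gibbs_measure(potentials, n_sites, states):
    """P(w) = exp(-U(w)) / Z with U(w) = sum_A V_A(w), enumerating all of S^N.
    Each potential is a function of the full configuration tuple w."""
    configs = list(product(states, repeat=n_sites))
    energy = {w: sum(V(w) for V in potentials) for w in configs}
    Z = sum(math.exp(-u) for u in energy.values())
    return {w: math.exp(-energy[w]) / Z for w in configs}

# Two binary sites with one pairwise potential that favors agreement:
P = gibbs_measure([lambda w: 0.0 if w[0] == w[1] else 1.0], 2, [0, 1])
print(P[(0, 0)] > P[(0, 1)])  # True: agreeing configurations get lower energy
```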