2007.02.15 - SLIDE 1 IS 240 – Spring 2007

Lecture 10: Inference and Belief Networks

Prof. Ray Larson University of California, Berkeley

School of Information

Tuesday and Thursday 10:30 am - 12:00 pm
Spring 2007
http://courses.ischool.berkeley.edu/i240/s07

Principles of Information Retrieval

2007.02.15 - SLIDE 2 IS 240 – Spring 2007

Today

• Term Papers and Mini-TREC directory organization

• Review
  – Probabilistic Models and Logistic Regression

• Information Retrieval using inference networks
  – Bayesian networks
  – Turtle and Croft Inference Model

2007.02.15 - SLIDE 3 IS 240 – Spring 2007

Term Paper

• Should be about 8-12 pages on:
  – some area of IR research (or practice) that you are interested in and want to study further
  – OR experimental tests of systems or IR algorithms
    • Mini-TREC alone would not qualify, but some set of related experiments might – check with me
  – OR build an IR system, test it, and describe the system and its performance
    • If you are building your own you can use it for both Mini-TREC and the paper

• Due May 16th (Monday of Finals Week)

2007.02.15 - SLIDE 4 IS 240 – Spring 2007

Mini-TREC

• Proposed Schedule
  – February 15 – Database and previous queries
  – February 27 – Report on system acquisition and setup
  – March 8 – New queries for testing…
  – April 19 – Results due
  – April 24 or 26 – Results and system rankings
  – May 8 – Group reports and discussion

2007.02.15 - SLIDE 5 IS 240 – Spring 2007

MiniTREC data and queries

• Data is a subset (one collection of TREC data)

• Restricted

2007.02.15 - SLIDE 6 IS 240 – Spring 2007

MiniTREC data and queries

• Example TREC query:

<top>
<num> Number: 252
<title> Topic: Combating Alien Smuggling
<desc> Description:
What steps are being taken by governmental or even private entities world-wide to stop the smuggling of aliens.
<narr> Narrative:
To be relevant, a document must describe an effort being made (other than routine border patrols) in any country of the world to prevent the illegal penetration of aliens across borders.
</top>

Notice that this is NOT XML; it is SGML with "implied end tags" for the major tags.
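A minimal Python sketch (not from the slides) of how such a topic might be split into fields, assuming the implied-end-tag layout shown above; the parse_topic helper and the file name are illustrative only.

import re

def parse_topic(text):
    """Split one <top> block into its num/title/desc/narr fields.

    Assumes the SGML 'implied end tag' layout shown above: each field
    runs until the next field tag or </top>.
    """
    fields = {}
    pattern = r"<(num|title|desc|narr)>(.*?)(?=<num>|<title>|<desc>|<narr>|</top>)"
    for tag, body in re.findall(pattern, text, flags=re.S):
        # Drop the leading "Number:", "Topic:", etc. label from each field
        fields[tag] = re.sub(r"^\s*\w+:\s*", "", body.strip())
    return fields

# e.g. fields = parse_topic(open("topic-252.sgml").read())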

2007.02.15 - SLIDE 7 IS 240 – Spring 2007

FT database records…

The documents ARE in XML (actually still SGML – note, however, there are no higher groupings in the file):

<DOC>
<DOCNO>FT911-4</DOCNO>
<PROFILE>_AN-BEOA7AAHFT</PROFILE>
<DATE>910514</DATE>
<HEADLINE>FT 14 MAY 91 / World News in Brief: Population warning</HEADLINE>
<TEXT>The world's population is growing faster than predicted and will consume at an unprecedented rate the natural resources required for human survival, a UN report said.</TEXT>
<PUB>The Financial Times</PUB>
<PAGE>International Page 1</PAGE>
</DOC>
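In the same spirit, a small sketch (again not part of the slides) for pulling <DOC> records out of such an FT file; the iter_docs helper, the file encoding and the path are assumptions.

import re

def iter_docs(path):
    """Yield (docno, text) pairs from a file of concatenated <DOC> records.

    Assumes the flat structure shown above: no wrapper element, with
    DOCNO and TEXT properly closed inside each <DOC>.
    """
    data = open(path, encoding="latin-1").read()
    for doc in re.findall(r"<DOC>(.*?)</DOC>", data, flags=re.S):
        docno = re.search(r"<DOCNO>(.*?)</DOCNO>", doc, flags=re.S).group(1).strip()
        text = re.search(r"<TEXT>(.*?)</TEXT>", doc, flags=re.S)
        yield docno, text.group(1).strip() if text else ""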

2007.02.15 - SLIDE 8 IS 240 – Spring 2007

Review: IR Models

• Set Theoretic Models
  – Boolean
  – Fuzzy
  – Extended Boolean

• Vector Models (Algebraic)

• Probabilistic Models (probabilistic)

2007.02.15 - SLIDE 9 IS 240 – Spring 2007

Review

• Probabilistic Models
  – Probabilistic Indexing (Model 1)
  – Probabilistic Retrieval (Model 2)
  – Unified Model (Model 3)
  – Model 0 and real-world IR
  – Regression Models
  – The "Okapi Weighting Formula"

2007.02.15 - SLIDE 10 IS 240 – Spring 2007

Model 1

• A patron submits a query (call it Q) consisting of some specification of her/his information need. Different patrons submitting the same stated query may differ as to whether or not they judge a specific document to be relevant. The function of the retrieval system is to compute for each individual document the probability that it will be judged relevant by a patron who has submitted query Q.

Robertson, Maron & Cooper, 1982

2007.02.15 - SLIDE 11 IS 240 – Spring 2007

Model 1 Bayes

• A is the class of events of using the system
• Di is the class of events of Document i being judged relevant
• Ij is the class of queries consisting of the single term Ij
• P(Di | A, Ij) = probability that if a query is submitted to the system then a relevant document is retrieved

P(D_i | A, I_j) = P(I_j | A, D_i) P(D_i | A) / P(I_j | A)

2007.02.15 - SLIDE 12 IS 240 – Spring 2007

Model 2

• Documents have many different properties; some documents have all the properties that the patron asked for, and other documents have only some or none of the properties. If the inquiring patron were to examine all of the documents in the collection she/he might find that some having all the sought after properties were relevant, but others (with the same properties) were not relevant. And conversely, he/she might find that some of the documents having none (or only a few) of the sought after properties were relevant, others not. The function of a document retrieval system is to compute the probability that a document is relevant, given that it has one (or a set) of specified properties.

Robertson, Maron & Cooper, 1982

2007.02.15 - SLIDE 13 IS 240 – Spring 2007

Model 2 – Robertson & Sparck Jones

Given a term t and a query q:

                        Document relevance
Document indexing       +             −                Total
  +                     r           n − r              n
  −                   R − r       N − n − R + r        N − n
  Total                 R           N − R              N

2007.02.15 - SLIDE 14 IS 240 – Spring 2007

Robertson-Sparck Jones Weights

• Retrospective formulation:

log [ (r / (R − r)) / ((n − r) / (N − n − R + r)) ]

2007.02.15 - SLIDE 15 IS 240 – Spring 2007

Robertson-Sparck Jones Weights

• Predictive formulation

w^{(1)} = log [ ((r + 0.5) / (R − r + 0.5)) / ((n − r + 0.5) / (N − n − R + r + 0.5)) ]
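As a concrete illustration (a sketch, not part of the slides), the predictive weight can be computed directly from the contingency table two slides back; the example counts are made up.

import math

def rsj_weight(r, R, n, N):
    """Predictive Robertson-Sparck Jones weight w(1) for one term.

    r = relevant documents containing the term, R = relevant documents,
    n = documents containing the term, N = documents in the collection.
    The 0.5 corrections follow the predictive formulation above.
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))

# e.g. a term in 20 of 50 judged-relevant documents and 400 of 100,000 overall
# print(rsj_weight(r=20, R=50, n=400, N=100_000))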

2007.02.15 - SLIDE 16 IS 240 – Spring 2007

Probabilistic Models: Some Unifying Notation

• D = All present and future documents

• Q = All present and future queries

• (Di,Qj) = A document query pair

• x = class of similar documents, x ⊆ D

• y = class of similar queries, y ⊆ Q

• Relevance is a relation:

R = {(D_i, Q_j) | D_i ∈ D, Q_j ∈ Q, document D_i is judged relevant by the user submitting query Q_j}

2007.02.15 - SLIDE 17 IS 240 – Spring 2007

Probabilistic Models

• Model 1 -- Probabilistic Indexing, P(R|y,Di)

• Model 2 -- Probabilistic Querying, P(R|Qj,x)

• Model 3 -- Merged Model, P(R| Qj, Di)

• Model 0 -- P(R|y,x)

• Probabilities are estimated based on prior usage or relevance estimation

2007.02.15 - SLIDE 18 IS 240 – Spring 2007

Probabilistic Models

[Diagram: the document space D and the query space Q, with a class of similar documents x containing D_i and a class of similar queries y containing Q_j]

2007.02.15 - SLIDE 19 IS 240 – Spring 2007

Logistic Regression

• Based on work by William Cooper, Fred Gey and Daniel Dabney.

• Builds a regression model for relevance prediction based on a set of training data

• Uses less restrictive independence assumptions than Model 2– Linked Dependence

2007.02.15 - SLIDE 20 IS 240 – Spring 2007

Dependence assumptions

• In Model 2 term independence was assumed
  – P(R|A,B) = P(R|A) P(R|B)
  – This is not very realistic, as we have discussed before

• Cooper, Gey, and Dabney proposed linked dependence:
  – If two or more retrieval clues are statistically dependent in the set of all relevance-related query-document pairs, then they are statistically dependent to a corresponding degree in the set of all nonrelevance-related pairs.
  – Thus dependency in the relevant and nonrelevant documents is linked

2007.02.15 - SLIDE 21 IS 240 – Spring 2007

Linked Dependence

• Linked Dependence Assumption: there exists a positive real number K such that the following two conditions hold:
  – P(A,B | R) = K P(A | R) P(B | R)
  – P(A,B | ¬R) = K P(A | ¬R) P(B | ¬R)
  – When K = 1 this is the same as binary independence

P(A_1, …, A_N | R) / P(A_1, …, A_N | ¬R) = ∏_{i=1}^{N} [ P(A_i | R) / P(A_i | ¬R) ]

2007.02.15 - SLIDE 22 IS 240 – Spring 2007

Linked Dependence

• The odds of an event E: O(E) = P(E) / P(¬E)

• (See paper for details)

• Multiplying by O(R) and taking logs we get:

O(R | A_1, …, A_N) / O(R) = ∏_{i=1}^{N} [ O(R | A_i) / O(R) ]

log O(R | A_1, …, A_N) = log O(R) + Σ_{i=1}^{N} [ log O(R | A_i) − log O(R) ]

2007.02.15 - SLIDE 23 IS 240 – Spring 2007

So What’s Regression?

• A method for fitting a curve (not necessarily a straight line) through a set of points using some goodness-of-fit criterion

• The most common type of regression is linear regression

2007.02.15 - SLIDE 24 IS 240 – Spring 2007

What’s Regression?

• Least Squares Fitting is a mathematical procedure for finding the best fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve

• The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity
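For instance, a minimal least-squares sketch in Python (not from the slides), fitting a straight line to a handful of made-up points by minimizing the sum of squared residuals.

import numpy as np

# Fit y = c0 + c1*x to a few (x, y) points; the data values are illustrative.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

A = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
(c0, c1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"fit: y = {c0:.3f} + {c1:.3f} x")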

2007.02.15 - SLIDE 25 IS 240 – Spring 2007

Logistic Regression

[Plot: relevance (0–100) on the vertical axis vs. term frequency in document (0–60) on the horizontal axis]

2007.02.15 - SLIDE 26 IS 240 – Spring 2007

Probabilistic Models: Logistic Regression

• Estimates for relevance based on log-linear model with various statistical measures of document content as independent variables

Log odds of relevance is a linear function of attributes:

log O(R | q_i, d_j, t_k) = c_0 + c_1 v_1 + c_2 v_2 + … + c_n v_n

Term contributions summed:

log O(R | q_i, d_j) = Σ_{k=1}^{m} [ log O(R | q_i, d_j, t_k) − log O(R) ]

Probability of relevance is the inverse of the log odds (the logistic function):

P(R | q_i, d_j) = 1 / (1 + e^{−log O(R | q_i, d_j)})
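A small sketch (not from the slides) of the last step: converting a summed log-odds score into a probability of relevance with the logistic function. The coefficient and attribute values are purely illustrative.

import math

def relevance_probability(log_odds):
    """P(R | q, d) = 1 / (1 + e^(-log O(R | q, d)))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative intercept c0, coefficients c1..cn and attribute values v1..vn
c0 = -3.5
c = [0.37, -0.29, 0.33, -0.19, 0.92]
v = [2.1, -0.5, 1.8, -1.2, 3.4]
log_odds = c0 + sum(ci * vi for ci, vi in zip(c, v))
print(round(relevance_probability(log_odds), 3))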

2007.02.15 - SLIDE 27 IS 240 – Spring 2007

Logistic Regression Attributes

X_1 = (1/M) Σ_t log QAF_t – Average Absolute Query Frequency
X_2 = sqrt(QL) – Query Length
X_3 = (1/M) Σ_t log DAF_t – Average Absolute Document Frequency
X_4 = sqrt(DL) – Document Length
X_5 = (1/M) Σ_t IDF_t – Average Inverse Document Frequency
IDF_t = log (N / n_t) – Inverse Document Frequency
X_6 = log M – Number of terms in common between query and document, logged

(the sums run over the M terms that the query and the document have in common)
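A hedged sketch of how the six attribute measures might be computed for one query-document pair. The exact definitions (in particular the square roots and the idf form) follow the Cooper/Gey/Dabney formulation used in the reconstruction above and may differ in detail from the original slide; all function and variable names are mine.

import math

def lr_attributes(query_tf, doc_tf, doc_freq, N):
    """Return [X1..X6] for the terms the query and document share.

    query_tf / doc_tf: term -> frequency in the query / document
    doc_freq: term -> number of documents containing the term
    N: number of documents in the collection
    """
    matches = [t for t in query_tf if t in doc_tf]
    M = len(matches)
    if M == 0:
        return None
    ql = sum(query_tf.values())                              # query length
    dl = sum(doc_tf.values())                                # document length
    x1 = sum(math.log(query_tf[t]) for t in matches) / M     # avg abs query freq
    x2 = math.sqrt(ql)
    x3 = sum(math.log(doc_tf[t]) for t in matches) / M       # avg abs doc freq
    x4 = math.sqrt(dl)
    x5 = sum(math.log(N / doc_freq[t]) for t in matches) / M # avg idf
    x6 = math.log(M)                                         # matching terms, logged
    return [x1, x2, x3, x4, x5, x6]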

2007.02.15 - SLIDE 28 IS 240 – Spring 2007

Logistic Regression

• Probability of relevance is based on Logistic regression from a sample set of documents to determine values of the coefficients

• At retrieval the probability estimate is obtained by:

• For the 6 X attribute measures shown previously

P(R | Q, D) = c_0 + Σ_{i=1}^{6} c_i X_i

2007.02.15 - SLIDE 29 IS 240 – Spring 2007

Logistic Regression and Cheshire II

• The Cheshire II system uses Logistic Regression equations estimated from TREC full-text data

• In addition, an implementation of the Okapi BM-25 algorithm has been included also

• Demo (?)

2007.02.15 - SLIDE 30 IS 240 – Spring 2007

Current use of Probabilistic Models

• Many of the major systems in TREC now use the “Okapi BM-25 formula” (or Language Models -- more on those later) which incorporates the Robertson-Sparck Jones weights…

w^{(1)} = log [ ((r + 0.5) / (R − r + 0.5)) / ((n − r + 0.5) / (N − n − R + r + 0.5)) ]

2007.02.15 - SLIDE 31 IS 240 – Spring 2007

Okapi BM-25

Σ_{T ∈ Q} w^{(1)} × [ (k1 + 1) tf / (K + tf) ] × [ (k3 + 1) qtf / (k3 + qtf) ]

• Where:
  – Q is a query containing terms T
  – K is k1 ((1 − b) + b · dl / avdl)
  – k1, b and k3 are parameters, usually 1.2, 0.75 and 7–1000
  – tf is the frequency of the term in a specific document
  – qtf is the frequency of the term in a topic from which Q was derived
  – dl and avdl are the document length and the average document length measured in some convenient unit (e.g. bytes)
  – w^{(1)} is the Robertson-Sparck Jones weight
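The formula translates almost line for line into code. This sketch (not the Cheshire II or Okapi implementation) scores one term; the parameter defaults are the usual values quoted above and the example numbers are invented.

import math

def bm25_term(tf, qtf, dl, avdl, r, R, n, N, k1=1.2, b=0.75, k3=7.0):
    """Contribution of one query term to the Okapi BM-25 score."""
    w1 = math.log(((r + 0.5) / (R - r + 0.5)) /
                  ((n - r + 0.5) / (N - n - R + r + 0.5)))   # Robertson-Sparck Jones weight
    K = k1 * ((1 - b) + b * dl / avdl)
    return w1 * ((k1 + 1) * tf / (K + tf)) * ((k3 + 1) * qtf / (k3 + qtf))

# The document score is the sum of these contributions over the terms T in Q.
# With no relevance information, r = R = 0 turns w(1) into an idf-like weight.
print(round(bm25_term(tf=3, qtf=1, dl=250, avdl=300, r=0, R=0, n=1200, N=100_000), 3))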

2007.02.15 - SLIDE 32 IS 240 – Spring 2007

Probabilistic Models

Advantages:
• Strong theoretical basis
• In principle should supply the best predictions of relevance given available information
• Can be implemented similarly to Vector

Disadvantages:
• Relevance information is required – or is "guestimated"
• Important indicators of relevance may not be terms – though terms only are usually used
• Optimally requires on-going collection of relevance information

2007.02.15 - SLIDE 33 IS 240 – Spring 2007

Vector and Probabilistic Models

• Support “natural language” queries

• Treat documents and queries the same

• Support relevance feedback searching

• Support ranked retrieval

• Differ primarily in theoretical basis and in how the ranking is calculated
  – Vector assumes relevance
  – Probabilistic relies on relevance judgments or estimates

2007.02.15 - SLIDE 34 IS 240 – Spring 2007

Today

• Papers and (Mini-INEX Organization?)

• Review
  – Probabilistic Models and Logistic Regression

• Information Retrieval using inference networks
  – Bayesian networks
  – Turtle and Croft Inference Model

2007.02.15 - SLIDE 35 IS 240 – Spring 2007

Bayesian Network Models

• Modern variations of probabilistic reasoning

• Greatest strength for IR is in providing a framework permitting combination of multiple distinct evidence sources to support a relevance judgement (probability) on a given document.

2007.02.15 - SLIDE 36 IS 240 – Spring 2007

Bayesian Networks

• A Bayesian network is a directed acyclic graph (DAG) in which the nodes represent random variables and the arcs into a node represent a probabilistic dependence between the node and its parents

• Through this structure a Bayesian network represents the conditional dependence relations among the variables in the network

2007.02.15 - SLIDE 37 IS 240 – Spring 2007

Bayes’ theorem

p(A | B) = p(B | A) p(A) / p(B)

p(A | B): probability of A given B
p(A): probability of A (an "a priori" probability)
p(B): probability of B (an "a priori" probability)
p(B | A): probability of B given A

For example: A = disease, B = symptom

2007.02.15 - SLIDE 38 IS 240 – Spring 2007

Bayes’ Theorem: Application

Toss a fair coin. If it lands heads up, draw a ball from box 1; otherwise, draw a ball from box 2. If the ball is blue, what is the probability that it was drawn from box 2?

Box 1: p(box1) = 0.5, P(red ball | box1) = 0.4, P(blue ball | box1) = 0.6
Box 2: p(box2) = 0.5, P(red ball | box2) = 0.5, P(blue ball | box2) = 0.5

p(box2 | blue ball) = p(box2) P(blue ball | box2) / p(blue ball)
  = p(box2) P(blue ball | box2) / [ p(box1) P(blue ball | box1) + p(box2) P(blue ball | box2) ]
  = (0.5 × 0.5) / (0.5 × 0.6 + 0.5 × 0.5) = 0.25 / 0.55 ≈ 0.4545
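The same computation as a few lines of Python (a sketch; the numbers are exactly those above).

# P(box2 | blue ball) via Bayes' theorem
p_box = {"box1": 0.5, "box2": 0.5}
p_blue = {"box1": 0.6, "box2": 0.5}          # P(blue ball | box)

p_blue_ball = sum(p_box[b] * p_blue[b] for b in p_box)       # P(blue ball) = 0.55
posterior = p_box["box2"] * p_blue["box2"] / p_blue_ball     # = 0.25 / 0.55
print(round(posterior, 4))                                   # 0.4545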

2007.02.15 - SLIDE 39 IS 240 – Spring 2007

A drugs manufacturer claims that its roadside drug test will detect the presence of cannabis in the blood (i.e. show positive for a driver who has smoked cannabis in the last 72 hours) 90% of the time. However, the manufacturer admits that 10% of all cannabis-free drivers also test positive. A national survey indicates that 20% of all drivers have smoked cannabis during the last 72 hours.

(i) Draw a complete Bayesian tree for the scenario described above

Bayes Example (the following examples are from http://www.dcs.ex.ac.uk/~anarayan/teaching/com2408/)

2007.02.15 - SLIDE 40 IS 240 – Spring 2007

(ii) One of your friends has just told you that she was recently stopped by the police and the roadside drug test for the presence of cannabis showed positive. She denies having smoked cannabis since leaving university several months ago (and even then she says that she didn’t inhale). Calculate the probability that your friend smoked cannabis during the 72 hours preceding the drugs test.

P(F | E) = P(E | F) P(F) / [ P(E | F) P(F) + P(E | F′) P(F′) ]

That is, we calculate the probability of your friend having smoked cannabis given that she tested positive (F = smoked cannabis, E = tests positive).

P(F | E) = (0.9 × 0.2) / ((0.9 × 0.2) + (0.1 × 0.8)) = 0.18 / (0.18 + 0.08) = 0.18 / 0.26 ≈ 0.69

That is, there is only a 31% chance that your friend is telling the truth.

Bayes Example – cont.

2007.02.15 - SLIDE 41 IS 240 – Spring 2007

New information arrives which indicates that, while the roadside drugs test will now show positive for a driver who has smoked cannabis 99.9% of the time, the number of cannabis-free drivers testing positive has gone up to 20%. Re-draw your Bayesian tree and recalculate the probability to determine whether this new information increases or decreases the chances that your friend is telling the truth.

P(F | E) = (0.999 × 0.2) / ((0.999 × 0.2) + (0.2 × 0.8)) = 0.1998 / (0.1998 + 0.16) = 0.1998 / 0.3598 ≈ 0.56

That is, the new information has increased the chance that your friend is telling the truth by 13%, but the chances still are that she is lying (just).

Bayes Example – cont.
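Both drug-test scenarios reduce to the same two-line computation; a sketch, using only the figures given in the slides.

def posterior_smoked(prior, sensitivity, false_positive_rate):
    """P(F | E): probability the driver smoked (F) given a positive test (E)."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

print(round(posterior_smoked(0.2, 0.90, 0.10), 2))    # original figures: 0.69
print(round(posterior_smoked(0.2, 0.999, 0.20), 2))   # revised figures:  0.56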

2007.02.15 - SLIDE 42 IS 240 – Spring 2007

P(H | E) = P(E | H) × P(H) / P(E)

The Bayes Theorem example includes only two events.

includes only two events.

Consider a more complex tree/network:

If an event E at a leaf node happens (say, M) and we wish to know whether this supports A, we need to ‘chain’ our Bayesian rule as follows:

P(A,C,F,M)=P(A|C,F,M)*P(C|F,M)*P(F|M)*P(M)

That is, P(X_1, X_2, …, X_n) = ∏_{i=1}^{n} P(X_i | Pa_i), where Pa_i = parents(X_i)

More Complex Bayes
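A generic chain-rule sketch (not from the slides): the joint probability of a full assignment is the product of P(node | parents). The network here is a simple chain M → F → C → A and all probability values are placeholders.

parents = {"M": (), "F": ("M",), "C": ("F",), "A": ("C",)}
cpt = {                      # cpt[node][parent values] -> P(node = True | parents)
    "M": {(): 0.3},
    "F": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.2},
    "A": {(True,): 0.9, (False,): 0.05},
}

def joint(assignment):
    """P(x1,...,xn) = prod_i P(xi | parents(xi)) for one full True/False assignment."""
    p = 1.0
    for node, value in assignment.items():
        pa = tuple(assignment[x] for x in parents[node])
        p_true = cpt[node][pa]
        p *= p_true if value else (1.0 - p_true)
    return p

print(joint({"M": True, "F": True, "C": True, "A": True}))   # 0.3 * 0.8 * 0.6 * 0.9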

2007.02.15 - SLIDE 43 IS 240 – Spring 2007

Imagine the following set of rules:

If it is raining or sprinklers are on then the street is wet.

If it is raining or sprinklers are on then the lawn is wet.

If the lawn is wet then the soil is moist.

If the soil is moist then the roses are OK.

Example (taken from IDIS website)

[Graph representation of the rules above]

2007.02.15 - SLIDE 44 IS 240 – Spring 2007

We can construct conditional probabilities for each (binary) attribute to reflect our knowledge of the world:

(These probabilities are arbitrary.)

Bayesian Networks

2007.02.15 - SLIDE 45 IS 240 – Spring 2007

The joint probability of the state where the roses are OK, the soil is dry, the lawn is wet, the street is wet, the sprinklers are off and it is raining is:

P(sprinklers=F, rain=T, street=wet, lawn=wet, soil=dry, roses=OK) =

P(roses=OK|soil=dry) * P(soil=dry|lawn=wet) * P(lawn=wet|rain=T, sprinklers=F) * P(street=wet|rain=T, sprinklers=F) * P(sprinklers=F) * P(rain=T) = 0.2*0.1*1.0*1.0*0.6*0.7=0.0084
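The same joint probability, spelled out in Python with the conditional probabilities used in the slide's calculation.

factors = {
    "P(roses=OK | soil=dry)":               0.2,
    "P(soil=dry | lawn=wet)":               0.1,
    "P(lawn=wet | rain=T, sprinklers=F)":   1.0,
    "P(street=wet | rain=T, sprinklers=F)": 1.0,
    "P(sprinklers=F)":                      0.6,
    "P(rain=T)":                            0.7,
}
joint = 1.0
for value in factors.values():
    joint *= value
print(round(joint, 4))   # 0.0084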

2007.02.15 - SLIDE 46 IS 240 – Spring 2007

Calculating probabilities in sequence

Now imagine we are told that the roses are OK. What can we infer about the state of the lawn? That is, P(lawn=wet|roses=OK) and P(lawn=dry|roses=OK)?

We have to work through soil first.

P(roses OK|soil=moist)=0.7;

P(roses OK|soil=dry)=0.2

P(soil=moist|lawn=wet)=0.9; P(soil=dry|lawn=wet)=0.1*

P(soil=dry|lawn=dry)=0.6; P(soil=moist|lawn=dry)=0.4*

P(R, S, L)= P(R) * P(R|S) * P(S|L)

For R=ok, S=moist, L=wet, 1.0*0.7*0.9 = 0.63

For R=ok, S=dry, L=wet, 1.0*0.2*0.1= 0.02

For R=ok, S=moist, L=dry, 1.0*0.7*0.4=0.28

For R=ok, S=dry, L=dry, 1.0*0.2*0.6=0.12

Lawn=wet = 0.63+0.02 = 0.65 (un-normalised)

Lawn=dry = 0.28+0.12 = 0.40 (un-normalised)

That is, there is greater chance that the lawn is wet. *inferred
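The enumeration above in code (a sketch using the slide's numbers): sum over the two soil states for each lawn state, leaving the scores un-normalised.

p_roses_ok = {"moist": 0.7, "dry": 0.2}        # P(roses=OK | soil)
p_soil_given_lawn = {                          # P(soil | lawn)
    "wet": {"moist": 0.9, "dry": 0.1},
    "dry": {"moist": 0.4, "dry": 0.6},
}
for lawn in ("wet", "dry"):
    score = sum(p_roses_ok[s] * p_soil_given_lawn[lawn][s] for s in ("moist", "dry"))
    print(lawn, round(score, 2))   # wet 0.65, dry 0.4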

2007.02.15 - SLIDE 47 IS 240 – Spring 2007

Problems with Bayes nets

• Loops can sometimes occur with belief networks and have to be avoided.

• We have avoided the issue of where the probabilities come from. The probabilities either are given or have to be learned. Similarly, the network structure also has to be learned. (See http://www.bayesware.com/products/discoverer/discoverer.html)

• The number of paths to explore grows exponentially with each node. (The problem of exact probabilistic inference in Bayes networks is NP-hard. Approximation techniques may have to be used.)

2007.02.15 - SLIDE 48 IS 240 – Spring 2007

Applications

• You have all used Bayes Belief Networks, probably a few dozen times, when you use Microsoft Office! (See http://research.microsoft.com/~horvitz/lum.htm)

• As you have read, Bayesian networks are also used in spam filters

• Another application is IR where the EVENT you want to estimate a probability for is whether a document is relevant for a particular query

2007.02.15 - SLIDE 49 IS 240 – Spring 2007

Bayesian Networks

[Diagram: two candidate network structures over the variables Life Expectancy (L), Income (I) and Race (R)]

The parents of any child node are those considered to be direct causes of that node.

2007.02.15 - SLIDE 50 IS 240 – Spring 2007

Inference Networks

• Intended to capture all of the significant probabilistic dependencies among the variables represented by nodes in the query and document networks.

• Given the priors associated with the documents, and the conditional probabilities associated with internal nodes, we can compute the posterior probability (belief) associated with each node in the network

2007.02.15 - SLIDE 51 IS 240 – Spring 2007

Inference Networks

• The network, taken as a whole, represents the dependence of a user's information need on the documents in a collection, where the dependence is mediated by document and query representations.

2007.02.15 - SLIDE 52 IS 240 – Spring 2007

Document Inference Network

[Diagram: the document inference network. The document network contains document nodes d1 … di, text representation nodes t1 … tj and concept representation nodes r1 … rk; these connect through query concept nodes c1 … cm to the query representation nodes q1 and q2, which feed the information need node I.]

2007.02.15 - SLIDE 53 IS 240 – Spring 2007

Boolean Nodes

[Diagram: a Boolean query node formed from an OR and an AND over the terms baseball, player, umpire, strike]

Input to Boolean Operator in an Inference Network is a “Probability Of Truth” rather than a strict binary.

2007.02.15 - SLIDE 54 IS 240 – Spring 2007

Formally…

• Ranking of document dj wrt query q…

– How much evidential support the observation of dj provides to query q

k = (k_1, k_2, …, k_t), where k_i ∈ {0, 1}

P(q ∧ d_j) = Σ_{∀k} P(q ∧ d_j | k) P(k)
           = Σ_{∀k} P(q | k) P(k | d_j) P(d_j)

P(¬(q ∧ d_j)) = 1 − P(q ∧ d_j)

2007.02.15 - SLIDE 55 IS 240 – Spring 2007

Formally…

• Each term contribution to the belief can be computed separately…

P(q ∧ d_j) = Σ_{∀k} P(q | k) [ ∏_{i | g_i(k)=1} P(k_i | d_j) ] [ ∏_{i | g_i(k)=0} P(¬k_i | d_j) ] P(d_j)

where P(¬k_i | d_j) = 1 − P(k_i | d_j) and P(¬q | d_j) = 1 − P(q | d_j)

2007.02.15 - SLIDE 56 IS 240 – Spring 2007

With Boolean

P(d_j) = 1/N    (prior probability of observing a document; assumes a uniform distribution)
P(¬d_j) = 1 − P(d_j)

P(k_i | d_j) = 1 if g_i(d_j) = 1, 0 otherwise
P(¬k_i | d_j) = 1 − P(k_i | d_j)

I.e., when document d_j is observed, only the nodes associated with its index terms are active (have non-zero probability).

2007.02.15 - SLIDE 57 IS 240 – Spring 2007

Boolean weighting…

• Where q_cc is a conjunctive component and q_dnf is the disjunctive normal form of the query

P(q | k) = 1 if ∃ q_cc ∈ q_dnf such that, for all k_i, g_i(k) = g_i(q_cc); 0 otherwise
P(¬q | k) = 1 − P(q | k)

2007.02.15 - SLIDE 58 IS 240 – Spring 2007

Vector Components

For prior probabilities of documents:

P(d_j) = 1 / |d_j|
P(¬d_j) = 1 − P(d_j)

For the tf component, normalized frequency:

P(k_i | d_j) = f_{i,j} = freq_{i,j} / max_l freq_{l,j}
P(¬k_i | d_j) = 1 − P(k_i | d_j)

From Baeza-Yates, Modern IR
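A small sketch of these two estimates (variable names and the interpretation of |dj| as the norm of the raw term-frequency vector are my assumptions).

import math

doc_tf = {"baseball": 3, "umpire": 1, "strike": 2}       # illustrative counts

max_tf = max(doc_tf.values())
f = {t: tf / max_tf for t, tf in doc_tf.items()}          # P(ki | dj) = normalized tf
norm = math.sqrt(sum(tf * tf for tf in doc_tf.values()))
p_dj = 1.0 / norm                                         # P(dj) = 1 / |dj|
print(f, round(p_dj, 3))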

2007.02.15 - SLIDE 59 IS 240 – Spring 2007

Vector Components

IDF factors take into account the impact of index term nodes on query nodes…

For the state k in which only term k_i is active (g_i(k) = 1 and g_j(k) = 0 for all j ≠ i):

P(q | k) = idf_i if g_i(q) = 1
         = 0     if g_i(q) = 0
P(¬q | k) = 1 − P(q | k)

where idf_i = log (N / n_i)

From Baeza-Yates, Modern IR

2007.02.15 - SLIDE 60 IS 240 – Spring 2007

Vector Components

To get the tf·idf-like ranking use:

P(q ∧ d_j) = Σ_{i | g_i(d_j)=1 ∧ g_i(q)=1} P(q | k_i) P(k_i | d_j) [ ∏_{l ≠ i} P(¬k_l | d_j) ] P(d_j)

With P(d_j) = 1 / |d_j|, P(k_i | d_j) = f_{i,j} and P(q | k_i) = idf_i, this reduces (up to a document factor C_j collecting the ∏_l (1 − f_{l,j}) terms) to a sum of f_{i,j} × idf_i contributions over the index terms that appear in both d_j and q.

P(¬(q ∧ d_j)) = 1 − P(q ∧ d_j)

From Baeza-Yates, Modern IR

2007.02.15 - SLIDE 61 IS 240 – Spring 2007

Combining sources

[Diagram: combining sources of evidence. The index term nodes k1, k2, … ki, … kt of document dj support the query node q and two alternative query formulations q1 and q2 (combined through "and" / "or" operator nodes), all feeding the information need node I. From Baeza-Yates, Modern IR]

2007.02.15 - SLIDE 62 IS 240 – Spring 2007

Combining components

For the or operator in the previous network (I = q ∨ q_1):

P(I ∧ d_j) = Σ_{∀k} P(I | k) P(k | d_j) P(d_j)
           = Σ_{∀k} [ 1 − P(¬q | k) P(¬q_1 | k) ] P(k | d_j) P(d_j)
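A sketch of how Boolean operator nodes can combine "probabilities of truth" from their inputs, consistent with the or formula above (assuming independent inputs); the numbers are invented.

from functools import reduce

def p_or(probs):
    """P(at least one input true) = 1 - prod(1 - p_i)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

def p_and(probs):
    """P(all inputs true) = prod(p_i)."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

# e.g. belief in I = q OR q1 for one state of the index-term nodes
print(p_or([0.6, 0.35]))    # 0.74
print(p_and([0.6, 0.35]))   # 0.21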

2007.02.15 - SLIDE 63 IS 240 – Spring 2007

Belief Network

• Very similar to Inference Network model

• Developed by Ribeiro-Neto and Muntz

• Differs from Inference Networks in that it has a clearly defined “Sample Space”

2007.02.15 - SLIDE 64 IS 240 – Spring 2007

Belief Networks

[Diagram: belief network with the query node q at the top, index term nodes k1, k2, … ki, … kt below it, and document nodes d1, d2, … dN at the bottom]

2007.02.15 - SLIDE 65 IS 240 – Spring 2007

Belief Networks

• The universe of discourse U is the set K of all index terms

The set K = {k_1, …, k_t} is the universe of discourse.

Let u be a subset of K (u ⊆ K).

2007.02.15 - SLIDE 66 IS 240 – Spring 2007

Belief Networks

Applying Bayes' Theorem:

P(d_j | q) = P(d_j ∧ q) / P(q)

2007.02.15 - SLIDE 67 IS 240 – Spring 2007

Belief Networks

P(d_j | q) ∼ Σ_{u} P(d_j ∧ q | u) P(u)

P(c) = Σ_{u} P(c | u) P(u)

P(u) = (1/2)^t