Lecture 2: Inference

Transcript
Page 1:

Lecture 2: Inference

Page 2:

Inference: A Ubiquitous Obstacle

•  Decoding is inference.
•  Subroutines for learning are inference.
•  Learning is inference.

•  Exact inference is #P-complete.
–  Even approximations within a given absolute or relative error are hard.

Page 3:

Probabilistic Inference Problems

Given values for some random variables (X ⊂ V) …
•  Most Probable Explanation (MPE): what are the most probable values of the rest of the r.v.s, V \ X?

(More generally …)
•  Maximum A Posteriori (MAP): what are the most probable values of some other r.v.s, Y ⊂ (V \ X)?

•  Random sampling from the posterior over values of Y
•  Full posterior over values of Y
•  Marginal probabilities from the posterior over Y

•  Minimum Bayes risk (MBR): what is the Y with the lowest expected cost?
•  Cost-augmented decoding: what is the most "dangerous" Y?
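A hedged formalization of the first three queries (not on the slide verbatim; notation: x is the observed evidence, W = V \ X, Z = V \ X \ Y, and cost(y, y′) is the loss used by MBR):

```latex
\text{MPE:}\quad w^{*} = \arg\max_{w \in \mathrm{Val}(V \setminus X)} P(W = w \mid X = x)

\text{MAP:}\quad y^{*} = \arg\max_{y} \sum_{z} P(Y = y, Z = z \mid X = x)

\text{MBR:}\quad \hat{y} = \arg\min_{y} \; \mathbb{E}_{Y' \sim P(\,\cdot \mid X = x)}\big[\mathrm{cost}(y, Y')\big]
```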

Page 4:

Approaches to Inference

[Figure: a taxonomy of approaches to inference, with branches annotated "today" and "lecture 6" on the original slide:]

•  inference
   –  exact: variable elimination (dynamic programming), ILP
   –  approximate
      •  randomized: MCMC (Gibbs), importance sampling, randomized search (simulated annealing)
      •  deterministic: variational (mean field), loopy belief propagation, LP relaxations (dual decomp.), local search (beam search)

Page 5:

Exact Marginal for Y

•  This will be a generalization of algorithms you already know: the forward and backward algorithms.

•  The general name is variable elimination.

•  After we see it for the marginal, we'll see how to use it for the MAP.

Page 6:

Simple Inference Example

•  Goal: P(D)

[Figure: a Markov chain of binary variables A → B → C → D, with CPTs P(A), P(B|A), P(C|B), P(D|C); the table values are not shown in the transcript.]
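A minimal sketch of this computation in Python. The CPT numbers below are invented placeholders (the slide's tables are blank in the transcript); only the shape of the computation matters: each step is a product followed by a sum, which is exactly the elimination pattern developed on the following slides.

```python
import numpy as np

# Hypothetical CPTs for the chain A -> B -> C -> D (values made up for illustration).
p_a = np.array([0.6, 0.4])                  # P(A)
p_b_given_a = np.array([[0.7, 0.3],         # rows: A, columns: B
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],
                        [0.5, 0.5]])
p_d_given_c = np.array([[0.4, 0.6],
                        [0.1, 0.9]])

# Eliminate A: P(B) = sum_a P(A=a) P(B|A=a)
p_b = p_a @ p_b_given_a
# Eliminate B: P(C) = sum_b P(B=b) P(C|B=b)
p_c = p_b @ p_c_given_b
# Eliminate C: P(D) = sum_c P(C=c) P(D|C=c)
p_d = p_c @ p_d_given_c

print(p_d, p_d.sum())   # approximately [0.31 0.69]; a proper distribution over D, summing to 1
```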

Page 7:

Simple Inference Example

•  Let's calculate P(B) from things we have.

[Figure: the chain A → B → C → D with CPTs P(A), P(B|A), P(C|B), P(D|C).]

Page 8:

Simple Inference Example

•  Let's calculate P(B) from things we have.

[Figure: the chain A → B → C → D.]

Page 9:

Simple Inference Example

•  Let's calculate P(B) from things we have.
•  Note that C and D do not matter.

[Figure: the chain A → B → C → D.]

Page 10:

Simple Inference Example

•  Let's calculate P(B) from things we have.

[Figure: P(B) is obtained by multiplying P(A) into P(B|A) and summing out A, i.e., P(B) = ∑_A P(A) P(B|A); the tables are shown on the slide without values.]

Page 11:

Simple Inference Example

•  We now have a Bayesian network for the marginal distribution P(B, C, D).

[Figure: the reduced chain B → C → D with CPTs P(B), P(C|B), P(D|C).]

Page 12:

Simple Inference Example

•  We can repeat the same process to calculate P(C).
•  We already have P(B)!

[Figure: the chain B → C → D.]

Page 13:

Simple Inference Example

•  We can repeat the same process to calculate P(C).

[Figure: P(C) = ∑_B P(B) P(C|B); the tables are shown on the slide without values.]

Page 14:

Simple Inference Example

•  We now have P(C, D).
•  Marginalizing out A and B happened in two steps, and we are exploiting the Bayesian network structure.

[Figure: the reduced chain C → D with CPTs P(C), P(D|C).]

Page 15:

Simple Inference Example

•  Last step to get P(D):

[Figure: P(D) = ∑_C P(C) P(D|C); the tables are shown on the slide without values.]

Page 16:

Simple Inference Example

•  Notice that the same step happened for each random variable:
–  We created a new CPD over the variable and its "successor."
–  We summed out (marginalized) the variable.

Page 17:

That Was Variable Elimination

•  We reused computation from previous steps and avoided doing the same work more than once.
–  Dynamic programming, à la the forward algorithm!

•  We exploited the Bayesian network structure (each subexpression only depends on a small number of variables).

•  Exponential blowup avoided!

Page 18:

What Remains

•  Some machinery
•  Variable elimination in general

•  The maximization version (for MAP inference)

•  A bit about approximate inference

Page 19:

Factor Graphs

•  Variable nodes (circles)
•  Factor nodes (squares)
–  Can be MN factors or BN conditional probability distributions!

•  Edge between a variable and a factor if the factor depends on that variable.

•  The graph is bipartite.

[Figure: a factor graph over variables X, Y, Z with factor nodes φ1, φ2, φ3, φ4.]

Page 20:

Products of Factors

•  Given two factors with different scopes, we can calculate a new factor equal to their product.

Page 21:

Products of Factors

•  Given two factors with different scopes, we can calculate a new factor equal to their product.

A  B  ϕ1(A,B)
0  0  30
0  1  5
1  0  1
1  1  10

B  C  ϕ2(B,C)
0  0  100
0  1  1
1  0  1
1  1  100

ϕ1 · ϕ2 = ϕ3, where:

A  B  C  ϕ3(A,B,C)
0  0  0  3000
0  0  1  30
0  1  0  5
0  1  1  500
1  0  0  100
1  0  1  1
1  1  0  10
1  1  1  1000
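A small sketch of the factor product in Python, reproducing the ϕ1 · ϕ2 = ϕ3 table above. The dict-of-tuples factor representation is just one convenient choice for illustration, not anything prescribed by the slides.

```python
from itertools import product

def factor_product(vars1, phi1, vars2, phi2):
    """Multiply two factors, each given as (variable list, dict from 0/1 value tuple to number)."""
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    out = {}
    for assignment in product([0, 1], repeat=len(out_vars)):
        val = dict(zip(out_vars, assignment))
        key1 = tuple(val[v] for v in vars1)
        key2 = tuple(val[v] for v in vars2)
        out[assignment] = phi1[key1] * phi2[key2]
    return out_vars, out

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi1(A,B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi2(B,C)

vars3, phi3 = factor_product(["A", "B"], phi1, ["B", "C"], phi2)
print(vars3, phi3[(0, 0, 0)], phi3[(1, 1, 1)])            # ['A', 'B', 'C'] 3000 1000
```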

Page 22:

Factor Marginalization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization:

Page 23:

Factor Marginalization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization:

P(C|A,B)   A,B = 0,0   0,1   1,0   1,1
C = 0            0.5   0.4   0.2   0.1
C = 1            0.5   0.6   0.8   0.9

"Summing out" B gives ψ(A,C) = ∑_B P(C|A,B):

A  C  ψ(A,C)
0  0  0.9
0  1  1.1
1  0  0.3
1  1  1.7

Page 24:

Factor Marginalization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization:

P(C|A,B)   A,B = 0,0   0,1   1,0   1,1
C = 0            0.5   0.4   0.2   0.1
C = 1            0.5   0.6   0.8   0.9

"Summing out" C gives ψ(A,B) = ∑_C P(C|A,B):

A  B  ψ(A,B)
0  0  1
0  1  1
1  0  1
1  1  1

Page 25:

Factor Marginalization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization:

•  We can refer to this new factor by ∑_Y ϕ.
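A matching sketch of factor marginalization in the same dict-of-tuples representation used above (again an illustrative representation, not one from the slides), using the P(C|A,B) numbers from the previous slides:

```python
from collections import defaultdict

def marginalize(vars_, phi, var_to_sum_out):
    """Sum a variable out of a factor given as (variable list, dict from value tuple to number)."""
    idx = vars_.index(var_to_sum_out)
    out_vars = [v for v in vars_ if v != var_to_sum_out]
    out = defaultdict(float)
    for assignment, value in phi.items():
        reduced = assignment[:idx] + assignment[idx + 1:]
        out[reduced] += value
    return out_vars, dict(out)

# P(C|A,B) from the slide, stored as a factor over (A, B, C).
phi = {(0, 0, 0): 0.5, (0, 1, 0): 0.4, (1, 0, 0): 0.2, (1, 1, 0): 0.1,
       (0, 0, 1): 0.5, (0, 1, 1): 0.6, (1, 0, 1): 0.8, (1, 1, 1): 0.9}

print(marginalize(["A", "B", "C"], phi, "B"))   # psi(A,C) ≈ {(0,0): 0.9, (1,0): 0.3, (0,1): 1.1, (1,1): 1.7}
print(marginalize(["A", "B", "C"], phi, "C"))   # psi(A,B) ≈ all entries 1.0
```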

Page 26:

Marginalizing Everything?

•  Take a Markov network's "product factor" by multiplying all of its factors.

•  Sum out all the variables (one by one).

•  What do you get?

Page 27:

Factors Are Like Numbers

•  Products are commutative: ϕ1 · ϕ2 = ϕ2 · ϕ1
•  Products are associative: (ϕ1 · ϕ2) · ϕ3 = ϕ1 · (ϕ2 · ϕ3)

•  Sums are commutative: ∑_X ∑_Y ϕ = ∑_Y ∑_X ϕ

•  Distributivity of multiplication over summation:
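The identity the last bullet refers to did not survive extraction; presumably it is the standard distributive law for factors, stated here for completeness. If X is not in the scope of ϕ1, then:

```latex
\sum_{X} (\phi_1 \cdot \phi_2) \;=\; \phi_1 \cdot \sum_{X} \phi_2 .
```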

Page 28:

Eliminating One Variable

Input: set of factors Φ, variable Z to eliminate
Output: new set of factors Ψ

1. Let Φ′ = {ϕ ∈ Φ | Z ∈ Scope(ϕ)}
2. Let Ψ = {ϕ ∈ Φ | Z ∉ Scope(ϕ)}
3. Let ψ be ∑_Z ∏_{ϕ∈Φ′} ϕ
4. Return Ψ ∪ {ψ}
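A direct transcription of Eliminate-One into Python, packaged with the two factor operations it needs. The representation (a factor is a pair of a variable tuple and a dict over 0/1 value tuples) is my illustrative choice, not the lecture's code.

```python
from itertools import product as cartesian

# A factor is a pair: (tuple of binary variable names, dict from 0/1 value tuple to float).

def multiply_all(factors):
    """Product of a list of factors (the ∏ in step 3)."""
    out_vars = []
    for vs, _ in factors:
        out_vars += [v for v in vs if v not in out_vars]
    table = {}
    for assign in cartesian((0, 1), repeat=len(out_vars)):
        val = dict(zip(out_vars, assign))
        p = 1.0
        for vs, tab in factors:
            p *= tab[tuple(val[v] for v in vs)]
        table[assign] = p
    return tuple(out_vars), table

def sum_out(var, factor):
    """Marginalize one variable out of a factor (the ∑_Z in step 3)."""
    vs, tab = factor
    i = vs.index(var)
    out = {}
    for assign, value in tab.items():
        key = assign[:i] + assign[i + 1:]
        out[key] = out.get(key, 0.0) + value
    return tuple(v for v in vs if v != var), out

def eliminate_one(factors, z):
    """Steps 1-4: split Φ by whether Z is in scope, multiply the touching factors, sum Z out."""
    touching = [f for f in factors if z in f[0]]       # Φ'
    untouched = [f for f in factors if z not in f[0]]  # Ψ
    psi = sum_out(z, multiply_all(touching))
    return untouched + [psi]                           # Ψ ∪ {ψ}
```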

Page 29:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.

[Figure: a Bayesian network over Flu, All., S.I., R.N. (runny nose, the evidence variable), and H.; S.I. depends on Flu and All., and R.N. and H. depend on S.I.]

Page 30:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.

[Figure: the same model with its factors labeled: ϕF, ϕA, ϕFAS, ϕSR, ϕSH.]

Page 31:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.
1. Φ′ = {ϕSH}
2. Ψ = {ϕF, ϕA, ϕFAS, ϕSR}
3. ψ = ∑_H ∏_{ϕ∈Φ′} ϕ
4. Return Ψ ∪ {ψ}

[Figure: the model with factors ϕF, ϕA, ϕFAS, ϕSR, ϕSH.]

Page 32:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.
1. Φ′ = {ϕSH}
2. Ψ = {ϕF, ϕA, ϕFAS, ϕSR}
3. ψ = ∑_H ϕSH
4. Return Ψ ∪ {ψ}

[Figure: the model with factors ϕF, ϕA, ϕFAS, ϕSR, ϕSH.]

Page 33:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.
1. Φ′ = {ϕSH}
2. Ψ = {ϕF, ϕA, ϕFAS, ϕSR}
3. ψ = ∑_H ϕSH
4. Return Ψ ∪ {ψ}

[Figure: the model with factors ϕF, ϕA, ϕFAS, ϕSR, ϕSH.]

P(H|S)   S = 0   S = 1
H = 0    0.8     0.1
H = 1    0.2     0.9

S  ψ(S)
0  1.0
1  1.0

Page 34:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.
1. Φ′ = {ϕSH}
2. Ψ = {ϕF, ϕA, ϕFAS, ϕSR}
3. ψ = ∑_H ϕSH
4. Return Ψ ∪ {ψ}

[Figure: H is removed; the remaining factors are ϕF, ϕA, ϕFAS, ϕSR, and the new ψ.]

P(H|S)   S = 0   S = 1
H = 0    0.8     0.1
H = 1    0.2     0.9

S  ψ(S)
0  1.0
1  1.0

Page 35:

Example

•  Query: P(Flu | runny nose)

•  Let's eliminate H.
•  We can actually ignore the new factor, equivalently just deleting H!
–  Why?
–  In some cases eliminating a variable is really easy!

[Figure: the model without H; factors ϕF, ϕA, ϕFAS, ϕSR.]

S  ψ(S)
0  1.0
1  1.0

Page 36:

Variable Elimination

Input: set of factors Φ, ordered list of variables Z to eliminate
Output: new factor ψ

1. For each Zi ∈ Z (in order):
–  Let Φ = Eliminate-One(Φ, Zi)
2. Return ∏_{ϕ∈Φ} ϕ
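The outer loop is then a direct transcription. This fragment assumes the multiply_all and eliminate_one helpers sketched after the Eliminate-One slide (page 28); the chain CPT numbers in the demo are the same invented placeholders used earlier.

```python
def variable_elimination(factors, order):
    """Sum-product VE: eliminate the variables in `order`, then multiply whatever factors remain."""
    for z in order:
        factors = eliminate_one(factors, z)
    return multiply_all(factors)

# The chain A -> B -> C -> D once more (hypothetical CPT values); eliminating A, B, C leaves P(D).
phi_a  = (("A",),     {(0,): 0.6, (1,): 0.4})
phi_ba = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
phi_cb = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})
phi_dc = (("C", "D"), {(0, 0): 0.4, (0, 1): 0.6, (1, 0): 0.1, (1, 1): 0.9})

print(variable_elimination([phi_a, phi_ba, phi_cb, phi_dc], ["A", "B", "C"]))
# ≈ (('D',), {(0,): 0.31, (1,): 0.69})
```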

Page 37:

Example

•  Query: P(Flu | runny nose)

•  H is already eliminated.

•  Let's now eliminate S.

[Figure: the model without H; factors ϕF, ϕA, ϕFAS, ϕSR.]

Page 38:

Example

•  Query: P(Flu | runny nose)

•  Eliminating S:
1. Φ′ = {ϕSR, ϕFAS}
2. Ψ = {ϕF, ϕA}
3. ψFAR = ∑_S ∏_{ϕ∈Φ′} ϕ
4. Return Ψ ∪ {ψFAR}

[Figure: the model without H; factors ϕF, ϕA, ϕFAS, ϕSR.]

Page 39:

Example

•  Query: P(Flu | runny nose)

•  Eliminating S:
1. Φ′ = {ϕSR, ϕFAS}
2. Ψ = {ϕF, ϕA}
3. ψFAR = ∑_S ϕSR · ϕFAS
4. Return Ψ ∪ {ψFAR}

[Figure: the model without H; factors ϕF, ϕA, ϕFAS, ϕSR.]

Page 40:

Example

•  Query: P(Flu | runny nose)

•  Eliminating S:
1. Φ′ = {ϕSR, ϕFAS}
2. Ψ = {ϕF, ϕA}
3. ψFAR = ∑_S ϕSR · ϕFAS
4. Return Ψ ∪ {ψFAR}

[Figure: S is gone; remaining nodes Flu, All., R.N. with factors ϕF, ϕA, ψFAR.]

Page 41:

Example

•  Query: P(Flu | runny nose)

•  Finally, eliminate A.

[Figure: nodes Flu, All., R.N. with factors ϕF, ϕA, ψFAR.]

Page 42:

Example

•  Query: P(Flu | runny nose)

•  Eliminating A:
1. Φ′ = {ϕA, ψFAR}
2. Ψ = {ϕF}
3. ψFR = ∑_A ϕA · ψFAR
4. Return Ψ ∪ {ψFR}

[Figure: nodes Flu, All., R.N. with factors ϕF, ϕA, ψFAR.]

Page 43:

Example

•  Query: P(Flu | runny nose)

•  Eliminating A:
1. Φ′ = {ϕA, ψFAR}
2. Ψ = {ϕF}
3. ψFR = ∑_A ϕA · ψFAR
4. Return Ψ ∪ {ψFR}

[Figure: A is gone; remaining nodes Flu and R.N. with factors ϕF, ψFR.]

Page 44:

Markov Chain, Again

•  Earlier, we eliminated A, then B, then C.

[Figure: the chain A → B → C → D with CPTs P(A), P(B|A), P(C|B), P(D|C) (values not shown).]

Page 45:

Markov Chain, Again

•  Now let's start by eliminating C.

[Figure: the chain A → B → C → D with its CPTs (values not shown).]

Page 46:

Markov Chain, Again

•  Now let's start by eliminating C.

[Figure: multiplying P(C|B) and P(D|C) gives a factor ϕ′(B,C,D) with eight rows (values not shown).]

Page 47:

Markov Chain, Again

•  Now let's start by eliminating C.

[Figure: summing C out of ϕ′(B,C,D) gives a factor ψ(B,D) with four rows (values not shown).]

Page 48:

Markov Chain, Again

•  Eliminating B will be similarly complex.

[Figure: the remaining network over A, B, D with the new factor ψ(B,D).]

Page 49:

Variable Elimination: Comments

•  Can prune away all non-ancestors of the query variables.

•  Ordering makes a difference!

•  Works for Markov networks and Bayesian networks.
–  Factors need not be CPDs and, in general, new factors won't be.

Page 50:

What about Evidence?

•  So far, we've just considered the posterior/marginal P(Y).

•  Next: the conditional distribution P(Y | X = x).

•  It's almost the same: the additional step is to reduce factors to respect the evidence.

Page 51:

Example

•  Query: P(Flu | runny nose)

•  Let's reduce to R = true (runny nose).

[Figure: the full model with factors ϕF, ϕA, ϕFAS, ϕSR, ϕSH, and the table P(R|S) (values not shown).]

Page 52:

Example

•  Query: P(Flu | runny nose)

•  Let's reduce to R = true (runny nose).

[Figure: as before, now with ϕSR(S,R) written out as a table over all four (S,R) assignments (values not shown).]

Page 53:

Example

•  Query: P(Flu | runny nose)

•  Let's reduce to R = true (runny nose).

[Figure: reducing ϕSR(S,R) by the evidence R = 1 drops the rows with R = 0, leaving a new factor ϕ′S(S) = ϕSR(S, R=1) (values not shown).]

Page 54:

Example

•  Query: P(Flu | runny nose)

•  Let's reduce to R = true (runny nose).

[Figure: after the reduction, only the rows of ϕSR with R = 1 remain, i.e., ϕ′S(S) = ϕSR(S, R=1).]

Page 55:

Example

•  Query: P(Flu | runny nose)

•  Let's reduce to R = true (runny nose).

[Figure: R.N. is removed from the graph; the remaining factors are ϕ′S, ϕSH, ϕF, ϕA, ϕFAS, with the reduced table ϕ′S(S) shown.]

Page 56:

Example

•  Query: P(Flu | runny nose)

•  Now run variable elimination all the way down to one factor (for F).

[Figure: nodes Flu, All., S.I., H. with factors ϕ′S, ϕSH, ϕF, ϕA, ϕFAS.]

H can be pruned for the same reasons as before.

Page 57:

Example

•  Query: P(Flu | runny nose)

•  Now run variable elimination all the way down to one factor (for F).

[Figure: H is pruned; nodes Flu, All., S.I. with factors ϕ′S, ϕF, ϕA, ϕFAS.]

Eliminate S.

Page 58:

Example

•  Query: P(Flu | runny nose)

•  Now run variable elimination all the way down to one factor (for F).

[Figure: S is eliminated, leaving nodes Flu and All. with factors ϕF, ϕA, and the new ψFA.]

Eliminate A.

Page 59:

Example

•  Query: P(Flu | runny nose)

•  Now run variable elimination all the way down to one factor (for F).

[Figure: A is eliminated, leaving Flu with factors ϕF and the new ψF.]

Take the final product.

Page 60:

Example

•  Query: P(Flu | runny nose)

•  Now run variable elimination all the way down to one factor:

ϕF · ψF

Page 61:

Variable Elimination for Conditional Probabilities

Input: graphical model on V, set of query variables Y, evidence X = x
Output: factor ϕ and scalar α

1. Φ = factors in the model
2. Reduce the factors in Φ by X = x
3. Choose a variable ordering on Z = V \ Y \ X
4. ϕ = Variable-Elimination(Φ, Z)
5. α = ∑_{y ∈ Val(Y)} ϕ(y)
6. Return ϕ, α
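A sketch of this conditional-query wrapper in Python, continuing with the same illustrative factor representation; variable_elimination is the loop sketched after the page 36 slide, and the evidence is supplied as a dict such as {"R": 1}.

```python
def reduce_factor(factor, evidence):
    """Keep only the rows of a factor consistent with the evidence, and drop the evidence variables."""
    vs, tab = factor
    keep = [i for i, v in enumerate(vs) if v not in evidence]
    out = {}
    for assign, value in tab.items():
        if all(assign[i] == evidence[v] for i, v in enumerate(vs) if v in evidence):
            out[tuple(assign[i] for i in keep)] = value
    return tuple(vs[i] for i in keep), out

def conditional_query(factors, query_vars, evidence):
    """Return (unnormalized factor over query_vars, normalizer alpha)."""
    reduced = [reduce_factor(f, evidence) for f in factors]
    all_vars = {v for vs, _ in reduced for v in vs}
    elim_order = [v for v in all_vars if v not in query_vars]   # any order is correct
    phi = variable_elimination(reduced, elim_order)             # from the earlier sketch
    alpha = sum(phi[1].values())                                # = P(X = x) in a Bayesian network
    return phi, alpha
```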

Page 62:

Note

•  For Bayesian networks, the final factor will be P(Y, X = x) and the sum α = P(X = x).

•  This equates to a Gibbs distribution with partition function = α.

Page 63:

Variable Elimination

•  In general, time and space requirements are exponential in the induced width corresponding to the ordering you choose.

•  It's NP-hard to find the best elimination ordering.

•  If you can avoid "big" intermediate factors, you can make inference linear in the size of the original factors.
–  Chordal graphs
–  Polytrees

Page 64:

Additional Comments

•  Runtime depends on the size of the intermediate factors.

•  Hence, the variable elimination ordering matters a lot.
–  But it's NP-hard to find the best one.
–  For MNs, chordal graphs permit inference in time linear in the size of the original factors.
–  For BNs, polytree structures do the same.

Page 65:

Getting Back to NLP

•  Traditional structured NLP models were sometimes subconsciously chosen for these properties.
–  HMMs, PCFGs (with a little work)
–  But not: IBM Model 3

•  Need MAP inference for decoding!

•  Need approximate inference for complex models!

Page 66:

From Marginals to MAP

•  Replace factor marginalization steps with maximization.
–  Add bookkeeping to keep track of the maximizing values.

•  Add a traceback at the end to recover the solution.

•  This is analogous to the connection between the forward algorithm and the Viterbi algorithm.
–  The ordering challenge is the same.

Page 67:

Factor Maximization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via maximization:

•  We can refer to this new factor by max_Y ϕ.

Page 68:

Factor Maximization

•  Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via maximization:

A  B  C  ϕ(A,B,C)
0  0  0  0.9
0  0  1  0.3
0  1  0  1.1
0  1  1  1.7
1  0  0  0.4
1  0  1  0.7
1  1  0  1.1
1  1  1  0.2

"Maximizing out" B gives ψ(A,C) = max_B ϕ(A,B,C), with the maximizing value of B recorded:

A  C  ψ(A,C)
0  0  1.1   (B = 1)
0  1  1.7   (B = 1)
1  0  1.1   (B = 1)
1  1  0.7   (B = 0)
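A sketch of max-marginalization with argmax bookkeeping in the same dict-based factor representation, reproducing the ψ(A,C) table above:

```python
def max_out(var, factor):
    """Maximize a variable out of a factor, recording the argmax for the later traceback."""
    vs, tab = factor
    i = vs.index(var)
    best, argbest = {}, {}
    for assign, value in tab.items():
        key = assign[:i] + assign[i + 1:]
        if key not in best or value > best[key]:
            best[key], argbest[key] = value, assign[i]
    return (tuple(v for v in vs if v != var), best), argbest

phi = (("A", "B", "C"),
       {(0, 0, 0): 0.9, (0, 0, 1): 0.3, (0, 1, 0): 1.1, (0, 1, 1): 1.7,
        (1, 0, 0): 0.4, (1, 0, 1): 0.7, (1, 1, 0): 1.1, (1, 1, 1): 0.2})

psi, arg_b = max_out("B", phi)
print(psi)     # (('A', 'C'), {(0, 0): 1.1, (0, 1): 1.7, (1, 0): 1.1, (1, 1): 0.7})
print(arg_b)   # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```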

Page 69:

Distributive Property

•  A useful property we exploited in variable elimination:

•  Under the same conditions, factor multiplication distributes over max, too:
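Neither formula survived extraction; presumably they are the following (for X ∉ Scope(ϕ1), and using the fact that factors are nonnegative for the max case):

```latex
\sum_{X} (\phi_1 \cdot \phi_2) = \phi_1 \cdot \sum_{X} \phi_2,
\qquad
\max_{X} (\phi_1 \cdot \phi_2) = \phi_1 \cdot \max_{X} \phi_2 .
```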

Page 70:

Traceback

Input: sequence of factors with associated variables: (ψZ1, …, ψZk)
Output: z*

•  Each ψZ is a factor whose scope includes Z and the variables eliminated after Z.

•  Work backwards from i = k down to 1:
–  Let z*_i = argmax_z ψZi(z, z*_{i+1}, z*_{i+2}, …, z*_k)

•  Return z*

Page 71:

About the Traceback

•  No extra (asymptotic) expense.
–  Linear traversal over the intermediate factors.

•  The factor operations for both sum-product VE and max-product VE can be generalized.
–  Example: get the K most likely assignments.

Page 72:

Eliminating One Variable (Max-Product Version with Bookkeeping)

Input: set of factors Φ, variable Z to eliminate
Output: new set of factors Ψ

1. Let Φ′ = {ϕ ∈ Φ | Z ∈ Scope(ϕ)}
2. Let Ψ = {ϕ ∈ Φ | Z ∉ Scope(ϕ)}
3. Let τ be max_Z ∏_{ϕ∈Φ′} ϕ
–  Let ψ be ∏_{ϕ∈Φ′} ϕ (bookkeeping)
4. Return Ψ ∪ {τ}, ψ

Page 73:

Variable Elimination (Max-Product Version with Decoding)

Input: set of factors Φ, ordered list of variables Z to eliminate
Output: new factor

1. For each Zi ∈ Z (in order):
–  Let (Φ, ψZi) = Eliminate-One(Φ, Zi)
2. Return ∏_{ϕ∈Φ} ϕ, Traceback({ψZi})
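A compact sketch of the max-product version with decoding, again over dict-based binary factors. It assumes the multiply_all helper from the sum-product sketch (page 28), keeps each bookkeeping factor ψZi, and then runs the traceback from the previous slides; this is an illustration of the scheme, not the lecture's own code.

```python
def max_product_ve(factors, order):
    """Max-product VE with decoding; assumes multiply_all from the sum-product sketch."""
    bookkeeping = []                           # list of (eliminated variable, its bookkeeping factor ψ)
    for z in order:
        touching = [f for f in factors if z in f[0]]
        rest = [f for f in factors if z not in f[0]]
        vs, tab = multiply_all(touching)       # ψ: product of the factors that mention z
        bookkeeping.append((z, (vs, tab)))
        i = vs.index(z)
        tau = {}                               # τ = max_z ψ
        for assign, value in tab.items():
            key = assign[:i] + assign[i + 1:]
            tau[key] = max(tau.get(key, float("-inf")), value)
        factors = rest + [(tuple(v for v in vs if v != z), tau)]
    # Traceback: fix variables in reverse elimination order using the bookkeeping factors.
    solution = {}
    for z, (vs, tab) in reversed(bookkeeping):
        candidates = [a for a in tab
                      if all(solution.get(v, a[j]) == a[j] for j, v in enumerate(vs))]
        best = max(candidates, key=lambda a: tab[a])
        solution[z] = best[vs.index(z)]
    return multiply_all(factors), solution
```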

Page 74:

Variable Elimination Tips

•  Any ordering will be correct.
•  Most orderings will be too expensive.

•  There are heuristics for choosing an ordering (you are welcome to find them and test them out).

Page 75:

(Rocket Science: True MAP)

•  Evidence: X = x
•  Query: Y
•  Other variables: Z = V \ X \ Y

•  First, marginalize out Z; then do MAP inference over Y given X = x.

•  This is not usually attempted in NLP, with some exceptions.

Page 76:

Sketch of Gibbs Sampling

•  MCMC: design (on paper) a graph where each configuration from Val(V) is a node.
–  Transitions in the graph are designed to give a Markov chain whose stationary distribution is the posterior.

•  Simulate a random walk in the graph.

•  If you walk long enough, your position is distributed according to P(V).

Page 77:

Transitions in Gibbs Sampling

•  A transition in the Markov chain equates to changing a subset of the random variables.

•  Gibbs: resample Vi's value according to P(Vi | V \ {Vi}).
–  Only need the local factors that affect Vi: take their product, marginalize, and randomly choose a new value.

•  Simply lock the evidence variables X.
•  The maximizing version gradually shifts the sampler in favor of the most probable value for Vi.
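A minimal sketch of one Gibbs resampling step and one sweep, in the same dict-based factor representation used earlier (a toy loop for illustration, not the lecture's code; the proposal distribution is just the normalized product of the factors that mention the variable, assuming positive factor values):

```python
import random

def gibbs_resample(var, assignment, factors):
    """Resample one binary variable conditioned on all others, using only the factors that mention it."""
    weights = []
    for value in (0, 1):                              # binary variables, as in the slides' examples
        trial = dict(assignment, **{var: value})
        w = 1.0
        for vs, tab in factors:
            if var in vs:                             # only the local factors matter
                w *= tab[tuple(trial[v] for v in vs)]
        weights.append(w)
    total = sum(weights)
    return random.choices((0, 1), weights=[w / total for w in weights])[0]

def gibbs_sweep(assignment, factors, evidence):
    """One pass over the non-evidence variables; evidence variables stay locked."""
    for var in assignment:
        if var not in evidence:
            assignment[var] = gibbs_resample(var, assignment, factors)
    return assignment
```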

Page 78:

Sketch of Mean Field Variational Inference

•  Inference with our distribution P is hard.
•  Choose an "easier" distribution family, Q. Then find:

•  Usually iterative methods are required to "fit" Q to P.
–  These often resemble familiar learning algorithms like EM!
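The objective in the blank above is missing from the transcript; a standard way to write the variational fit (plausibly what the slide shows, perhaps stated instead via the energy functional of the next slide) is a KL-divergence minimization over the tractable family Q:

```latex
Q^{*} \;=\; \arg\min_{Q \in \mathcal{Q}} \; \mathrm{KL}\big(Q \,\big\|\, P\big)
\;=\; \arg\min_{Q \in \mathcal{Q}} \; \mathbb{E}_{Q}\!\left[\log \frac{Q(V)}{P(V)}\right].
```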

Page 79:

Energy Functional

•  Expectations under the simpler distribution family, Q.
–  Every element of Q is an approximate solution.
–  We try to find the best one.

Page 80:

Variational Methods

•  This is a simple example.
•  For any λ and any x:

[Figure: a family of functions g_λ(x), each bounding the function of interest.]

Page 81:

Variational Methods

•  This is a simple example.
•  For any λ and any x:

•  Further, for any x, there is some λ where the bound is tight.
–  λ is called a variational parameter.

Page 82:

Tangent: Variational Methods

•  This is a simple example.
•  For any λ and any x:

•  Further, for any x, there is some λ where the bound is tight.
–  λ is called a variational parameter.

Page 83:

Tangent: Variational Methods

•  This is a simple example.
•  For any λ and any x:

•  Further, for any x, there is some λ where the bound is tight.
–  λ is called a variational parameter.

•  For us, log P(X = x) is like −ln(x), and Q is like λ.
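The bound itself did not survive extraction. A standard example with exactly these properties (holds for any λ and any x, and is tight for some λ) uses the convex function −ln(x) and its tangent lines; presumably this, or something very close, is what the slide plots:

```latex
-\ln(x) \;\ge\; -\lambda x + \ln \lambda + 1
\qquad \text{for all } x > 0,\; \lambda > 0,
```

with equality exactly when λ = 1/x.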

Page 84:

Structured Variational Approach

•  Maximize the energy functional over a family Q that is well-defined.
–  A graphical model!
–  Probably not an I-map for P. (The bound isn't tight.)

•  Simpler structures lead to easier inference.
–  Mean field is the simplest:
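The factorization that "mean field is the simplest:" introduces is missing from the transcript; it is presumably the fully factorized family

```latex
Q(V) \;=\; \prod_{i} Q_i(V_i),
```

in which every variable gets its own independent marginal.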

Page 85:

Parting Shots

•  You will probably never implement the general variable elimination algorithm.

•  You will rarely use exact inference.

•  There is value in understanding the problem that approximation methods are trying to solve, and what an exact (if intractable) solution would look like!