Max Likelihood for BNs

Daphne Koller

Parameter Estimation

Probabilistic Graphical Models

Learning

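MLE for Bayesian Networks

• Parameters: $\theta_{x^0}, \theta_{x^1}, \theta_{y^0|x^0}, \theta_{y^1|x^0}, \theta_{y^0|x^1}, \theta_{y^1|x^1}$

• Data instances: <x[m], y[m]>

[Figure: two-node network X → Y]

P(X):

x0     x1
0.7    0.3

P(Y | X):

X      y0      y1
x0     0.95    0.05
x1     0.2     0.8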

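MLE for Bayesian Networks

• Parameters: $\theta_X$, $\theta_{Y|X}$

\[
\begin{aligned}
L(\Theta : D) &= \prod_{m=1}^{M} P(x[m], y[m] : \theta) \\
&= \prod_{m=1}^{M} P(x[m] : \theta)\, P(y[m] \mid x[m] : \theta) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta) \right) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta_{Y|X}) \right)
\end{aligned}
\]

[Figure: X → Y with parameter nodes $\theta_X$ and $\theta_{Y|X}$, inside a plate over the data instances]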
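The decomposition for the two-variable network X → Y can be checked numerically. The following is a minimal sketch, not code from the lecture: the CPD values are the ones given for this network, while the data set `data` and all function names are made up for illustration. It works in log space, where the product of the two local likelihoods becomes a sum.

```python
import math

# CPD values for the example network X -> Y.
theta_x = {"x0": 0.7, "x1": 0.3}
theta_y_given_x = {
    "x0": {"y0": 0.95, "y1": 0.05},
    "x1": {"y0": 0.2, "y1": 0.8},
}

# Hypothetical data set of <x[m], y[m]> instances (illustration only).
data = [("x0", "y0"), ("x0", "y0"), ("x1", "y1"), ("x0", "y1"), ("x1", "y0")]


def joint_log_likelihood(instances):
    """log L(Theta : D), computed from the joint P(x, y) = P(x) P(y | x)."""
    return sum(math.log(theta_x[x] * theta_y_given_x[x][y]) for x, y in instances)


def local_log_likelihood_x(instances):
    """Local term that depends only on theta_X."""
    return sum(math.log(theta_x[x]) for x, _ in instances)


def local_log_likelihood_y_given_x(instances):
    """Local term that depends only on theta_{Y|X}."""
    return sum(math.log(theta_y_given_x[x][y]) for x, y in instances)


total = joint_log_likelihood(data)
parts = local_log_likelihood_x(data) + local_log_likelihood_y_given_x(data)
print(total, parts)  # the two values agree up to floating-point rounding
```

Because each local term touches only its own parameters, changing `theta_x` leaves the second term unchanged, and vice versa.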

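MLE for Bayesian Networks

• Likelihood for Bayesian network

\[
\begin{aligned}
L(\Theta : D) &= \prod_{m} P(\mathbf{x}[m] : \Theta) \\
&= \prod_{m} \prod_{i} P(x_i[m] \mid \mathbf{u}_i[m] : \Theta) \\
&= \prod_{i} \prod_{m} P(x_i[m] \mid \mathbf{u}_i[m] : \Theta_i) \\
&= \prod_{i} L_i(\Theta_i : D)
\end{aligned}
\]

• If the parameter sets $\Theta_i$ of the CPDs $P(X_i \mid \mathbf{U}_i)$ are disjoint, then the MLE can be computed by maximizing each local likelihood separately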
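Taking logs makes the separate-maximization claim explicit. The step below is a short worked addition, not taken from the slide; it assumes the global parameter vector splits into disjoint blocks $\Theta_i$, one per CPD, and writes $L_i$ for the local likelihood of variable $X_i$ given its parents $\mathbf{U}_i$.

```latex
\[
L_i(\Theta_i : D) = \prod_{m} P(x_i[m] \mid \mathbf{u}_i[m] : \Theta_i),
\qquad
\log L(\Theta : D) = \sum_{i} \log L_i(\Theta_i : D)
\]
\[
\hat{\Theta}_i = \arg\max_{\Theta_i} \log L_i(\Theta_i : D)
\quad \text{for each } i \text{ independently}
\]
```

Each summand depends on a different block of parameters, so maximizing the sum is the same as maximizing every summand on its own.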

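MLE for Table CPDs

\[
\begin{aligned}
\prod_{m=1}^{M} P(x[m] \mid \mathbf{u}[m] : \theta_{X|\mathbf{U}})
&= \prod_{x, \mathbf{u}} \; \prod_{m \,:\, x[m]=x,\ \mathbf{u}[m]=\mathbf{u}} P(x[m] \mid \mathbf{u}[m] : \theta_{X|\mathbf{U}}) \\
&= \prod_{x, \mathbf{u}} \; \prod_{m \,:\, x[m]=x,\ \mathbf{u}[m]=\mathbf{u}} \theta_{x|\mathbf{u}} \\
&= \prod_{x, \mathbf{u}} \theta_{x|\mathbf{u}}^{M[x, \mathbf{u}]}
\end{aligned}
\]

Here $M[x, \mathbf{u}]$ is the number of data instances with $x[m] = x$ and $\mathbf{u}[m] = \mathbf{u}$. The maximum likelihood estimate is

\[
\hat{\theta}_{x|\mathbf{u}} = \frac{M[x, \mathbf{u}]}{\sum_{x'} M[x', \mathbf{u}]} = \frac{M[x, \mathbf{u}]}{M[\mathbf{u}]}
\]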
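The count-ratio estimate for a table CPD can be sketched in a few lines. This is an illustrative sketch, not code from the lecture: the function name `mle_table_cpd` and the sample data are hypothetical, and each instance is assumed to be fully observed as a pair of a child value and a parent assignment.

```python
from collections import Counter


def mle_table_cpd(instances):
    """Estimate theta_{x|u} = M[x, u] / M[u] from (x, u) pairs.

    `instances` is a list of (child_value, parent_assignment) pairs; the
    parent assignment must be hashable (for example a string or a tuple).
    """
    joint_counts = Counter(instances)                  # M[x, u]
    parent_counts = Counter(u for _, u in instances)   # M[u]
    return {(x, u): m / parent_counts[u] for (x, u), m in joint_counts.items()}


# Hypothetical data: child Y with single parent X, listed as (y, x).
data = [("y0", "x0"), ("y0", "x0"), ("y1", "x0"),
        ("y1", "x1"), ("y1", "x1"), ("y0", "x0")]

theta_hat = mle_table_cpd(data)
print(theta_hat[("y0", "x0")])  # M[y0, x0] / M[x0] = 3 / 4
```

Combinations that never occur in the data, such as (y0, x1) here, get no entry, which corresponds to an estimate of zero. Parent assignments that never occur leave the estimate undefined, since M[u] = 0.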

MLE for Linear Gaussians
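The formulas for this slide did not survive extraction. As a hedged illustration only, the sketch below uses the standard result for a linear Gaussian CPD with a single parent, P(X | U) = N(beta0 + beta1 * u, sigma^2): the MLE coincides with a least-squares fit expressed through empirical means, variances, and covariances. The single-parent restriction, the function name, and the data are assumptions for this example.

```python
def mle_linear_gaussian(xs, us):
    """MLE of (beta0, beta1, sigma2) for X | U ~ N(beta0 + beta1 * u, sigma2).

    Uses empirical moments with 1/M normalization (the maximum likelihood
    convention, not the unbiased 1/(M-1) estimator).
    """
    m = len(xs)
    mean_x = sum(xs) / m
    mean_u = sum(us) / m
    cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us)) / m
    var_u = sum((u - mean_u) ** 2 for u in us) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / m

    beta1 = cov_xu / var_u              # slope: Cov[X, U] / Var[U]
    beta0 = mean_x - beta1 * mean_u     # intercept from the means
    sigma2 = var_x - beta1 * cov_xu     # residual variance
    return beta0, beta1, sigma2


# Hypothetical noise-free data generated by x = 1 + 2u.
us = [0.0, 1.0, 2.0, 3.0]
xs = [1.0, 3.0, 5.0, 7.0]
beta0, beta1, sigma2 = mle_linear_gaussian(xs, us)
print(beta0, beta1, sigma2)
```

With several parents the same idea leads to a system of linear equations in the coefficients; that case is not covered by this sketch.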


Shared Parameters

[Figure: Markov chain S(0) → S(1) → S(2) → S(3); every transition uses the same CPD, with parameters $\theta_{S'|S}$]
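The likelihood for the chain with a shared transition CPD can be written out as follows. This is a sketch under stated assumptions, since the slide's own formulas were not recoverable: the state sequence $S^{(0)}, \ldots, S^{(T)}$ is fully observed, the transition CPD is the same at every time step, and $M[s^i \to s^j]$ is notation introduced here for the number of transitions from $s^i$ to $s^j$.

```latex
\[
\begin{aligned}
L(\theta : S^{(0:T)}) &= \prod_{t=1}^{T} P(S^{(t)} \mid S^{(t-1)} : \theta_{S'|S}) \\
&= \prod_{i,j} \; \prod_{t \,:\, S^{(t-1)} = s^i,\ S^{(t)} = s^j} \theta_{s^i \to s^j} \\
&= \prod_{i,j} \theta_{s^i \to s^j}^{M[s^i \to s^j]}
\end{aligned}
\]
```

The counts are pooled over all time steps because every step uses the same parameters.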


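Shared Parameters

[Figure: hidden Markov model with states S(0) → S(1) → S(2) → S(3) and observations O(1), O(2), O(3), where each S(t) → O(t); transitions share the parameters $\theta_{S'|S}$ and observations share $\theta_{O'|S'}$]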
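With shared CPDs, the sufficient statistics are counts pooled over every place the CPD is used. The sketch below illustrates this for the transition and observation models; it assumes the state sequence is fully observed (complete data), and the sequences, state labels, and the function name `mle_shared_cpd` are hypothetical.

```python
from collections import Counter


def mle_shared_cpd(pairs):
    """MLE for a CPD shared across many (parent, child) uses.

    Counts are accumulated over all uses, then normalized per parent value.
    """
    pair_counts = Counter(pairs)
    parent_counts = Counter(parent for parent, _ in pairs)
    return {(p, c): m / parent_counts[p] for (p, c), m in pair_counts.items()}


# Hypothetical fully observed sequence: S(0..5) and O(1..5).
states = ["a", "a", "b", "a", "b", "b"]
observations = [None, "u", "v", "u", "v", "v"]  # no observation at time 0

# Every time step contributes one use of S'|S and one use of O'|S'.
transitions = list(zip(states[:-1], states[1:]))
emissions = list(zip(states[1:], observations[1:]))

theta_transition = mle_shared_cpd(transitions)   # estimates for S'|S
theta_observation = mle_shared_cpd(emissions)    # estimates for O'|S'
print(theta_transition[("a", "b")])  # 2 of the 3 transitions out of "a"
```

The same helper serves both CPDs because each is a table CPD; only the list of (parent, child) uses differs.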


Summary

• For a BN with disjoint sets of parameters in its CPDs, the likelihood decomposes as a product of local likelihood functions, one per variable

• For table CPDs, the local likelihood further decomposes as a product of likelihoods for multinomials, one for each parent combination

• For networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD
