Daphne Koller Parameter Estimation Max Likelihood for BNs Probabilistic Graphical Models Learning
Jan 15, 2016
MLE for Bayesian Networks
• Parameters: $\theta_X$, $\theta_{Y|X}$
• Data instances: $\langle x[m], y[m] \rangle$

[Figure: network X → Y]

P(X):
x0     x1
0.7    0.3

P(Y | X):
X      y0      y1
x0     0.95    0.05
x1     0.2     0.8
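MLE for Bayesian Networks
• Parameters: $\theta_X$, $\theta_{Y|X}$
• Likelihood:

$$
\begin{aligned}
L(\theta : D) &= \prod_{m=1}^{M} P(x[m], y[m] : \theta) \\
&= \prod_{m=1}^{M} P(x[m] : \theta)\, P(y[m] \mid x[m] : \theta) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta) \right) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta_{Y|X}) \right) \\
&= L_X(\theta_X : D)\, L_{Y|X}(\theta_{Y|X} : D)
\end{aligned}
$$

[Figure: parameter nodes $\theta_X$ and $\theta_{Y|X}$ with variables X and Y replicated for each instance in the data D]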
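A minimal Python sketch of this decomposition, using the CPD values from the X → Y example and a small hypothetical data set (the data and function names are illustrative, not from the slides). The joint log-likelihood equals $\log L_X + \log L_{Y|X}$, and each local term reads only its own parameter set.

```python
import math

# CPD values from the example network X -> Y.
theta_X = {"x0": 0.7, "x1": 0.3}
theta_Y_given_X = {
    "x0": {"y0": 0.95, "y1": 0.05},
    "x1": {"y0": 0.2, "y1": 0.8},
}

# Hypothetical instances <x[m], y[m]> (not from the slides).
data = [("x0", "y0"), ("x0", "y0"), ("x1", "y1"), ("x0", "y1"), ("x1", "y0")]


def log_likelihood(data):
    """log L(theta : D) = sum_m log P(x[m], y[m] : theta)."""
    return sum(math.log(theta_X[x] * theta_Y_given_X[x][y]) for x, y in data)


def local_log_likelihoods(data):
    """Return (log L_X, log L_{Y|X}); each uses only one parameter set."""
    ll_x = sum(math.log(theta_X[x]) for x, _ in data)
    ll_y_given_x = sum(math.log(theta_Y_given_X[x][y]) for x, y in data)
    return ll_x, ll_y_given_x


ll_x, ll_yx = local_log_likelihoods(data)
print(log_likelihood(data), ll_x + ll_yx)  # equal up to floating-point rounding
```

Because $\theta_X$ and $\theta_{Y|X}$ appear in different factors, maximizing the sum reduces to maximizing each term on its own.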
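MLE for Bayesian Networks
• Likelihood for Bayesian network

$$
\begin{aligned}
L(\theta : D) &= \prod_{m} P(x[m] : \theta) \\
&= \prod_{m} \prod_{i} P(x_i[m] \mid \mathbf{u}_i[m] : \theta) \\
&= \prod_{i} \prod_{m} P(x_i[m] \mid \mathbf{u}_i[m] : \theta) \\
&= \prod_{i} L_i(\theta_{X_i|\mathbf{U}_i} : D)
\end{aligned}
$$

• If the parameter sets $\theta_{X_i|\mathbf{U}_i}$ are disjoint, then the MLE can be computed by maximizing each local likelihood separately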
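The reordering of products can be sketched for an arbitrary network with table CPDs. The network A → C ← B, its CPD values, the data, and the helper names below are all hypothetical; the point is that summing the per-family log-likelihoods gives the same total as evaluating the chain rule instance by instance.

```python
import math

# Hypothetical network A -> C <- B: each variable maps to its parent list.
parents = {"A": [], "B": [], "C": ["A", "B"]}

# Hypothetical table CPDs: cpd[var][parent_assignment][value] = theta.
cpd = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(): {0: 0.3, 1: 0.7}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}


def local_log_likelihoods(data):
    """Return {X_i: log L_i(theta_{X_i|U_i} : D)}: outer loop over i, inner over m."""
    result = {}
    for var, pa in parents.items():
        result[var] = sum(
            math.log(cpd[var][tuple(inst[p] for p in pa)][inst[var]])
            for inst in data
        )
    return result


def joint_log_likelihood(data):
    """log L(theta : D): outer loop over instances m, inner over variables i."""
    total = 0.0
    for inst in data:
        for var, pa in parents.items():
            total += math.log(cpd[var][tuple(inst[p] for p in pa)][inst[var]])
    return total


data = [{"A": 0, "B": 1, "C": 1}, {"A": 1, "B": 1, "C": 1}, {"A": 0, "B": 0, "C": 0}]
local = local_log_likelihoods(data)
print(local)
print(sum(local.values()), joint_log_likelihood(data))  # the two totals agree
```

Each entry of `local` depends only on the columns $x_i, \mathbf{u}_i$ and the table for $X_i$, which is why disjoint parameter sets can be optimized independently.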
MLE for Table CPDs
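$$
\begin{aligned}
L_{X|\mathbf{U}}(\theta_{X|\mathbf{U}} : D) &= \prod_{m=1}^{M} P(x[m] \mid \mathbf{u}[m] : \theta_{X|\mathbf{U}}) \\
&= \prod_{\mathbf{u} \in Val(\mathbf{U})} \prod_{x \in Val(X)} \prod_{m : x[m]=x,\ \mathbf{u}[m]=\mathbf{u}} P(x[m] \mid \mathbf{u}[m] : \theta_{X|\mathbf{U}}) \\
&= \prod_{\mathbf{u} \in Val(\mathbf{U})} \prod_{x \in Val(X)} \theta_{x|\mathbf{u}}^{M[x,\mathbf{u}]}
\end{aligned}
$$

where $M[x,\mathbf{u}]$ is the number of instances with $x[m]=x$ and $\mathbf{u}[m]=\mathbf{u}$. Maximizing separately for each parent assignment $\mathbf{u}$:

$$
\hat{\theta}_{x|\mathbf{u}} = \frac{M[x,\mathbf{u}]}{\sum_{x'} M[x',\mathbf{u}]} = \frac{M[x,\mathbf{u}]}{M[\mathbf{u}]}
$$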
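A minimal sketch of the count-based estimate $\hat{\theta}_{x|\mathbf{u}} = M[x,\mathbf{u}] / M[\mathbf{u}]$, assuming fully observed data stored as a list of dicts (the function name and data layout are hypothetical). Values of x never seen with a given u are simply absent, i.e. estimated as 0, and parent assignments with M[u] = 0 get no entry because the MLE is undefined there.

```python
from collections import Counter


def mle_table_cpd(data, child, parents):
    """Return theta_hat[u][x] = M[x, u] / M[u] for a table CPD P(child | parents).

    data: list of dicts mapping variable names to observed values.
    """
    joint = Counter()  # M[x, u]
    marg = Counter()   # M[u] = sum over x' of M[x', u]
    for inst in data:
        u = tuple(inst[p] for p in parents)
        joint[(inst[child], u)] += 1
        marg[u] += 1
    theta = {}
    for (x, u), count in joint.items():
        theta.setdefault(u, {})[x] = count / marg[u]
    return theta


data = [
    {"X": "x0", "Y": "y0"},
    {"X": "x0", "Y": "y0"},
    {"X": "x0", "Y": "y1"},
    {"X": "x1", "Y": "y1"},
]
print(mle_table_cpd(data, "Y", ["X"]))
# {('x0',): {'y0': 0.6666666666666666, 'y1': 0.3333333333333333}, ('x1',): {'y1': 1.0}}
```

With an empty parent list, the same function returns the multinomial MLE for a root node, e.g. `mle_table_cpd(data, "X", [])`.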
MLE for Linear Gaussians
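As a sketch under stated assumptions: for a linear Gaussian CPD with a single parent, $X \mid u \sim \mathcal{N}(\beta_0 + \beta_1 u;\ \sigma^2)$, maximizing the Gaussian log-likelihood gives the least-squares coefficients and, for the variance, the mean squared residual. The single-parent restriction and the function name are assumptions made to keep the example short.

```python
def mle_linear_gaussian_1parent(us, xs):
    """MLE for X | u ~ N(beta0 + beta1 * u, sigma^2) with one parent U.

    Returns (beta0, beta1, sigma2). sigma2 divides by M (the MLE),
    not by M - 1 (the unbiased estimate).
    """
    m = len(xs)
    mean_u = sum(us) / m
    mean_x = sum(xs) / m
    cov_ux = sum((u - mean_u) * (x - mean_x) for u, x in zip(us, xs)) / m
    var_u = sum((u - mean_u) ** 2 for u in us) / m
    beta1 = cov_ux / var_u  # requires the parent to vary in the data
    beta0 = mean_x - beta1 * mean_u
    sigma2 = sum((x - beta0 - beta1 * u) ** 2 for u, x in zip(us, xs)) / m
    return beta0, beta1, sigma2


print(mle_linear_gaussian_1parent([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]))
# (1.0, 2.0, 0.0)
```

Like the table-CPD case, the estimate depends on the data only through sufficient statistics of X and its parent (here the means, variance, and covariance).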
Shared Parameters
[Figure: Markov chain S(0) → S(1) → S(2) → S(3); every transition uses the same CPD $\theta_{S'|S}$]
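A minimal sketch of pooling sufficient statistics for the shared transition CPD, assuming fully observed state sequences (the function name and data are hypothetical). The initial-state distribution P(S(0)) is assumed to be a separate CPD and is not estimated here.

```python
from collections import Counter


def mle_shared_transition(sequences):
    """Estimate the shared CPD theta_{S'|S} from fully observed state sequences.

    Every transition S(t) -> S(t+1), in every sequence, uses the same CPD,
    so the counts M[s, s'] are pooled over all time steps.
    """
    pair_counts = Counter()  # M[s, s'] summed over t and over sequences
    from_counts = Counter()  # M[s] = number of transitions leaving s
    for seq in sequences:
        for s, s_next in zip(seq, seq[1:]):
            pair_counts[(s, s_next)] += 1
            from_counts[s] += 1
    return {(s, s_next): c / from_counts[s] for (s, s_next), c in pair_counts.items()}


theta = mle_shared_transition([["a", "a", "b", "a"], ["b", "b", "a"]])
print(theta)
```

Here state a is left twice and b three times across both sequences, so $\hat{\theta}_{a|b} = 2/3$ even though no single time step supplies more than one observation of that transition.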
Shared Parameters
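[Figure: state chain S(0) → S(1) → S(2) → S(3), with each S(t) emitting an observation O(t) for t = 1, 2, 3; transitions share $\theta_{S'|S}$ and emissions share $\theta_{O'|S'}$]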
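A similar sketch for the shared emission CPD $\theta_{O'|S'}$, assuming fully observed states and observations and the alignment in the figure (O(t) attached to S(t) for t ≥ 1; S(0) has no observation). The function name and data are hypothetical.

```python
from collections import Counter


def mle_shared_emission(state_seqs, obs_seqs):
    """Estimate the shared emission CPD theta_{O'|S'} from fully observed data.

    Assumes obs[t-1] is emitted by states[t] for t >= 1 (S(0) emits nothing).
    """
    emit_counts = Counter()   # M[s, o] pooled over all time steps and sequences
    state_counts = Counter()  # M[s] over emitting time steps only
    for states, obs in zip(state_seqs, obs_seqs):
        for s, o in zip(states[1:], obs):
            emit_counts[(s, o)] += 1
            state_counts[s] += 1
    return {(s, o): c / state_counts[s] for (s, o), c in emit_counts.items()}


theta_emit = mle_shared_emission(
    [["a", "a", "a", "b"], ["b", "a", "b"]],
    [["u", "v", "v"], ["u", "u"]],
)
print(theta_emit)
```

The transition counts from the previous sketch and these emission counts are collected independently, since the two shared CPDs have disjoint parameters.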
Summary
• For a BN with disjoint sets of parameters in its CPDs, the likelihood decomposes as a product of local likelihood functions, one per variable
• For table CPDs, each local likelihood further decomposes as a product of multinomial likelihoods, one for each parent combination
• For networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD