Daphne Koller Parameter Estimation Maximum Likelihood Estimation Probabilistic Graphical Models Learning.
Post on 21-Jan-2016
Daphne Koller
Parameter Estimation
Maximum Likelihood Estimation
Probabilistic Graphical Models
Learning
Biased Coin Example
P is a Bernoulli distribution: $P(X=1) = \theta$, $P(X=0) = 1-\theta$
$D = \{x[1], \ldots, x[M]\}$ sampled IID from P
• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)
IID as a PGM
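[Figure: plate model over X with plate "Data m", and the equivalent unrolled network over X[1], ..., X[M]]
$P(x[m] \mid \theta) = \begin{cases} \theta & \text{if } x[m] = x^1 \\ 1-\theta & \text{if } x[m] = x^0 \end{cases}$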
Maximum Likelihood Estimation
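• Goal: find $\theta \in [0,1]$ that predicts D well
• Prediction quality = likelihood of D given $\theta$
$L(\theta : D) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)$
[Figure: plot of $L(\theta : \langle H, T, T, H, H \rangle)$ against $\theta$ from 0 to 1]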
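As a minimal sketch of this likelihood for the toss sequence H, T, T, H, H, the Python below multiplies the per-toss probabilities and scans a grid of candidate values. The function name `likelihood` and the grid of 101 points are illustrative assumptions, not part of the slides.

```python
def likelihood(theta, tosses):
    """L(theta : D): product over tosses of P(x[m] | theta), with P(H) = theta."""
    value = 1.0
    for toss in tosses:
        value *= theta if toss == "H" else 1.0 - theta
    return value

tosses = ["H", "T", "T", "H", "H"]

# Candidate values of theta in [0, 1] (grid resolution is an arbitrary choice).
grid = [i / 100 for i in range(101)]

# Pick the grid point with the highest likelihood.
best_theta = max(grid, key=lambda t: likelihood(t, tosses))
print(best_theta)
```

A grid search is used here only to mirror the plot; it finds the peak numerically rather than in closed form.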
Maximum Likelihood Estimator
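• Observations: $M_H$ heads and $M_T$ tails
• Find $\theta$ maximizing likelihood
$L(\theta : M_H, M_T) = \theta^{M_H} (1-\theta)^{M_T}$
• Equivalent to maximizing log-likelihood
$l(\theta : M_H, M_T) = M_H \log \theta + M_T \log(1-\theta)$
• Differentiating the log-likelihood and solving for $\theta$:
$\hat{\theta} = \frac{M_H}{M_H + M_T}$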
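The differentiation step can be written out as follows. This is a standard derivation given for illustration; it assumes $0 < \theta < 1$ so that both logarithms are defined.

```latex
\frac{d}{d\theta}\, l(\theta : M_H, M_T)
  = \frac{M_H}{\theta} - \frac{M_T}{1-\theta} = 0
\;\Longrightarrow\;
M_H (1-\theta) = M_T\, \theta
\;\Longrightarrow\;
\hat{\theta} = \frac{M_H}{M_H + M_T}
```

For instance, 3 heads and 2 tails give $\hat{\theta} = 3/5 = 0.6$.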
Sufficient Statistics
• For computing $\theta$ in the coin toss example, we only needed $M_H$ and $M_T$ since
$L(\theta : D) = \theta^{M_H} (1-\theta)^{M_T}$
• $M_H$ and $M_T$ are sufficient statistics
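Sufficient Statistics
• A function s(x) from instances to a vector in $\mathbb{R}^k$ is a sufficient statistic if for any two datasets D and D' and any $\theta \in \Theta$ we have
$\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i]) \;\Longrightarrow\; L(\theta : D) = L(\theta : D')$
[Figure: mapping from datasets to statistics]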
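To illustrate the definition on the coin example, the sketch below checks that two datasets with the same summed statistic have the same likelihood at several parameter values. The helper names (`s`, `summed_statistics`, `likelihood`) and the two datasets are made up for this illustration.

```python
from math import isclose

def s(x):
    """Per-instance statistic for a coin toss: (1, 0) for heads, (0, 1) for tails."""
    return (1, 0) if x == "H" else (0, 1)

def summed_statistics(data):
    """Sum s(x[i]) over the dataset, giving (M_H, M_T)."""
    heads = sum(s(x)[0] for x in data)
    tails = sum(s(x)[1] for x in data)
    return (heads, tails)

def likelihood(theta, data):
    """Product of per-toss probabilities, with P(H) = theta."""
    value = 1.0
    for x in data:
        value *= theta if x == "H" else 1.0 - theta
    return value

d1 = ["H", "T", "T", "H", "H"]
d2 = ["T", "H", "H", "H", "T"]   # same counts as d1, different order

same_stats = summed_statistics(d1) == summed_statistics(d2)
same_likelihood = all(
    isclose(likelihood(t, d1), likelihood(t, d2)) for t in (0.1, 0.3, 0.5, 0.9)
)
print(same_stats, same_likelihood)
```

The check covers only a few values of theta, so it demonstrates the property rather than proving it.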
Sufficient Statistic for Multinomial
• For a dataset D over variable X with k values, the sufficient statistics are counts $\langle M_1, \ldots, M_k \rangle$ where $M_i$ is the number of times that $X[m] = x^i$ in D
$L(\theta : D) = \prod_{i=1}^{k} \theta_i^{M_i}$
• Sufficient statistic s(x) is a tuple of dimension k
  – $s(x^i) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position i
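A short sketch of how summing the one-hot statistic over a dataset yields the counts. The value set `("a", "b", "c")` and the dataset are hypothetical, chosen only to make the example concrete.

```python
def one_hot(value, values):
    """s(x^i): a tuple of length k with a 1 in position i and 0 elsewhere."""
    return tuple(1 if v == value else 0 for v in values)

def counts(data, values):
    """Sum the one-hot vectors over D to obtain <M_1, ..., M_k>."""
    total = [0] * len(values)
    for x in data:
        for i, bit in enumerate(one_hot(x, values)):
            total[i] += bit
    return tuple(total)

values = ("a", "b", "c")                 # hypothetical value set for X, so k = 3
data = ["a", "c", "a", "b", "a", "c"]    # made-up dataset

print(counts(data, values))
```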
Sufficient Statistic for Gaussian
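• Gaussian distribution: $P(X) \sim N(\mu, \sigma^2)$ if
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$
• Rewrite as
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -x^2 \frac{1}{2\sigma^2} + x \frac{\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right)$
• Sufficient statistics for Gaussian: $s(x) = \langle 1, x, x^2 \rangle$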
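As a sketch of why the sums of 1, x and x^2 are enough, the code below evaluates the Gaussian log-likelihood once directly from the data and once from the three summed statistics alone. The function names and the sample values are assumptions for illustration.

```python
import math

def gaussian_stats(data):
    """Sum s(x) = <1, x, x^2> over the dataset."""
    return (len(data), sum(data), sum(x * x for x in data))

def log_likelihood_direct(mu, sigma, data):
    """Sum of log p(x[m]) using the standard Gaussian density."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def log_likelihood_from_stats(mu, sigma, stats):
    """Same quantity via the expanded exponent, using only the summed statistics."""
    s0, s1, s2 = stats
    return (
        -s0 * math.log(math.sqrt(2 * math.pi) * sigma)
        - s2 / (2 * sigma ** 2)
        + s1 * mu / sigma ** 2
        - s0 * mu ** 2 / (2 * sigma ** 2)
    )

data = [1.0, 2.0, 4.0, 5.0]   # made-up sample
stats = gaussian_stats(data)

direct = log_likelihood_direct(2.5, 1.5, data)
via_stats = log_likelihood_from_stats(2.5, 1.5, stats)
print(math.isclose(direct, via_stats))
```

The two values agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than exact equality.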
Maximum Likelihood Estimation
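• MLE Principle: Choose $\theta$ to maximize $L(\theta : D)$
• Multinomial MLE:
$\hat{\theta}_i = \frac{M_i}{\sum_{j=1}^{k} M_j}$
• Gaussian MLE:
$\hat{\mu} = \frac{1}{M} \sum_{m} x[m]$
$\hat{\sigma} = \sqrt{\frac{1}{M} \sum_{m} (x[m] - \hat{\mu})^2}$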
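The two closed-form estimators can be sketched in a few lines of Python. The function names and input values are illustrative assumptions; note that the Gaussian estimate divides by M, not M - 1, as the maximum likelihood form requires.

```python
import math

def multinomial_mle(counts):
    """theta_i = M_i divided by the total count."""
    total = sum(counts)
    return [m / total for m in counts]

def gaussian_mle(data):
    """mu = sample mean; sigma = square root of the mean squared deviation (divides by M)."""
    M = len(data)
    mu = sum(data) / M
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / M)
    return mu, sigma

theta_hat = multinomial_mle([3, 1, 2])                    # made-up counts
mu_hat, sigma_hat = gaussian_mle([1.0, 2.0, 4.0, 5.0])    # made-up sample
print(theta_hat, mu_hat, sigma_hat)
```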
Summary
• Maximum likelihood estimation is a simple principle for parameter selection given D
• Likelihood function uniquely determined by sufficient statistics that summarize D
• MLE has closed form solution for many parametric distributions