Parameter estimation, maximum likelihood and least squares
techniques
Jorge Andre Swieca School
Campos do Jordão, January 2003
third lecture
References
• R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, J. Wiley & Sons, 1989
• G. Cowan, Statistical Data Analysis, Oxford University Press, 1998
• Particle Data Group, Review of Particle Physics, 2002 electronic edition
• S. Brandt, Data Analysis: Statistical and Computational Methods for Scientists and Engineers, 3rd ed., Springer, 1999
Likelihood
“Verisimilitude (…) is often the whole truth.”
Conclusão de Bento – Machado de Assis
“Whoever heard her would take everything for truth, such was the sincere tone, the gentleness of the words, and the verisimilitude of the details.”
Quincas Borba – Machado de Assis
Parameter estimation
p.d.f. f(x): sample space = all possible values of x.
Sample of size n: independent observations $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Joint p.d.f.: $f_{\rm sam}(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$
Central problem of statistics: from n measurements of x, infer properties of $f(x; \vec{\theta})$, $\vec{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)$.
A statistic: a function of the observed $\vec{x}$.
To estimate properties of the p.d.f. (mean, variance, …): an estimator.
An estimator $\hat{\theta}$ for $\theta$ is consistent if $\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \varepsilon) = 0$
(large sample or asymptotic limit).
Parameter estimation
$\hat{\theta}(x_1, \ldots, x_n)$: a random variable, distributed according to $g(\hat{\theta}; \theta)$, the sampling distribution, from an (imagined) infinite number of similar experiments of size n.

$E[\hat{\theta}(\vec{x})] = \int \hat{\theta}\, g(\hat{\theta}; \theta)\, d\hat{\theta} = \int \cdots \int \hat{\theta}(\vec{x})\, f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n$

Bias: $b = E[\hat{\theta}] - \theta$. It depends on:
• the sample size
• the functional form of the estimator
• the true properties of the p.d.f.
If b = 0 independent of n: $\hat{\theta}$ is unbiased.
Important to combine results of two or more experiments.
Parameter estimation
Mean square error:
$\mathrm{MSE} = E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta}] - \theta)^2$
$\mathrm{MSE} = V[\hat{\theta}] + b^2$
Classical statistics: no unique method for building estimators; given an estimator, one can evaluate its properties.

Sample mean
From $\vec{x} = (x_1, x_2, \ldots, x_n)$, supposed to come from an unknown p.d.f. $f(x)$:
estimator for $E[x] = \mu$ (population mean), one possibility:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Parameter estimation
Important property: weak law of large numbers.
If V[x] exists, $\bar{x}$ is a consistent estimator for $\mu$:
n → ∞, $\bar{x}$ → $\mu$ in the sense of probability.

$E[\bar{x}] = E\left[\frac{1}{n} \sum_{i=1}^{n} x_i\right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu$

with $E[x_i] = \int \cdots \int x_i\, f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n = \mu$

$\bar{x}$ is an unbiased estimator for the population mean $\mu$.
Parameter estimation
Sample variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1} \left(\overline{x^2} - \bar{x}^2\right)$
$E[s^2] = \sigma^2$: $s^2$ is an unbiased estimator for V[x].
If $\mu$ is known:
$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
$S^2$ is an unbiased estimator for $\sigma^2$.
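A quick numerical check of these claims (my addition, not in the original slides); the Gaussian toy values µ = 2, σ = 3 and n = 10 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 10            # arbitrary toy values

# Average each estimator over many samples to expose any bias.
trials = 200_000
x = rng.normal(mu, sigma, size=(trials, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)             # 1/(n-1) with sample mean: unbiased
v_ml = x.var(axis=1, ddof=0)           # 1/n with sample mean: biased low
S2 = ((x - mu) ** 2).mean(axis=1)      # 1/n with known mu: unbiased

print(xbar.mean())   # ~2.0: E[xbar] = mu
print(s2.mean())     # ~9.0: E[s^2] = sigma^2
print(v_ml.mean())   # ~8.1: (n-1)/n * sigma^2
print(S2.mean())     # ~9.0: E[S^2] = sigma^2
```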
Maximum likelihood
Technique for estimating parameters given a finite sample of data $\vec{x} = (x_1, x_2, \ldots, x_n)$.
Suppose the functional form of $f(x; \theta)$ is known.
The probability for $x_1$ to be in $[x_1, x_1 + dx_1]$ is $f(x_1; \theta)\, dx_1$;
the probability for $x_i$ to be in $[x_i, x_i + dx_i]$ for all i is $\prod_{i=1}^{n} f(x_i; \theta)\, dx_i$.
If the parameters are correct: high probability for the data.

$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$ is the likelihood function:
• the joint probability (density) of the data,
• regarded as a function of the $\theta$ (variables), with $\vec{x}$ fixed (parameters).
ML estimators for $\theta$: the values that maximize the likelihood function,
$\frac{\partial L}{\partial \theta_i} = 0, \quad i = 1, \ldots, m \quad \Rightarrow \quad \hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$
Maximum likelihood
n decay times for unstable particles: $t_1, \ldots, t_n$.
Hypothesis: the distribution is an exponential p.d.f. with mean $\tau$:
$f(t; \tau) = \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right)$

$\log L(\tau) = \sum_{i=1}^{n} \log f(t_i; \tau) = \sum_{i=1}^{n} \left(\log \frac{1}{\tau} - \frac{t_i}{\tau}\right)$

$\frac{\partial \log L(\tau)}{\partial \tau} = 0 \quad \Rightarrow \quad \hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} t_i$

$E[\hat{\tau}(t_1, \ldots, t_n)] = \int \cdots \int \hat{\tau}(t_1, \ldots, t_n)\, f_{\rm joint}(t_1, \ldots, t_n; \tau)\, dt_1 \cdots dt_n$
$= \int \cdots \int \left(\frac{1}{n} \sum_{i=1}^{n} t_i\right) \prod_{j=1}^{n} \frac{1}{\tau} e^{-t_j/\tau}\, dt_1 \cdots dt_n$
$= \frac{1}{n} \sum_{i=1}^{n} \int_0^\infty t_i\, \frac{1}{\tau} e^{-t_i/\tau}\, dt_i \prod_{j \neq i} \int_0^\infty \frac{1}{\tau} e^{-t_j/\tau}\, dt_j = \frac{1}{n}\, n\tau = \tau$

$\hat{\tau}$ is an unbiased estimator of $\tau$.
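A minimal sketch of this estimator on simulated decay times (my illustration, not the lecture's code; the seed and true τ = 1 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
t = rng.exponential(scale=1.0, size=50)   # 50 decay times, true tau = 1.0

tau_hat = t.mean()    # the ML estimator derived above: (1/n) sum t_i
print(f"tau_hat = {tau_hat:.3f}")         # close to, but not exactly, 1.0
```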
Maximum likelihood
[Figure: ML fit of the exponential p.d.f. to 50 decay times; true value $\tau = 1.0$, estimate $\hat{\tau} = 1.062$.]
Maximum likelihood
What about $\lambda = 1/\tau$?
Given $\hat{\theta}$, for a function $a(\theta)$ with $\partial a / \partial \theta \neq 0$:
$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial \theta} \frac{\partial \theta}{\partial a} = 0 \quad \Rightarrow \quad \hat{a} = a(\hat{\theta})$

$\hat{\lambda} = \frac{1}{\hat{\tau}} = \left(\frac{1}{n} \sum_{i=1}^{n} t_i\right)^{-1}$

$E[\hat{\lambda}] = \frac{n}{n-1}\, \lambda$: biased, but an unbiased estimator for $\lambda$ when n → ∞.
Maximum likelihood
n measurements of x assumed to come from a Gaussian:
$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

$\log L(\mu, \sigma^2) = \sum_{i=1}^{n} \log f(x_i; \mu, \sigma^2) = \sum_{i=1}^{n} \left(-\frac{1}{2} \log 2\pi - \frac{1}{2} \log \sigma^2 - \frac{(x_i - \mu)^2}{2\sigma^2}\right)$

$\frac{\partial \log L}{\partial \mu} = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad E[\hat{\mu}] = \mu$: unbiased.

$\frac{\partial \log L}{\partial \sigma^2} = 0 \quad \Rightarrow \quad \widehat{\sigma^2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2, \qquad E[\widehat{\sigma^2}] = \frac{n-1}{n}\, \sigma^2$: unbiased only for large n.
Maximum likelihood
We showed that $s^2$ is an unbiased estimator for the variance of any p.d.f., so
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
is an unbiased estimator for $\sigma^2$.
Maximum likelihood
Variance of ML estimators
Many experiments (same n): what is the spread of $\hat{\tau}$? Analytically (exponential):

$V[\hat{\tau}] = E[\hat{\tau}^2] - (E[\hat{\tau}])^2$

$E[\hat{\tau}^2] = \int \cdots \int \left(\frac{1}{n} \sum_{i=1}^{n} t_i\right)^2 \prod_{j=1}^{n} \frac{1}{\tau} e^{-t_j/\tau}\, dt_1 \cdots dt_n = \frac{n+1}{n}\, \tau^2$

$E[\hat{\tau}] = \int \cdots \int \left(\frac{1}{n} \sum_{i=1}^{n} t_i\right) \prod_{j=1}^{n} \frac{1}{\tau} e^{-t_j/\tau}\, dt_1 \cdots dt_n = \tau$

$\Rightarrow \quad V[\hat{\tau}] = \frac{\tau^2}{n}$

Transformation invariance of ML estimators: the ML estimate of $\sigma^2_{\hat{\tau}} = \tau^2 / n$ is
$\widehat{\sigma^2_{\hat{\tau}}} = \frac{\hat{\tau}^2}{n}, \qquad \hat{\sigma}_{\hat{\tau}} = \frac{\hat{\tau}}{\sqrt{n}}$
Maximum likelihood
$\hat{\tau} = 0.827 \pm 0.43$: if the experiment were repeated many times (with the same n), the standard deviation of the estimates would be 0.43.
• one possible interpretation of this error
• not the standard one when the distribution is not Gaussian (a 68% central confidence interval is $\pm$ one standard deviation only if the p.d.f. for the estimator is Gaussian)
• in the large sample limit, ML estimates are distributed according to a Gaussian p.d.f.
• then the two procedures lead to the same result
Maximum likelihood
Variance: MC method
For cases too difficult to solve analytically: the MC method.
• simulate a large number of experiments
• compute the ML estimate each time
• look at the distribution of the resulting values
$s^2$ is an unbiased estimator for the variance of a p.d.f.;
s from the MC experiments gives the statistical error of the parameter estimated from the real measurement.
asymptotic normality: general property of ML estimators for large samples.
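A sketch of this MC procedure for the exponential example (my addition; the numbers mirror the setup shown next, 1000 experiments of 50 observations each):

```python
import numpy as np

rng = np.random.default_rng(0)
tau_true, n, n_exp = 1.0, 50, 1000

# Simulate 1000 experiments and compute tau_hat = mean(t) in each one.
tau_hats = rng.exponential(tau_true, size=(n_exp, n)).mean(axis=1)

s = tau_hats.std(ddof=1)   # sample standard deviation of the estimates
print(f"s = {s:.3f}")      # ~0.14, compare tau_true / sqrt(n) = 0.141
```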
Maximum likelihood
[Figure: distribution of $\hat{\tau}$ from 1000 MC experiments with 50 observations each; sample standard deviation s = 0.151, compared with $\hat{\sigma}_{\hat{\tau}} = \hat{\tau} / \sqrt{n} = 1.062 / \sqrt{50} = 0.150$.]
RCF bound
A way to estimate the variance of an estimator without analytical calculation or MC:

$V[\hat{\theta}] \geq \frac{\left(1 + \frac{\partial b}{\partial \theta}\right)^2}{E\left[-\frac{\partial^2 \log L}{\partial \theta^2}\right]}$ (Rao-Cramer-Frechet inequality)

Equality (minimum variance): the estimator is efficient. If efficient estimators exist for a problem, ML will find them.
ML estimators are always efficient in the large sample limit.

Ex: exponential, $f(t; \tau) = \frac{1}{\tau} e^{-t/\tau}$:

$\frac{\partial^2 \log L}{\partial \tau^2} = \sum_{i=1}^{n} \left(\frac{1}{\tau^2} - \frac{2 t_i}{\tau^3}\right) = \frac{n}{\tau^2} \left(1 - \frac{2\hat{\tau}}{\tau}\right)$

With b = 0 and $E[\hat{\tau}] = \tau$:

$V[\hat{\tau}] = \frac{1}{E\left[-\frac{\partial^2 \log L}{\partial \tau^2}\right]} = \frac{1}{n / \tau^2} = \frac{\tau^2}{n}$

equal to the exact result: an efficient estimator.
RCF bound
$\vec{\theta} = (\theta_1, \ldots, \theta_m)$: assume efficiency and zero bias. With $V_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j]$:

$(V^{-1})_{ij} = E\left[-\frac{\partial^2 \log L}{\partial \theta_i\, \partial \theta_j}\right] = -\int \cdots \int \left(\frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \log \prod_{k=1}^{n} f(x_k; \theta)\right) \prod_{l=1}^{n} f(x_l; \theta)\, dx_1 \cdots dx_n$
$= -\, n \int \frac{\partial^2 \log f(x; \theta)}{\partial \theta_i\, \partial \theta_j}\, f(x; \theta)\, dx$

$V \propto \frac{1}{n}$: statistical errors $\propto \frac{1}{\sqrt{n}}$
RCF bound
Large data sample: evaluate the second derivatives with the measured data and the ML estimates:

$(\widehat{V^{-1}})_{ij} = -\frac{\partial^2 \log L}{\partial \theta_i\, \partial \theta_j}\bigg|_{\vec{\theta} = \hat{\vec{\theta}}}$

For a single parameter:

$\widehat{\sigma^2_{\hat{\theta}}} = \left(-\frac{\partial^2 \log L}{\partial \theta^2}\bigg|_{\theta = \hat{\theta}}\right)^{-1}$

This is the usual method for estimating the covariance matrix when the likelihood function is maximized numerically.
Ex: MINUIT (CERN library):
• finite differences
• invert the matrix to get $V_{ij}$
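An illustrative version of this numerical procedure for the one-parameter exponential case (my addition; MINUIT chooses its step sizes more carefully than the fixed h used here):

```python
import numpy as np

rng = np.random.default_rng(42)
t = rng.exponential(1.0, size=50)   # simulated decay times
tau_hat = t.mean()                  # ML estimate

def log_L(tau):
    # log L(tau) = sum_i [ -log(tau) - t_i / tau ]
    return np.sum(-np.log(tau) - t / tau)

# Second derivative at the maximum by central finite differences.
h = 1e-4
d2 = (log_L(tau_hat + h) - 2 * log_L(tau_hat) + log_L(tau_hat - h)) / h**2

var_hat = -1.0 / d2                 # sigma^2 = -1 / (d^2 logL / d tau^2)
print(f"sigma_hat = {np.sqrt(var_hat):.3f}")   # ~ tau_hat / sqrt(50)
```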
Graphical method
single parameter θ
$\log L(\theta) = \log L(\hat{\theta}) + \frac{\partial \log L}{\partial \theta}\bigg|_{\hat{\theta}} (\theta - \hat{\theta}) + \frac{1}{2!} \frac{\partial^2 \log L}{\partial \theta^2}\bigg|_{\hat{\theta}} (\theta - \hat{\theta})^2 + \ldots$

The first derivative vanishes at the maximum, so

$\log L(\theta) = \log L_{\max} - \frac{(\theta - \hat{\theta})^2}{2\, \hat{\sigma}^2_{\hat{\theta}}}$

$\log L(\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}) = \log L_{\max} - \frac{1}{2}$

i.e. move θ away from $\hat{\theta}$ until log L decreases by 1/2.
Later: $[\hat{\theta} - \hat{\sigma}_{\hat{\theta}}, \hat{\theta} + \hat{\sigma}_{\hat{\theta}}]$ is a 68.3% central confidence interval.
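A sketch of this graphical method on the same simulated sample (my addition; the scan range and grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
t = rng.exponential(1.0, size=50)
tau_hat = t.mean()

def log_L(tau):
    return np.sum(-np.log(tau) - t / tau)

# Scan log L and keep the taus where it is within 1/2 of its maximum;
# the edges of that set estimate the interval tau_hat -/+ sigma_hat.
taus = np.linspace(0.5 * tau_hat, 2.0 * tau_hat, 2001)
logl = np.array([log_L(tau) for tau in taus])
inside = taus[logl >= log_L(tau_hat) - 0.5]

print(f"interval = [{inside.min():.3f}, {inside.max():.3f}]")
print(f"compare: tau_hat = {tau_hat:.3f}, tau_hat/sqrt(n) = {tau_hat/np.sqrt(50):.3f}")
```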
ML with two parameters
Angular distribution for the scattering angle θ (x = cos θ) in a particle reaction:

$f(x; \alpha, \beta) = \frac{1 + \alpha x + \beta x^2}{2 + 2\beta/3}$, normalized for $-1 \leq x \leq +1$.

Realistic measurements cover only $x_{\min} \leq x \leq x_{\max}$:

$f(x; \alpha, \beta) = \frac{1 + \alpha x + \beta x^2}{(x_{\max} - x_{\min}) + \frac{\alpha}{2}(x_{\max}^2 - x_{\min}^2) + \frac{\beta}{3}(x_{\max}^3 - x_{\min}^3)}$
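An illustrative two-parameter fit (my addition, not the lecture's code): events are drawn from f(x; α, β) on the full range [-1, +1] by accept-reject, and -log L is minimized numerically; the true values (0.5, 0.5), the envelope 1.1, and the seed are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
alpha_true, beta_true = 0.5, 0.5     # arbitrary true values

def f(x, a, b):
    # normalized on -1 <= x <= +1
    return (1 + a * x + b * x**2) / (2 + 2 * b / 3)

# Accept-reject sampling of 2000 events (1.1 bounds f from above here).
x = np.empty(0)
while x.size < 2000:
    cand = rng.uniform(-1, 1, 5000)
    keep = rng.uniform(0, 1.1, 5000) < f(cand, alpha_true, beta_true)
    x = np.concatenate([x, cand[keep]])
x = x[:2000]

def neg_log_L(p):
    return -np.sum(np.log(f(x, p[0], p[1])))

res = minimize(neg_log_L, x0=[0.0, 0.0], method="Nelder-Mead")
print(res.x)   # (alpha_hat, beta_hat), near (0.5, 0.5)
```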
ML with two parameters
[Figure: ML fit with 2000 events; true values $\alpha = 0.5$, $\beta = 0.5$; estimates $\hat{\alpha} = 0.508 \pm 0.052$, $\hat{\beta} = 0.466 \pm 0.108$.]
ML with two parameters
[Figure: 500 MC experiments with 2000 events each; both marginal p.d.f.'s are approximately Gaussian, with mean $\hat{\alpha} = 0.499$, $s_{\hat{\alpha}} = 0.051$, mean $\hat{\beta} = 0.498$, $s_{\hat{\beta}} = 0.111$, and correlation coefficient r = 0.42.]
Least squares
Measured value y: a Gaussian random variable centered about the quantity's true value $\lambda(x; \theta)$; measurements $(x_i, y_i)$, $i = 1, \ldots, n$.

$L(y_1, \ldots, y_n; \lambda_1, \ldots, \lambda_n, \sigma_1, \ldots, \sigma_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(y_i - \lambda_i)^2}{2\sigma_i^2}\right)$

To estimate the $\theta$:

$\log L(\theta) = -\frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \lambda(x_i; \theta))^2}{\sigma_i^2}$ + terms not depending on $\theta$

L is maximized by the $\theta$ that minimize

$\chi^2(\theta) = \sum_{i=1}^{n} \frac{(y_i - \lambda(x_i; \theta))^2}{\sigma_i^2}$
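As an illustration (my addition), scipy's curve_fit minimizes exactly this χ² for independent Gaussian measurements; the straight-line model and the error σ_i = 0.5 are arbitrary choices:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

def lam(x, theta0, theta1):
    # hypothetical true-value curve lambda(x; theta)
    return theta0 + theta1 * x

x = np.linspace(0, 10, 11)
sigma = np.full_like(x, 0.5)
y = rng.normal(lam(x, 1.0, 2.0), sigma)   # Gaussian measurements

theta_hat, cov = curve_fit(lam, x, y, sigma=sigma, absolute_sigma=True)
print(theta_hat)                 # ~ [1.0, 2.0]
print(np.sqrt(np.diag(cov)))     # standard deviations of the estimates
```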
Least squares
$\chi^2$ is used to define the procedure even if the $y_i$ are not Gaussian.
If the measurements are not independent but described by an n-dimensional Gaussian p.d.f. with known covariance matrix V and unknown mean values:

$\log L(\theta) = -\frac{1}{2} \sum_{i,j=1}^{n} (y_i - \lambda(x_i; \theta))\, (V^{-1})_{ij}\, (y_j - \lambda(x_j; \theta))$ + const.

$\chi^2(\theta) = \sum_{i,j=1}^{n} (y_i - \lambda(x_i; \theta))\, (V^{-1})_{ij}\, (y_j - \lambda(x_j; \theta))$

LS estimators $\hat{\theta}_1, \ldots, \hat{\theta}_m$: the values that minimize $\chi^2(\theta)$.
Least squares
$\lambda(x; \theta) = \sum_{j=1}^{m} \theta_j\, a_j(x)$, with the $a_j(x)$ linearly independent functions:
• the estimators and their variances can be found analytically
• the estimators have zero bias and minimum variance

$\lambda(x_i; \theta) = \sum_{j=1}^{m} a_j(x_i)\, \theta_j = \sum_{j=1}^{m} A_{ij}\, \theta_j, \qquad A_{ij} = a_j(x_i)$

$\chi^2(\theta) = (\vec{y} - A\theta)^T\, V^{-1}\, (\vec{y} - A\theta)$

At the minimum: $\nabla_{\theta}\, \chi^2 = -2\, (A^T V^{-1} \vec{y} - A^T V^{-1} A\, \theta) = 0$

$\hat{\theta} = (A^T V^{-1} A)^{-1} A^T V^{-1}\, \vec{y} \equiv B\, \vec{y}$
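A direct matrix implementation of these formulas (my addition), for the hypothetical linear model λ(x; θ) = θ1 + θ2 x, i.e. a1(x) = 1, a2(x) = x:

```python
import numpy as np

rng = np.random.default_rng(5)

x = np.linspace(0, 1, 20)
A = np.column_stack([np.ones_like(x), x])   # A_ij = a_j(x_i)
V = np.diag(np.full(20, 0.1**2))            # independent errors, sigma = 0.1
y = rng.multivariate_normal(A @ np.array([0.3, 1.5]), V)

Vinv = np.linalg.inv(V)
U = np.linalg.inv(A.T @ Vinv @ A)           # covariance of the estimators
theta_hat = U @ A.T @ Vinv @ y              # (A^T V^-1 A)^-1 A^T V^-1 y

print(theta_hat)              # ~ [0.3, 1.5]
print(np.sqrt(np.diag(U)))    # their standard deviations
```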
Least squares
Covariance matrix for the estimators: $U_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j]$

$U = B\, V\, B^T = (A^T V^{-1} A)^{-1}$

$(U^{-1})_{ij} = \frac{1}{2}\, \frac{\partial^2 \chi^2}{\partial \theta_i\, \partial \theta_j}\bigg|_{\hat{\theta}}$

This coincides with the RCF bound for the inverse covariance matrix if the $y_i$ are Gaussian distributed ($\log L = -\chi^2/2$).
Least squares
$\lambda(x; \theta)$ linear in $\theta$ ⇒ $\chi^2(\theta)$ quadratic in $\theta$:

$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{1}{2} \sum_{i,j=1}^{m} \frac{\partial^2 \chi^2}{\partial \theta_i\, \partial \theta_j}\bigg|_{\hat{\theta}} (\theta_i - \hat{\theta}_i)(\theta_j - \hat{\theta}_j) = \chi^2(\hat{\theta}) + \sum_{i,j=1}^{m} (U^{-1})_{ij}\, (\theta_i - \hat{\theta}_i)(\theta_j - \hat{\theta}_j)$

To interpret this, take a single parameter θ:

$\chi^2(\theta) = \chi^2(\hat{\theta}) + \frac{(\theta - \hat{\theta})^2}{\widehat{\sigma^2_{\hat{\theta}}}}$

$\chi^2(\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}) = \chi^2_{\min} + 1$

i.e. move θ away from $\hat{\theta}$ until $\chi^2$ increases by 1.
Chi-squared distribution
$f(z; n) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, z^{n/2 - 1}\, e^{-z/2}, \qquad n = 1, 2, \ldots, \quad 0 \leq z < \infty$

n: degrees of freedom; $\Gamma(n) = (n-1)!$, $\Gamma(x+1) = x\, \Gamma(x)$.

$E[z] = \int_0^\infty z\, f(z; n)\, dz = n, \qquad V[z] = \int_0^\infty (z - n)^2\, f(z; n)\, dz = 2n$

For n independent Gaussian random variables $x_i$ with known $\mu_i$ and $\sigma_i^2$:

$z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}$ is distributed as a $\chi^2$ with n degrees of freedom.
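A quick MC check of this statement (my addition; the means, widths and sample size are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 5
mu = np.arange(1.0, 6.0)           # known means (arbitrary)
sig = np.linspace(0.5, 1.5, 5)     # known standard deviations (arbitrary)

x = rng.normal(mu, sig, size=(100_000, n))
z = (((x - mu) / sig) ** 2).sum(axis=1)

print(z.mean(), z.var())                   # ~ n = 5 and ~ 2n = 10
print(stats.kstest(z, "chi2", args=(n,)))  # compatible with chi^2(n)
```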