Post on 30-Aug-2018
University of Pavia
2007
Maximum Likelihood Estimation
Eduardo Rossi
University of Pavia
Likelihood function
Choosing parameter values that make what one has observed more
likely to occur than any other parameter values do.
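Distribution: The pair $(U, V)$ is a random vector, and the $N$ pairs
$$\{(U_1, V_1), \ldots, (U_N, V_N)\}$$
are an i.i.d. random sample of $(U, V)$.
The conditional distribution $F_{U|V}(u|v;\theta_0)$ is completely known except for $\theta_0$, the true value of the real-valued parameter vector, $\theta_0 \in \mathbb{R}^K$.
The support of $F_{U|V}$ is $S(\theta_0)$:
$$\int_{S(\theta_0)} dF_{U|V}(u|v;\theta_0) = 1 =
\begin{cases}
\sum_{u \in S(\theta_0)} f(u|v;\theta_0) & \text{if } U \text{ is discrete} \\
\int_{S(\theta_0)} f(u|v;\theta_0)\,du & \text{if } U \text{ is continuous}
\end{cases}$$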
Eduardo Rossi © - Macroeconometria 07
Likelihood function
Probability function for $(U_1, \ldots, U_N) \mid (V_1, \ldots, V_N)$:
$$\prod_{t=1}^{N} f(u_t|v_t;\theta_0)$$
Normal Linear Regression: $y_t = x_t'\beta_0 + \epsilon_t$, $(y_t, x_t)$ i.i.d. normal, with $u_t = y_t$, $v_t = x_t$:
$$f(u_t|v_t;\theta_0) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left[-\frac{(y_t - x_t'\beta_0)^2}{2\sigma_0^2}\right]$$
$S(\theta_0) = \mathbb{R}$. Since the observations are i.i.d. normal, the conditional p.d.f. of the sample is
$$\prod_{t=1}^{N} f(u_t|v_t;\theta_0) = \left[2\pi\sigma_0^2\right]^{-N/2} \exp\left[-\frac{(y - X\beta_0)'(y - X\beta_0)}{2\sigma_0^2}\right]$$
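The factorization of the normal sample density can be checked numerically. The sketch below is only an illustration: the data are simulated, and the names `loglik_obs`, `loglik_sample`, the seed and the parameter values are assumptions made for the example, not part of the slides. It compares the sum of the per-observation log densities with the matrix form of the sample log density.

```python
import numpy as np

rng = np.random.default_rng(0)            # hypothetical seed
N, K = 50, 3
X = rng.normal(size=(N, K))               # simulated regressors
beta0 = np.array([1.0, -0.5, 2.0])        # hypothetical true coefficients
sigma2_0 = 1.5                            # hypothetical true variance
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=N)

def loglik_obs(beta, sigma2, y, X):
    """Log of the normal density f(y_t | x_t; theta), one value per observation."""
    resid = y - X @ beta
    return -0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2 * sigma2)

def loglik_sample(beta, sigma2, y, X):
    """Log of the joint density written in matrix form."""
    resid = y - X @ beta
    n = y.shape[0]
    return -0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)

sum_of_terms = loglik_obs(beta0, sigma2_0, y, X).sum()
matrix_form = loglik_sample(beta0, sigma2_0, y, X)
print(sum_of_terms, matrix_form)
```

The two printed numbers should agree up to floating-point rounding, because the log of a product of densities is the sum of the log densities.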
Likelihood function
The marginal distribution of xt does not depend on θ0.
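Student's t Linear Regression
$$\left.\frac{y_t - x_t'\beta_0}{\sigma_0}\,\right|\, x_t \sim t_{\nu_0}$$
$$f(u_t|v_t;\theta_0) = \frac{\Gamma[(\nu_0 + 1)/2]}{\Gamma(\nu_0/2)}\,\frac{1}{\sqrt{\pi\nu_0\sigma_0^2}} \left[1 + \frac{(y_t - x_t'\beta_0)^2}{\nu_0\sigma_0^2}\right]^{-(\nu_0+1)/2}$$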
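As a sanity check on the Student t density, the following sketch codes the formula directly and verifies by a Riemann sum that it integrates to one. The function name `student_t_pdf`, the grid and the parameter values are assumptions chosen for illustration.

```python
import math
import numpy as np

def student_t_pdf(y, mean, sigma, nu):
    """Density of y when (y - mean) / sigma follows a Student t with nu d.o.f."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu * sigma**2))
    z2 = (y - mean) ** 2 / (nu * sigma**2)
    return np.exp(log_const - 0.5 * (nu + 1) * np.log1p(z2))

# Hypothetical values: mean plays the role of x_t' beta_0
mean, sigma, nu = 1.0, 2.0, 5.0
grid = np.linspace(-400.0, 400.0, 800_001)   # step 0.001
step = grid[1] - grid[0]
pdf_vals = student_t_pdf(grid, mean, sigma, nu)
total_mass = float(np.sum(pdf_vals) * step)  # Riemann-sum approximation of the integral
print(total_mass)
```

The wide grid is needed because the t density has much heavier tails than the normal density.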
Likelihood function
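Laplace Linear Regression
$$f(u_t|v_t;\theta_0) = \frac{1}{\sqrt{2\sigma_0^2}} \exp\left[-\sqrt{2}\,\frac{|y_t - x_t'\beta_0|}{\sigma_0}\right]$$
$U = y_t$, $V = x_t$, $S(\theta_0) = \mathbb{R}$, $\theta_0 = [\beta_0', \sigma_0^2]'$.
Once the distribution is specified, we can obtain expected values of functions of the data:
$$h(\theta_0) \equiv E[g(U)] = \int g(u)\,dF(u;\theta_0)$$
$$h(v;\theta_0) \equiv E[g(U,V)|V = v] = \int g(u,v)\,dF(u|v;\theta_0)$$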
The likelihood function
Unconditional specification: $f(u;\theta)$ describes the likely values of every r.v. $U_t$, $t = 1, 2, \ldots, N$, for a specific value of $\theta$.
The sample likelihood function instead treats the $u$ argument as given and $\theta$ as variable.
It describes the likely values of the unknown $\theta_0$ given the realizations of the r.v. $U$.
The likelihood function of $\theta$ for a random variable $U$ with p.f. $f(u;\theta_0)$ is defined to be
$$l(\theta;U) = f(U;\theta)$$
and the log-likelihood function is
$$L(\theta;U) = \log l(\theta;U)$$
The Likelihood function
Likelihood function: we evaluate the p.f. at the random variable and consider the result as a function of the variable $\theta$. For the whole sample the log-likelihood is
$$L(\theta;U_1,\ldots,U_N) = \log\left[\prod_{t=1}^{N} f(U_t;\theta)\right] = \sum_{t=1}^{N} L(\theta;U_t)$$
The conditional likelihood function of $\theta$ for a r.v. $U$ with p.f. $f(u|v;\theta_0)$ given the r.v. $V$ is
$$l(\theta;U|V) = f(U|V;\theta)$$
$$L(\theta;U|V) = \log l(\theta;U|V)$$
$\theta_0 \in \Theta$, where $\Theta$ is the parameter space, the set of permitted parameter values of the model.
Assumptions
Assumption (Dominance condition):
$$E\left[\sup_{\theta\in\Theta} |L(\theta;U|V)|\right]$$
exists.
This means that $|L(\theta;U|V)|$ is dominated by
$$h(U,V) \equiv \sup_{\theta\in\Theta} |L(\theta;U|V)|$$
where $h(U,V)$ does not depend on $\theta$. The existence of $E[h(U,V)]$ implies the existence of $E[L(\theta;U|V)]$ for every $\theta \in \Theta$.
Lemma. If $L(\theta;U|V)$ is the conditional log-likelihood for $\theta$ and the Dominance condition holds, then
$$E[L(\theta;U|V)|V] \le E[L(\theta_0;U|V)|V].$$
Proof
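Let $f_U$ denote the true density of $U$ with support $S$, $f_W$ any other density, $Z \equiv f_W(U)/f_U(U)$ and $h(z) \equiv \log z$. Since $h$ is concave, Jensen's inequality and $E[Z] = \int_S f_W(u)\,du \le 1$ give
$$E\left[\log\left(\frac{f_W(U)}{f_U(U)}\right)\right] = E[h(Z)] \le h[E(Z)] \le \log(1) = 0$$
Unconditional case:
$$E[L(\theta_0;U)] \ge E[L(\theta;U)]$$
The specification of the p.f. of $U$ determines the expected values of functions of $U$.
Therefore we can define
$$Q(\theta,\theta_0) \equiv E[L(\theta;U)]$$
which depends on $\theta$ because $L$ does, and on $\theta_0$ because the expectation is taken under the true distribution of $U$. The expected log-likelihood inequality states that
$$Q(\theta_0,\theta_0) = \max_{\theta\in\Theta} Q(\theta,\theta_0)$$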
Normal linear regression model
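$$y_t|x_t \sim N(x_t'\beta_0, \sigma_0^2)$$
$$\begin{aligned}
E[L(\theta;y_t|x_t)|x_t] &= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E[(y_t - x_t'\beta)^2|x_t]}{2\sigma^2} \\
&= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2}\,\frac{E[(y_t - x_t'\beta_0 + x_t'\beta_0 - x_t'\beta)^2|x_t]}{\sigma^2} \\
&= -\frac{1}{2}\left[\log(2\pi\sigma^2) + \frac{\sigma_0^2 + (x_t'\beta - x_t'\beta_0)^2}{\sigma^2}\right]
\end{aligned}$$
which is uniquely maximized at $x_t'\beta = x_t'\beta_0$ and $\sigma^2 = \sigma_0^2$.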
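The claim that the conditional expected log-likelihood peaks at the true values can be illustrated on a grid. This is a minimal sketch: `m` stands for $x_t'\beta$ and `m0` for $x_t'\beta_0$, and the numerical values and grid ranges are hypothetical.

```python
import numpy as np

def expected_loglik(m, sigma2, m0, sigma2_0):
    """Closed-form E[L(theta; y_t | x_t) | x_t], with m = x_t' beta and m0 = x_t' beta_0."""
    return -0.5 * (np.log(2 * np.pi * sigma2) + (sigma2_0 + (m - m0) ** 2) / sigma2)

m0, sigma2_0 = 0.7, 2.0                      # hypothetical true values
m_grid = np.linspace(-2.0, 3.0, 501)         # candidate values of x_t' beta (step 0.01)
s_grid = np.linspace(0.5, 4.0, 351)          # candidate values of sigma^2 (step 0.01)

# Evaluate the expected log-likelihood on the whole grid and locate its maximum
Q = expected_loglik(m_grid[:, None], s_grid[None, :], m0, sigma2_0)
i, j = np.unravel_index(np.argmax(Q), Q.shape)
m_star, s_star = m_grid[i], s_grid[j]
print(m_star, s_star)
```

Both grids contain the true values, so the grid maximizer should coincide with them up to rounding.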
Normal linear regression model
The conditional expectation of the conditional log-likelihood of the
entire sample is the sum of such terms
$$E[L(\theta;y|X)|X] = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{N\sigma_0^2 + (\beta - \beta_0)'X'X(\beta - \beta_0)}{2\sigma^2}$$
which is maximized whenever $X\beta = X\beta_0$ and $\sigma^2 = \sigma_0^2$; the maximizer $\beta = \beta_0$ is unique if $X$ has full column rank.
Student t Linear Regression
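The expected log-likelihood is analytically intractable. We can nevertheless show that $E[L(\theta;U|V)]$ exists for $\nu_0 > 2$. Because $\log(1 + z^2) \le z^2$, a consequence of the concavity of the logarithmic function,
$$E\left[\left.\log\left[1 + \frac{(y_t - x_t'\beta)^2}{\nu\sigma^2}\right]\right| x_t\right] \le E\left[\left.\frac{(y_t - x_t'\beta)^2}{\nu\sigma^2}\right| x_t\right] = \frac{\nu_0\sigma_0^2}{\nu\sigma^2(\nu_0 - 2)} + \frac{(x_t'\beta_0 - x_t'\beta)^2}{\nu\sigma^2}$$
where the equality uses $\operatorname{Var}[y_t|x_t] = \sigma_0^2\nu_0/(\nu_0 - 2)$. Provided that $E[x_t x_t']$ exists, the expected log-likelihood exists.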
Unconditional inequality
The conditional expected log-likelihood inequality implies the unconditional inequality
$$E[L(\theta;U|V)] \le E[L(\theta_0;U|V)]$$
Starting from
$$E[L(\theta;U|V)|V] \le E[L(\theta_0;U|V)|V]$$
we take the expectation $E[\cdot]$ over $V$ and apply the law of iterated expectations:
$$\begin{aligned}
E[L(\theta;U|V)] &= E\left[E[L(\theta;U|V)|V]\right] \\
&\le E\left[E[L(\theta_0;U|V)|V]\right] \\
&= E[L(\theta_0;U|V)]
\end{aligned}$$
The ML estimator
Because $\theta_0$ maximizes $E[L(\theta;U|V)]$, it is natural to construct an estimator of $\theta_0$ from the value of $\theta$ that maximizes the sample analogue: the average of the log-likelihood functions of the $N$ observations,
$$\frac{1}{N}\sum_{t} L(\theta;U_t|V_t) \equiv E_N[L(\theta;U|V)]$$
which replaces the population expectation
$$E[L(\theta;U|V)|V = v] = \int L(\theta;u|v)\,dF(u|v;\theta_0)$$
ML estimator: the MLE is a value of the parameter vector that maximizes the sample average log-likelihood function
$$\hat\theta_N \equiv \arg\max_{\theta\in\Theta} E_N[L(\theta)]$$
Normal Linear Regression Model
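The empirical expectation of the log-likelihood:
$$\begin{aligned}
E_N[L(\theta)] &= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E_N[(y_t - x_t'\beta)^2]}{2\sigma^2} \\
&= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y - X\beta)'(y - X\beta)/N}{2\sigma^2}
\end{aligned}$$
The log-likelihood is differentiable. First-order conditions (F.O.C.s):
$$E_N[L_\beta(\theta)] = \frac{1}{\sigma^2}E_N[x_t(y_t - x_t'\beta)] = \frac{1}{N\sigma^2}\left[X'(y - X\beta)\right]$$
$$E_N[L_{\sigma^2}(\theta)] = -\frac{1}{2\sigma^4}\left\{\sigma^2 - E_N[(y_t - x_t'\beta)^2]\right\} = -\frac{1}{2\sigma^4}\left[\sigma^2 - \frac{1}{N}(y - X\beta)'(y - X\beta)\right]$$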
Normal Linear Regression Model
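Solutions:
$$\frac{1}{N\sigma^2}\left[X'(y - X\beta)\right] = 0$$
$$\hat\beta = (X'X)^{-1}X'y$$
$$\hat\sigma^2 = \frac{1}{N}(y - X\hat\beta)'(y - X\hat\beta)$$
The Hessian matrix:
$$E_N[L_{\theta\theta}(\theta)] =
\begin{bmatrix}
-\dfrac{1}{\sigma^2 N}X'X & -\dfrac{X'(y - X\beta)}{\sigma^4 N} \\
-\dfrac{(y - X\beta)'X}{\sigma^4 N} & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 N}(y - X\beta)'(y - X\beta)
\end{bmatrix}$$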
Normal Linear Regression Model
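Evaluated at the MLE $\hat\theta = (\hat\beta', \hat\sigma^2)'$:
$$E_N[L_{\theta\theta}(\hat\theta)] =
\begin{bmatrix}
-\dfrac{1}{\hat\sigma^2 N}X'X & -\dfrac{X'(y - X\hat\beta)}{\hat\sigma^4 N} \\
-\dfrac{(y - X\hat\beta)'X}{\hat\sigma^4 N} & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta)
\end{bmatrix}
=
\begin{bmatrix}
-\dfrac{1}{\hat\sigma^2 N}X'X & 0 \\
0' & \dfrac{1}{2\hat\sigma^4} - \dfrac{1}{\hat\sigma^6 N}(y - X\hat\beta)'(y - X\hat\beta)
\end{bmatrix}$$
The off-diagonal blocks vanish because $X'(y - X\hat\beta) = 0$, and the lower-right element equals $-1/(2\hat\sigma^4)$ because $(y - X\hat\beta)'(y - X\hat\beta)/N = \hat\sigma^2$. Hence the matrix is negative definite when $X$ has full column rank.
The second-order necessary condition for a point to be a local maximum of a twice continuously differentiable function is that the Hessian be negative semidefinite at the point.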
Normal Linear Regression Model
The MLE of $\sigma^2$ is
$$\hat\sigma^2 = \frac{\hat\epsilon'\hat\epsilon}{N} = \frac{N - K}{N}s^2$$
where $s^2 = \hat\epsilon'\hat\epsilon/(N - K)$.
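The closed-form solutions for the normal linear regression can be verified on simulated data. The sketch below is illustrative only: the design, seed and parameter values are assumptions. It computes $\hat\beta$ and $\hat\sigma^2$, checks that the average score vanishes at the solution, confirms the relation between $\hat\sigma^2$ and $s^2$, and inspects the eigenvalues of the Hessian at the MLE.

```python
import numpy as np

rng = np.random.default_rng(1)                      # hypothetical seed
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta0 = np.array([0.5, 1.0, -2.0])                  # hypothetical true coefficients
y = X @ beta0 + rng.normal(scale=1.3, size=N)

# Closed-form MLE
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N                      # MLE of sigma^2
s2 = resid @ resid / (N - K)                        # degrees-of-freedom corrected estimator

# Average score evaluated at the MLE
score_beta = X.T @ resid / (N * sigma2_hat)
score_sigma2 = -(sigma2_hat - resid @ resid / N) / (2 * sigma2_hat**2)

# Hessian of the average log-likelihood at the MLE (block diagonal)
hessian = np.block([
    [-(X.T @ X) / (N * sigma2_hat), np.zeros((K, 1))],
    [np.zeros((1, K)), np.array([[-1.0 / (2 * sigma2_hat**2)]])],
])
eigvals = np.linalg.eigvalsh(hessian)
print(beta_hat, sigma2_hat, eigvals)
```

All eigenvalues should be negative, which is the numerical counterpart of the negative definiteness of the Hessian at the MLE.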
Identification
Is the DGP sufficiently informative about the parameters of the model? If, for some $\theta_1 \ne \theta_0$,
$$f(u|v;\theta_0) = f(u|v;\theta_1)$$
data drawn from these two distributions will have the same sampling properties. There is no way to distinguish whether $\theta = \theta_0$ or $\theta = \theta_1$.
Global Identification
The parameter $\theta_0$ is globally identified in $\Theta$ if, for every $\theta_1 \in \Theta$, $\theta_0 \ne \theta_1$ implies that
$$\Pr\{f(U|V;\theta_0) \ne f(U|V;\theta_1)\} > 0$$
Assumption (Global identification): Every parameter vector $\theta_0 \in \Theta$ is globally identified.
Lemma (Strict expected log-likelihood inequality): Under the Distribution, Dominance and Global identification assumptions, $\theta \ne \theta_0$ implies
$$E[L(\theta)] < E[L(\theta_0)].$$
Example
Exact multicollinearity among explanatory variables in a linear regression $E[y|X] = X\beta_0$ is a failure of global identification.
If $\operatorname{rank}(X) < K$ then
$$E[L(\theta)] \le E[L(\theta_0)]$$
still holds. The normal log-likelihood still attains its maximum in $\beta$ at $\beta_0$ because
$$-(\beta - \beta_0)'X'X(\beta - \beta_0) \le 0$$
but the inequality is not strict for all $\beta \ne \beta_0$.
If $\operatorname{rank}(X) = K$ then $\beta_0$ is the unique maximizer of $E[L(\theta)]$.
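A small numerical sketch of this identification failure follows. The design matrix is hypothetical: its third column is exactly twice the second, so $X$ has rank 2 with $K = 3$. The function `expected_loglik` uses the closed form of the conditional expected log-likelihood of the normal model, and all numerical values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)                      # hypothetical seed
N = 100
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, 2.0 * x1])     # third column = 2 * second column
beta0 = np.array([1.0, 0.5, 0.25])                  # hypothetical true coefficients
sigma2_0 = 1.0

def expected_loglik(beta, sigma2):
    """Conditional expected log-likelihood of the normal regression, given X."""
    d = beta - beta0
    return (-0.5 * N * np.log(2 * np.pi * sigma2)
            - (N * sigma2_0 + d @ X.T @ X @ d) / (2 * sigma2))

null_direction = np.array([0.0, 2.0, -1.0])         # X @ null_direction = 0
beta1 = beta0 + 3.0 * null_direction                # a different parameter vector

rank_X = np.linalg.matrix_rank(X)
q0 = expected_loglik(beta0, sigma2_0)
q1 = expected_loglik(beta1, sigma2_0)
print(rank_X, q0, q1)
```

The two parameter vectors differ, yet they give the same expected log-likelihood, so the weak inequality holds with equality and $\beta_0$ is not the unique maximizer.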
Example
Identification concerns the expected log-likelihood $E[L(\theta)]$, not the sample average $E_N[L(\theta)]$.
One can discover failures of identification in the sample log-likelihood.
But if a sample log-likelihood function fails to have a unique global maximum, this does not always imply a failure of global identification.
Example
Exact multicollinearity among explanatory variables in a linear regression model (LRM)
$$E[y|X] = X\beta_0$$
is a failure of global identification. Note that if
$$\operatorname{rank}(X) < K$$
the expected log-likelihood inequality
$$E[L(\theta)] \le E[L(\theta_0)]$$
still holds.
Differentiability
When the support of the distribution depends on the unknown parameter values, the MLE cannot be found with simple calculus.
In such cases the log-likelihood is not differentiable everywhere in the parameter space.
Assumption (Differentiability): The p.f. $f(u|v;\theta)$ is twice continuously differentiable in $\theta$, $\forall\theta \in \Theta$. The support $S(\theta)$ does not depend on $\theta$, and differentiation and integration are interchangeable in the sense that
$$\frac{\partial}{\partial\theta}\int_{S(\theta)} dF(u|v;\theta) = \int_{S(\theta)} \frac{\partial}{\partial\theta}\,dF(u|v;\theta)$$
$$\frac{\partial^2}{\partial\theta^2}\int_{S(\theta)} dF(u|v;\theta) = \int_{S(\theta)} \frac{\partial^2}{\partial\theta^2}\,dF(u|v;\theta)$$
Differentiability
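$$\frac{\partial E[L(\theta)|V = v]}{\partial\theta} = E\left[\left.\frac{\partial L(\theta)}{\partial\theta}\right| V = v\right]$$
$$\frac{\partial^2 E[L(\theta)|V = v]}{\partial\theta\,\partial\theta'} = E\left[\left.\frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\right| V = v\right]$$
The interchange of differentiation and integration is ensured in part by $S(\theta) = S$.
$$\theta_0 = \arg\max_{\theta\in\Theta} E[L(\theta)]$$
translates into the first-order conditions
$$\left.\frac{\partial E[L(\theta)]}{\partial\theta}\right|_{\theta=\theta_0} = 0$$
and the second-order condition that the Hessian matrix
$$\left.\frac{\partial^2 E[L(\theta)]}{\partial\theta\,\partial\theta'}\right|_{\theta=\theta_0}$$
is a negative definite (n.d.) matrix.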
The score function
The MLE $\hat\theta$ is an implicit function of the data $u$:
$$\hat\theta = \arg\max_{\theta\in\Theta} E_N[L(\theta)] \in \arg\operatorname{zero}_{\theta\in\Theta} E_N[L_\theta(\theta)]$$
The F.O.C.s are called the normal equations or likelihood equations:
$$E_N[L_\theta(\hat\theta)] = 0$$
where the score function is
$$L_\theta \equiv \frac{\partial L(\theta)}{\partial\theta}$$
In general, $\hat\theta$ must be calculated by numerical methods for maximizing differentiable functions.
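To illustrate what solving the likelihood equations numerically can look like, the sketch below applies Fisher scoring to the normal linear regression, where the answer is known in closed form and can be used as a check. Fisher scoring is one standard choice among several (it replaces the Hessian by minus the information matrix); it is used here as an assumption, not as the method prescribed by the slides. The data, seed, starting values and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)                      # hypothetical seed
N, K = 150, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)   # simulated data

def avg_score(beta, sigma2):
    """Sample average of the score for theta = (beta, sigma^2)."""
    resid = y - X @ beta
    s_beta = X.T @ resid / (N * sigma2)
    s_sigma2 = -(sigma2 - resid @ resid / N) / (2 * sigma2**2)
    return np.append(s_beta, s_sigma2)

def avg_information(sigma2):
    """Sample average of the conditional information matrix."""
    info = np.zeros((K + 1, K + 1))
    info[:K, :K] = X.T @ X / (N * sigma2)
    info[K, K] = 1.0 / (2 * sigma2**2)
    return info

theta = np.append(np.zeros(K), 1.0)                 # starting values (beta = 0, sigma^2 = 1)
for _ in range(20):
    step = np.linalg.solve(avg_information(theta[K]), avg_score(theta[:K], theta[K]))
    theta = theta + step                            # scoring update
    if np.max(np.abs(step)) < 1e-12:
        break

beta_closed = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_closed = np.sum((y - X @ beta_closed) ** 2) / N
print(theta, beta_closed, sigma2_closed)
```

In this particular model the iteration settles after a couple of steps; for models such as the Student t regression no closed form is available and the iterative solution is the only route.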
Score Identity
Lemma (Score identity): Under the Distribution and Differentiability assumptions
$$E[L_\theta(\theta_0)|V = v] = 0$$
Proof: Continuous random variables case.
$$1 = \int_S dF(u|v;\theta) = \int_S f(u|v;\theta)\,du$$
Score Identity
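We can differentiate both sides of this equality w.r.t. $\theta$:
$$\begin{aligned}
0 &= \int_S \frac{\partial}{\partial\theta} f(u|v;\theta)\,du \\
&= \int_S f_\theta(u|v;\theta)\,du \\
&= \int_S \frac{1}{f(u|v;\theta)}\,f_\theta(u|v;\theta)\,f(u|v;\theta)\,du
\end{aligned}$$
Consider the score
$$L_\theta(\theta;U|V) = \frac{1}{f(u|v;\theta)}\,f_\theta(u|v;\theta)$$
$$E[L_\theta(\theta;U|V)|V = v] = \int_S \frac{1}{f(u|v;\theta)}\,f_\theta(u|v;\theta)\,f(u|v;\theta_0)\,du$$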
Score Identity
The expectation $E[\cdot|V = v]$ is taken under the true value $\theta_0$. For $\theta \ne \theta_0$, in general,
$$E[L_\theta(\theta;U|V)|V = v] \ne 0$$
But if $\theta = \theta_0$ then
$$E[L_\theta(\theta_0;U|V)|V = v] = \int_S \frac{1}{f(u|v;\theta_0)}\,f_\theta(u|v;\theta_0)\,f(u|v;\theta_0)\,du = 0.$$
Score Identity
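In the Normal Linear Regression Model
$$E[L_\beta(\theta)] = \frac{1}{\sigma^2}E[x_t x_t'](\beta_0 - \beta)$$
$$E[L_{\sigma^2}(\theta)] = -\frac{1}{2\sigma^4}\left(\sigma^2 - \left\{\sigma_0^2 + E[(x_t'\beta_0 - x_t'\beta)^2]\right\}\right)$$
With $\theta_0 = (\beta_0', \sigma_0^2)'$:
$$E[L_\beta(\theta_0)] = \frac{1}{\sigma_0^2}E[x_t x_t'](\beta_0 - \beta_0) = 0$$
$$E[L_{\sigma^2}(\theta_0)] = -\frac{1}{2\sigma_0^4}\left(\sigma_0^2 - \left\{\sigma_0^2 + E[(x_t'\beta_0 - x_t'\beta_0)^2]\right\}\right) = 0$$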
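The score identity can be checked by numerical integration in the simplest normal case, a regression with only a constant, so that $x_t'\beta$ reduces to a mean `mu`. This is a sketch under that simplifying assumption; the grid, the parameter values and the function names are hypothetical.

```python
import numpy as np

mu0, sigma2_0 = 1.0, 2.0                       # hypothetical true mean and variance
u = np.linspace(mu0 - 40.0, mu0 + 40.0, 400_001)
du = u[1] - u[0]

def density(u, mu, sigma2):
    """Normal density f(u; mu, sigma^2)."""
    return np.exp(-(u - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def score(u, mu, sigma2):
    """Score of the normal log density with respect to (mu, sigma^2)."""
    s_mu = (u - mu) / sigma2
    s_sigma2 = -(sigma2 - (u - mu) ** 2) / (2 * sigma2**2)
    return np.stack([s_mu, s_sigma2])

def expected_score(mu, sigma2):
    """E[L_theta(theta)] under the true density f(u; theta_0), by a Riemann sum."""
    return (score(u, mu, sigma2) * density(u, mu0, sigma2_0)).sum(axis=1) * du

at_truth = expected_score(mu0, sigma2_0)       # should be (numerically) zero
away = expected_score(mu0 + 0.5, sigma2_0)     # evaluated away from theta_0
print(at_truth, away)
```

Away from the true value the expected score follows the closed-form expressions of the normal model: with these numbers the mean component is $(\mu_0 - \mu)/\sigma^2 = -0.25$ and the variance component is $0.03125$.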
The Information Matrix
If there exists $\tilde\theta_N$ such that
$$E_N[L_\theta(\tilde\theta_N)] = 0$$
we must check that we have a global maximum. Otherwise our solution cannot be the MLE ($\hat\theta_N$). A sufficient condition for $\tilde\theta_N$ to be a local maximum is that the Hessian matrix
$$E_N[L_{\theta\theta}(\tilde\theta_N)] \equiv \left.\frac{\partial^2 E_N[L(\theta)]}{\partial\theta\,\partial\theta'}\right|_{\theta=\tilde\theta_N}$$
evaluated at $\tilde\theta_N$ is negative definite: $\forall c \in \mathbb{R}^K$, $c \ne 0$,
$$c'E_N[L_{\theta\theta}(\tilde\theta_N)]c < 0$$
This guarantees that $E_N[L(\theta)]$ is strictly concave in a neighborhood of $\tilde\theta_N$.
Information Matrix
We investigate the second-order conditions for the maximization of $E[L(\theta)]$.
Assumption (Finite Information): $\operatorname{Var}[L_\theta(\theta_0)]$ exists.
Lemma (Information Identity): Under the Distribution, Differentiability and Finite Information assumptions
$$E[L_{\theta\theta}(\theta_0)|V = v] = -\operatorname{Var}[L_\theta(\theta_0)|V = v]$$
and this matrix is negative semidefinite.
Information Matrix
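Proof:
$$0 = \int_S L_\theta(\theta;u|v)\,f(u|v;\theta)\,du$$
Differentiating both sides w.r.t. $\theta'$, with $f \equiv f(u|v;\theta)$ and
$$\begin{aligned}
\frac{\partial(L_\theta(\theta) f(\theta))}{\partial\theta'} &= \frac{\partial L_\theta}{\partial\theta'}\,f + L_\theta\,\frac{\partial f}{\partial\theta'} \\
&= L_{\theta\theta}\,f + L_\theta (f_\theta)' \\
&= (L_{\theta\theta} + L_\theta L_\theta')\,f
\end{aligned}$$
we obtain
$$0 = \int_S \left[L_{\theta\theta}(\theta;u|v) + L_\theta(\theta;u|v)L_\theta(\theta;u|v)'\right] dF(u|v;\theta)$$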
Information Matrix
$$\int_S L_{\theta\theta}(\theta;u|v)\,dF(u|v;\theta) = -\int_S \left[L_\theta(\theta;u|v)L_\theta(\theta;u|v)'\right] dF(u|v;\theta)$$
Setting $\theta = \theta_0$:
$$\begin{aligned}
E[L_{\theta\theta}(\theta_0;U|V)|V = v] &= -E[L_\theta(\theta_0;U|V)L_\theta(\theta_0;U|V)'|V = v] \\
&= -\operatorname{Var}[L_\theta(\theta_0;U|V)|V = v]
\end{aligned}$$
because $E[L_\theta(\theta_0;U|V)|V] = 0$. The Hessian is negative semidefinite since it is the negative of a variance matrix.
Conditional Information
The conditional information matrix is the conditional variance matrix of the score vector $L_\theta(\theta;U|V)$ given $V = v$, evaluated at $\theta_0$:
$$I(\theta_0|v) \equiv E[L_\theta(\theta_0)L_\theta(\theta_0)'|V = v] = \operatorname{Var}[L_\theta(\theta_0)|V = v]$$
We can always find the conditional information matrix function
$$I(\theta|v) \equiv \int_S L_\theta(\theta;u|v)L_\theta(\theta;u|v)'\,dF(u|v;\theta)$$
Population Information
The marginal expectation
$$I(\theta_0) \equiv E[L_\theta(\theta_0;U|V)L_\theta(\theta_0;U|V)']$$
is the population information matrix.
The population information matrix is the unconditional variance matrix of the conditional score vector because
$$E[L_\theta(\theta_0;U|V)|V] = 0$$
$$\begin{aligned}
\operatorname{Var}[L_\theta(\theta_0;U|V)] &= E\left[\operatorname{Var}[L_\theta(\theta_0;U|V)|V]\right] + \operatorname{Var}\left[E[L_\theta(\theta_0;U|V)|V]\right] \\
&= E[I(\theta_0|V)] = I(\theta_0)
\end{aligned}$$
Normal linear regression model
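The conditional information matrix for the normal linear regression model:
$$I(\theta_0|x_t) =
\begin{bmatrix}
\dfrac{1}{\sigma_0^2}x_t x_t' & 0 \\
0 & \dfrac{1}{2\sigma_0^4}
\end{bmatrix}$$
The Hessian of the conditional normal regression log-likelihood function:
$$L_{\theta\theta}(\theta;y_t|x_t) =
\begin{bmatrix}
-\dfrac{1}{\sigma^2}x_t x_t' & -\dfrac{1}{\sigma^4}x_t(y_t - x_t'\beta) \\
-\dfrac{1}{\sigma^4}(y_t - x_t'\beta)x_t' & \dfrac{1}{2\sigma^4} - \dfrac{(y_t - x_t'\beta)^2}{\sigma^6}
\end{bmatrix}$$
$$-E[L_{\theta\theta}(\theta_0;y_t|x_t)|x_t] = I(\theta_0|x_t)$$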
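The information identity can be verified numerically in the constant-only normal case ($x_t = 1$, so $x_t'\beta$ is a mean). The sketch below integrates the Hessian and the outer product of the score against the true density on a grid; the parameter values, grid and helper name `expect` are assumptions made for illustration.

```python
import numpy as np

mu0, sigma2_0 = 0.5, 1.5                       # hypothetical true mean and variance
u = np.linspace(mu0 - 40.0, mu0 + 40.0, 400_001)
du = u[1] - u[0]
f0 = np.exp(-(u - mu0) ** 2 / (2 * sigma2_0)) / np.sqrt(2 * np.pi * sigma2_0)

e = u - mu0
# Score with respect to (mu, sigma^2), evaluated at theta_0
score = np.stack([e / sigma2_0, -(sigma2_0 - e**2) / (2 * sigma2_0**2)])

# Entries of the Hessian of the log density, evaluated at theta_0
h11 = np.full_like(u, -1.0 / sigma2_0)
h12 = -e / sigma2_0**2
h22 = 1.0 / (2 * sigma2_0**2) - e**2 / sigma2_0**3

def expect(g):
    """Expectation of g(U) under the true density, by a Riemann sum."""
    return float(np.sum(g * f0) * du)

expected_hessian = np.array([[expect(h11), expect(h12)],
                             [expect(h12), expect(h22)]])
score_outer = np.array([[expect(score[i] * score[j]) for j in range(2)]
                        for i in range(2)])
information = np.diag([1.0 / sigma2_0, 1.0 / (2 * sigma2_0**2)])
print(expected_hessian, score_outer)
```

The expected Hessian should equal minus the expected outer product of the score, and both should match the block-diagonal information matrix with $x_t x_t' = 1$.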
Nonsingular information
It is possible for the information matrix to be singular even if $\theta_0$ is globally identifiable and the expected log-likelihood is uniquely maximized at $\theta_0$.
The second-order condition that the Hessian be negative definite is sufficient but not necessary for a local maximum.
We therefore assume this condition explicitly.
Assumption (Nonsingular Information): The information matrix $I(\theta_0)$ is nonsingular for all possible $\theta_0 \in \Theta$.
The Cramer - Rao Lower Bound
Information matrix: a measure of how much we can learn about $\theta_0$ from the random sample $\{(U_1, V_1), \ldots, (U_N, V_N)\}$.
Theorem: Let $\tilde\theta$ be an unbiased estimator of $\theta_0$ with finite variance matrix, for which differentiation and integration are interchangeable:
$$\frac{\partial E[\tilde\theta|v_1,\ldots,v_N]}{\partial\theta_0'} = \frac{\partial}{\partial\theta_0'}\int_S \tilde\theta \prod_{t=1}^{N} dF(u_t|v_t;\theta_0) = \int_S \tilde\theta\,\frac{\partial}{\partial\theta_0'}\prod_{t=1}^{N} dF(u_t|v_t;\theta_0)$$
If the Distribution, Differentiability, Finite Information and Nonsingular Information assumptions also hold, then for any $a \in \mathbb{R}^K$
$$a'\operatorname{Var}[\tilde\theta|v]\,a \ge a'\left\{N E_N[I(\theta_0|v)]\right\}^{-1}a.$$
The Cramer - Rao Lower Bound
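Unbiased estimator:
$$E[\tilde\theta|v] = \int_S \tilde\theta \prod_{t=1}^{N} dF(u_t|v_t;\theta_0) = \theta_0$$
Differentiating w.r.t. $\theta_0'$:
$$\begin{aligned}
I_K &= \int_S \tilde\theta \left[\sum_{t=1}^{N} L_\theta(\theta_0;u_t|v_t)'\right] \prod_{t=1}^{N} dF(u_t|v_t;\theta_0) \\
&= N\int_S \tilde\theta\, E_N[L_\theta(\theta_0)]' \prod_{t=1}^{N} dF(u_t|v_t;\theta_0) \\
&= N\,E\left[\tilde\theta\, E_N[L_\theta(\theta_0)]'\,\middle|\, v\right] \\
&= N\operatorname{Cov}\left[\tilde\theta, E_N[L_\theta(\theta_0)]\,\middle|\, v\right]
\end{aligned}$$
where the last equality uses $E[E_N[L_\theta(\theta_0)]|v] = 0$.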
The Cramer - Rao Lower Bound
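The covariance matrix of the vector $(\tilde\theta', E_N[L_\theta(\theta_0)]')'$ is
$$\Psi = E\left[\left.
\begin{pmatrix} \tilde\theta - \theta_0 \\ E_N[L_\theta(\theta_0)] \end{pmatrix}
\begin{pmatrix} (\tilde\theta - \theta_0)' & E_N[L_\theta(\theta_0)]' \end{pmatrix}
\right| v\right]
=
\begin{bmatrix}
\operatorname{Var}[\tilde\theta|v] & N^{-1}I_K \\
N^{-1}I_K & N^{-1}E_N[I(\theta_0|v)]
\end{bmatrix}$$
$\Psi$ is a p.s.d. covariance matrix. It follows that for each $\bar a \in \mathbb{R}^{2K}$
$$\bar a'\Psi\bar a \ge 0$$
Take, for any $a \in \mathbb{R}^K$,
$$\bar a' = \left[a',\; -a'E_N[I(\theta_0|v)]^{-1}\right]$$
It follows that
$$a'\operatorname{Var}[\tilde\theta|v]\,a \ge a'N^{-1}E_N[I(\theta_0|v)]^{-1}a = a'\left\{N E_N[I(\theta_0|v)]\right\}^{-1}a$$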
The Cramer - Rao Lower Bound
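In some cases we can find estimators with variances equal to the Cramér-Rao lower bound.
The OLS estimator $\hat\beta$ is efficient relative to all unbiased estimators of $\beta_0$.
Proof: Using
$$I(\theta_0|x_t) =
\begin{bmatrix}
\dfrac{1}{\sigma_0^2}x_t x_t' & 0 \\
0 & \dfrac{1}{2\sigma_0^4}
\end{bmatrix}$$
we have
$$\left(N \cdot E_N[I(\theta_0|x_t)]\right)^{-1} =
\begin{bmatrix}
\dfrac{1}{\sigma_0^2}(X'X) & 0 \\
0 & \dfrac{N}{2\sigma_0^4}
\end{bmatrix}^{-1}
=
\begin{bmatrix}
\sigma_0^2(X'X)^{-1} & 0 \\
0 & \dfrac{2\sigma_0^4}{N}
\end{bmatrix}$$
Because
$$\operatorname{Var}[\hat\beta|X] = \sigma_0^2(X'X)^{-1}$$
the OLS/MLE estimator attains the Cramér-Rao lower bound.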
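The bound for the normal regression can be computed directly by summing the per-observation information matrices and inverting. This is a minimal sketch on a simulated design; the seed, dimensions and $\sigma_0^2$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)                      # hypothetical seed
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
sigma2_0 = 0.8                                      # hypothetical true variance

# Sum over t of the conditional information matrices I(theta_0 | x_t)
total_info = np.zeros((K + 1, K + 1))
for x_t in X:
    total_info[:K, :K] += np.outer(x_t, x_t) / sigma2_0
    total_info[K, K] += 1.0 / (2 * sigma2_0**2)

cramer_rao_bound = np.linalg.inv(total_info)        # inverse of N * E_N[I(theta_0 | x_t)]
var_beta_ols = sigma2_0 * np.linalg.inv(X.T @ X)    # conditional variance of the OLS estimator
print(np.max(np.abs(cramer_rao_bound[:K, :K] - var_beta_ols)))
```

The upper-left block of the bound should coincide with the conditional variance of OLS, and the lower-right element should equal $2\sigma_0^4/N$.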
MLE Asymptotics
The MLE is an implicit function of the random sample. In general it is not an explicit function of sample averages of the data.
But the sample log-likelihood is a sum of i.i.d. random variables.
Because the $(U_t, V_t)$ are i.i.d., so are transformations such as $L(\theta) \equiv L(\theta;U_t|V_t)$, $t = 1, 2, \ldots, N$. The LLN can therefore be applied to the sample average log-likelihood function itself:
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
for any fixed $\theta$.
Consistency
Under the assumptions
1. Distribution
2. Dominance
3. Global Identification
4. Compactness of Θ
the MLE is consistent:
$$\hat\theta_N \xrightarrow{p} \theta_0$$
Consistency
• The sample average log-likelihood converges to the expected log-likelihood for any value of $\theta$:
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
• The two maximizers are
$$\hat\theta_N = \arg\max_{\theta\in\Theta} E_N[L(\theta)] \quad \text{by construction}$$
$$\theta_0 = \arg\max_{\theta\in\Theta} E[L(\theta)] \quad \text{by the strict log-likelihood inequality}$$
As a result, $\hat\theta_N \xrightarrow{p} \theta_0$, provided that the relationships are continuous.
Consistency
The argument of $\arg\max_{\theta\in\Theta}$ is a function of $\theta$, namely $E_N[L(\theta)]$.
$\arg\max_{\theta\in\Theta}$ must be a continuous function of its functional argument.
This requires a notion of distance between two functions over a set containing an infinite number of possible comparisons at different values of $\theta$.
Uniform Convergence in Probability: The sequence of real-valued functions $\{g_N(\theta)\}$ converges uniformly in probability to the limit function $g_0(\theta)$ if
$$\sup_{\theta\in\Theta} |g_N(\theta) - g_0(\theta)| \xrightarrow{p} 0$$
We then say $g_N(\theta) \xrightarrow{p} g_0(\theta)$ uniformly.
Consistency
We use Uniform Convergence in Probability to define the probability limit of a sequence of random functions.
Uniform LLN. Let $g(\theta;U)$ be a continuous function over $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^K$ is closed and bounded, and let $\{U_t\}$ be a sequence of i.i.d. r.v.s with c.d.f. $F_U(u)$. If $E[\sup_{\theta\in\Theta}\|g(\theta;U)\|]$ exists, then
1. $E[g(\theta;U)]$ is continuous over $\theta \in \Theta$
2. $E_N[g(\theta;U)] \xrightarrow{p} E[g(\theta;U)]$ uniformly over $\theta \in \Theta$
Consistency
We apply the uniform LLN to the sample average log-likelihood.
Consistency of Maxima. If there is a sequence of functions $Q_N(\theta)$ that converges in probability uniformly to a function $Q_0(\theta)$ on the closed and bounded $\Theta$, and if $Q_0(\theta)$ is continuous and uniquely maximized at $\theta_0$, then
$$\hat\theta_N = \arg\max_{\theta\in\Theta} Q_N(\theta) \xrightarrow{p} \theta_0$$
Compactness and differentiability guarantee that $E_N[L(\theta)]$ has a maximum.
Consistency
Let
$$g(\theta;U) \equiv L(\theta;U|V)$$
be the conditional log-likelihood function for $\theta$ evaluated at the r.v. $(U, V)$.
The conditions for uniform convergence are satisfied:
• Differentiability implies continuity of $L(\theta)$
• Compactness of $\Theta$
• $(U_t, V_t)$ are i.i.d. with c.d.f. $F_{U|V}(u|v;\theta_0)$
• Dominance states that $E[\sup_{\theta\in\Theta}|L(\theta)|]$ exists
Then $E[L(\theta)]$ is continuous and
$$E_N[L(\theta)] \xrightarrow{p} E[L(\theta)]$$
uniformly.
Consistency
For the Consistency of Maxima, set
$$Q_N(\theta) = E_N[L(\theta)] \quad \text{and} \quad Q_0(\theta) = E[L(\theta)].$$
Under the assumptions:
• From Likelihood Identification: $\forall\theta_1 \in \Theta$, $\theta_0 \ne \theta_1$ implies
$$\Pr\{L(\theta_0) \ne L(\theta_1)\} > 0$$
• we have the Strict Expected Log-likelihood Inequality: $\theta \ne \theta_0$ implies
$$E[L(\theta)] < E[L(\theta_0)]$$
Hence $E[L(\theta)]$ is uniquely maximized at $\theta_0$. Therefore
$$\hat\theta_N = \arg\max_{\theta\in\Theta} E_N[L(\theta)] \xrightarrow{p} \theta_0 = \arg\max_{\theta\in\Theta} E[L(\theta)]$$
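Consistency can be illustrated with a small simulation for the normal linear regression, where the MLE is available in closed form. The sketch below is only suggestive (a single draw per sample size); the seed, true parameter values and sample sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)                      # hypothetical seed
beta0 = np.array([1.0, -1.0])                       # hypothetical true coefficients
sigma2_0 = 2.0                                      # hypothetical true variance
theta0 = np.append(beta0, sigma2_0)

def mle(N):
    """Simulate one sample of size N and return the MLE (beta_hat, sigma2_hat)."""
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    return np.append(beta_hat, resid @ resid / N)

sample_sizes = [100, 10_000, 1_000_000]
errors = [float(np.max(np.abs(mle(N) - theta0))) for N in sample_sizes]
for N, err in zip(sample_sizes, errors):
    print(N, err)
```

The largest absolute estimation error should shrink roughly at the rate $1/\sqrt{N}$ as the sample size grows.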
Asymptotic Normality
Assumption: There is an open subset of $\Theta$ that contains the population parameter value $\theta_0$, so that $\theta_0$ is not on the boundary of $\Theta$.
Assumption:
$$E_N[L_\theta(\hat\theta_N)] = 0$$
that is, the MLE solves the normal equations.
First-order Taylor series expansion:
$$E_N[L_\theta(\hat\theta_N)] = 0 = E_N[L_\theta(\theta_0)] + E_N[L_{\theta\theta}(\bar\theta_N)](\hat\theta_N - \theta_0)$$
$$\bar\theta_N = \alpha_N\hat\theta_N + (1 - \alpha_N)\theta_0, \quad \alpha_N \in [0, 1]$$
Asymptotic Normality
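$$\sqrt{N}(\hat\theta_N - \theta_0) = \left\{-E_N[L_{\theta\theta}(\bar\theta_N)]\right\}^{-1}\sqrt{N}\,E_N[L_\theta(\theta_0)]$$
• $\sqrt{N}\,E_N[L_\theta(\theta_0)] \xrightarrow{d} N(0, I(\theta_0))$ (by CLT)
• $E_N[L_{\theta\theta}(\bar\theta_N)] \xrightarrow{p} -I(\theta_0)$ (by LLN)
Then
$$\sqrt{N}(\hat\theta_N - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$$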
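A Monte Carlo sketch of the asymptotic variance follows, for the variance parameter of the normal linear regression. There the information is $1/(2\sigma_0^4)$, so the limiting variance of $\sqrt{N}(\hat\sigma^2 - \sigma_0^2)$ is $2\sigma_0^4$. The design, seed, sample size `N` and number of replications `R` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)                            # hypothetical seed
N, K, R = 400, 2, 2000                                    # sample size, regressors, replications
sigma2_0 = 1.5                                            # hypothetical true variance
X = np.column_stack([np.ones(N), rng.normal(size=N)])     # design held fixed across replications
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)         # residual-maker matrix (symmetric)

eps = rng.normal(scale=np.sqrt(sigma2_0), size=(R, N))    # R independent error vectors
resid = eps @ M                                           # OLS residuals, since M X = 0
sigma2_hat = np.sum(resid**2, axis=1) / N                 # MLE of sigma^2 in each replication

z = np.sqrt(N) * (sigma2_hat - sigma2_0)
simulated_var = float(np.var(z))
asymptotic_var = 2 * sigma2_0**2                          # inverse of the information 1/(2 sigma_0^4)
print(simulated_var, asymptotic_var)
```

With these settings the simulated variance should be close to the asymptotic value, up to Monte Carlo error and a small finite-sample factor of $(N - K)/N$.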