Page 1: Handbook 76 - Chen

Large Sample Sieve Estimation of S-NP Models

Xiaohong Chen; Handbook of Econometrics Chapter 76

Will Matcham

[email protected]

Will Matcham (LSE) Chapter 76: Chen October 2015 1 / 27

Page 2: Handbook 76 - Chen

Introduction

Sieve Estimation: Examples, Definitions and Sieves
• Empirical Examples of S-NP Models
• Definition of Sieve Extremum Estimation
• Typical Function Spaces and Sieve Spaces
• Small Monte Carlo Study
• Some Sieve Applications in Econometrics

Large Sample Properties of Sieve Estimation of Unknown Functions
• Consistency of Sieve Estimators
• Convergence Rates of M-Estimators
• Convergence Rates of Series Estimators
• Pointwise AN of Series LS Estimators

Large Sample Properties of Sieve Estimation of P Parts in S-NP Models
• SP Two-Step Estimators
• Sieve Simultaneous M-Estimation
• Sieve Simultaneous MD Estimation

Conclusion


Page 3: Handbook 76 - Chen

Abstract

• Parametric (P) models are often restrictive and sensitive to deviations from the parametric specification

• Semi-nonparametric (S-NP) models are more flexible and robust, but introduce other complications: potentially non-compact, ∞-dimensional parameter spaces, which lead to ill-posed optimisation problems

• The method of sieves (MoS) provides a way to tackle such difficulties

• Optimise an empirical criterion over a sequence of approximating parameter spaces called sieves

• Sieves are dense in the original space and less complex, so the optimisation becomes well posed.

• Advantage: MoS is very flexible for complex models with or without heterogeneity and endogeneity.


Page 4: Handbook 76 - Chen

Abstract

• Advantage: MoS can incorporate constraints and information from theory, such as shape restrictions.

• Advantage: MoS can simultaneously estimate the parametric and nonparametric (NP) parts with optimal convergence rates for both.

• Disadvantage: General theory for MoS not complete.

• Chapter describes estimation of S-NP models via MoS

• Will present general results for large sample properties: consistency, convergence rates, pointwise normality, and some $\sqrt{n}$-asymptotic normality


Page 5: Handbook 76 - Chen

Introduction

• Mention S-NP, NP and P models, and the notation used.


Page 6: Handbook 76 - Chen

Intro to Section 2

• MoS consists of two key ingredients

1. Criterion function: the population criterion $Q : \Theta \to \mathbb{R}$ (a function) and the empirical criterion $\hat{Q}_n$ (a random function)

2. Sieve parameter spaces: a sequence of approximating spaces $\{\Theta_n\}_{n \in \mathbb{N}}$

• Both can be extremely flexible, as we shall see. Almost all criterion functions in the Newey and McFadden chapter can be used in MoS

• Hence, main new ingredient is choice of sieve parameter space.
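
As a rough illustration of the two ingredients, the sketch below pairs a least-squares empirical criterion with a power-series sieve whose dimension grows with n. The growth rule `sieve_dimension`, the choice of basis, and the simulated data are all assumptions made for illustration; none of them comes from the chapter.

```python
import numpy as np

def sieve_dimension(n):
    """Hypothetical rule for the number of sieve terms k_n (grows slowly with n)."""
    return max(2, int(round(n ** (1 / 3))))

def basis(x, k):
    """Power-series terms 1, x, ..., x^(k-1): one simple linear sieve."""
    return np.vander(x, N=k, increasing=True)

def empirical_criterion(coef, x, y):
    """Q_n-hat(theta) = -(1/n) * sum (y_i - theta(x_i))^2 for theta in the sieve."""
    resid = y - basis(x, len(coef)) @ coef
    return -np.mean(resid ** 2)

def sieve_ls(x, y):
    """Maximise the empirical criterion over Theta_n (a least-squares fit)."""
    k = sieve_dimension(len(x))
    coef, *_ = np.linalg.lstsq(basis(x, k), y, rcond=None)
    return coef

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

coef_hat = sieve_ls(x, y)
q_hat = empirical_criterion(coef_hat, x, y)
```

Changing `sieve_dimension` or `basis` changes only the sieve; the criterion function is untouched, which is the sense in which the sieve is the new ingredient.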


Page 7: Handbook 76 - Chen

Empirical Examples of S-NP Models

• It is impossible to list all existing S-NP models and their empirical applications. The section presents three; I present the first here.

• Example 2.1 (Single spell duration models with unobserved heterogeneity)

• Typical single spell models suggest a functional form for the structural duration distribution conditional on individual heterogeneity. Let $G(\tau \mid u, x)$ be the distribution function of the duration $T$ conditional on unobserved heterogeneity $U = u$ and observed heterogeneity $X = x$.

• Then, modelling $U$ as a random factor with distribution function $h(u)$, we obtain

$$F(\tau \mid x) = \int G(\tau \mid u, x)\, dh(u)$$

• An iid sample $\{T_i, X_i\}_{i=1}^n$ identifies $F$.


Page 8: Handbook 76 - Chen

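Empirical Examples of S-NP Models

• Theoretical models provide parametric functional forms of $G$ up to a finite-dimensional parameter vector $\beta$.

• $g(\cdot \mid \beta, u, x)$ is the density counterpart to $G(\cdot \mid \beta, u, x)$.

• The MLE method assumes $h_\gamma$ is known up to a finite-dimensional $\gamma$.

• Then the likelihood is

$$\prod_{i=1}^{n} \int g(T_i \mid \beta, u, X_i)\, dh_\gamma(u)$$

• Thus the scaled log-likelihood is

$$L(\beta, \gamma) = \frac{1}{n} \sum_{i=1}^{n} \log\left\{ \int g(T_i \mid \beta, u, X_i)\, dh_\gamma(u) \right\}$$

• And

$$(\hat{\beta}_{MLE}, \hat{\gamma}_{MLE})' = \arg\max_{\beta, \gamma} L(\beta, \gamma)$$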


Page 9: Handbook 76 - Chen

Empirical Examples of S-NP Models

• Heckman and Singer (1984) observe that parametric MLE estimates of $\beta$ are inconsistent if the distribution of unobserved heterogeneity $h$ is misspecified.

• They suggest the S-NP single spell model

$$F(\tau \mid \beta, h, x) = \int G(\tau \mid \beta, u, x)\, dh(u)$$

• $h$ is left unspecified. $(\beta', h)$ is identified, and a sieve MLE method gives a consistent estimator of $\beta$ and $h$ jointly.

• This is a classic example of an S-NP model that specifies the conditional distribution of observed economic variables semi-nonparametrically, with the specific semi-nonparametric form derived from independence of errors and regressors.
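
To make the sieve likelihood concrete, here is a minimal sketch that evaluates the scaled log-likelihood when $h$ is replaced by a discrete distribution on a few mass points, which is one possible sieve for a distribution function. The exponential form of `duration_density`, the scalar regressor, the support points, and the simulated data are assumptions for illustration, not the chapter's specification. Only the criterion is evaluated; maximisation over $\beta$, the support and the weights is left out.

```python
import numpy as np

def duration_density(t, x, beta, u):
    """Assumed form for g: exponential density with hazard exp(x*beta + u)."""
    rate = np.exp(x * beta + u)
    return rate * np.exp(-rate * t)

def sieve_loglik(beta, support, weights, t, x):
    """(1/n) sum_i log sum_k p_k g(T_i | beta, u_k, X_i): the scaled
    log-likelihood with h replaced by a discrete distribution."""
    mix = sum(p * duration_density(t, x, beta, u) for u, p in zip(support, weights))
    return np.mean(np.log(mix))

# Simulated durations, for illustration only (true beta = 1).
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
u = rng.choice([-0.5, 0.5], size=n)            # unobserved heterogeneity
t = rng.exponential(1.0, size=n) / np.exp(x * 1.0 + u)

value = sieve_loglik(1.0, [-0.5, 0.5], [0.5, 0.5], t, x)
```

With a single mass point at zero the function reduces to the ordinary exponential log-likelihood, which gives a simple sanity check on the implementation.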


Page 10: Handbook 76 - Chen

S-NP Conditional Moment Models

• Many economic models imply semi-nonparametric conditional moment restrictions of the form

$$E[\rho(Z_t, \theta_0) \mid X_t] = 0, \qquad \theta_0 = (\beta_0', h_0')'$$

1. $\rho$ is a column vector of residual functions with functional forms known up to $\theta$.

2. $\{Z_t\}_{t=1}^n = \{(Y_t', X_t')'\}_{t=1}^n$ is the data, where $Y_t$ is endogenous and $X_t$ is exogenous.

3. $E[\rho(Z_t, \theta) \mid X_t]$ denotes the conditional expectation of $\rho(Z_t, \theta)$ given $X_t$. The true conditional distribution of $Y_t$ given $X_t$ is left unspecified.

• The parameter of interest $\theta_0 = (\beta_0', h_0')'$ is split into a vector of finite-dimensional unknown parameters $\beta_0$ and a vector of ∞-dimensional functions $h_0(\cdot) = (h_{01}(\cdot), \ldots, h_{0q}(\cdot))'$, which can depend on anything in the model.


Page 11: Handbook 76 - Chen

S-NP Conditional Moment Models

• Hansen (1982) studied the conditional moment restriction for stationary ergodic time series without $h_0$, i.e. $E[\rho(Z_t, \beta_0) \mid X_t] = 0$

• Ai and Chen (2003) and others studied, for iid data, the general case $E[\rho(Z_t, \beta_0, h_0) \mid X_t] = 0$

• Partition S-NP conditional moment restriction models into two subclasses:

1. Models without endogeneity: $\rho(Z_t, \theta) - \rho(Z_t, \theta_0)$ does not depend on $Y_t$. In this case, $\theta_0$ is the unique maximiser of

$$Q(\theta) = -E\left( \rho(Z_t, \theta)' \Sigma(X_t)^{-1} \rho(Z_t, \theta) \right)$$

where $\Sigma(X_t)$ is a positive definite weighting matrix.

2. Models with endogeneity: $\rho(Z_t, \theta) - \rho(Z_t, \theta_0)$ does depend on $Y_t$. Then $\theta_0$ is identified as the unique maximiser of

$$Q(\theta) = -E\left( m(X_t, \theta)' \Sigma(X_t)^{-1} m(X_t, \theta) \right)$$

where $m(X_t, \theta) = E[\rho(Z_t, \theta) \mid X_t]$.
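
A sample analogue of the second criterion can be sketched for the simplest case, $\rho(Z, h) = Y_1 - h(Y_2)$ with scalar $\Sigma = 1$. In the sketch below $m$ is estimated by a series regression of the residual on an instrument basis, and $h$ is restricted to a three-term power-series sieve. The basis choices, the dimensions, and the simulated design are assumptions for illustration only. With a linear sieve the maximiser has the familiar two-stage least squares form.

```python
import numpy as np

def power_basis(v, k):
    """Columns 1, v, ..., v^(k-1)."""
    return np.vander(v, N=k, increasing=True)

def project(B, a):
    """Least-squares fitted values of a on the columns of B."""
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return B @ coef

def md_criterion(c, y1, Psi, B):
    """Sample analogue of Q(theta) = -E[m(X, theta)^2] with Sigma = 1;
    m_hat is the series regression of the residual y1 - Psi c on B."""
    m_hat = project(B, y1 - Psi @ c)
    return -np.mean(m_hat ** 2)

def sieve_md(y1, Psi, B):
    """Maximiser of md_criterion over c (2SLS with sieve regressors)."""
    c, *_ = np.linalg.lstsq(project(B, Psi), y1, rcond=None)
    return c

# Simulated design, for illustration only: h0(y2) = y2^2.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, n)                  # instrument
v = rng.normal(size=n)                         # source of endogeneity
y2 = x + 0.5 * v                               # endogenous regressor
y1 = y2 ** 2 + 0.5 * v + 0.1 * rng.normal(size=n)

Psi = power_basis(y2, 3)                       # sieve for h: 1, y2, y2^2
B = power_basis(x, 4)                          # instrument basis: 1, x, x^2, x^3
c_hat = sieve_md(y1, Psi, B)
```

For a model without endogeneity the projection step is unnecessary, because the first criterion depends on $\rho$ directly.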


Page 12: Handbook 76 - Chen

S-NP Conditional Moment Models

• Although the second class includes the first as a special case (trivially), when $\theta$ contains unknown functions, asymptotic properties of various nonparametric estimators of $\theta$ are easier to derive in the first case.

• The first class contains many well-studied special cases, such as the partially linear regression model of Robinson: $E[Y_i - X_{1i}'\beta_0 - h_0(X_{2i}) \mid X_{1i}, X_{2i}] = 0$

• The leading, yet difficult, example of the second class is the purely nonparametric instrumental variables (NP IV) regression: $E[Y_{1i} - h_0(Y_{2i}) \mid X_i] = 0$.

• Even less trivial is the NP IV quantile regression: $E[1\{Y_{1i} \le h_0(Y_{2i})\} - \gamma \mid X_i] = 0$.
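
For the partially linear model in the first class, a sieve least-squares fit amounts to regressing $Y$ on $X_1$ together with a finite set of basis functions of $X_2$. The sketch below assumes a scalar $X_1$, a power-series basis with `k` terms, and a simulated design with $\beta_0 = 2$; these are illustrative choices rather than anything prescribed by the chapter.

```python
import numpy as np

def partially_linear_sieve_ls(y, x1, x2, k):
    """Regress y on x1 and the sieve terms 1, x2, ..., x2^(k-1).
    Returns (beta_hat, sieve coefficients approximating h)."""
    design = np.column_stack([x1, np.vander(x2, N=k, increasing=True)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Simulated design, for illustration only: beta0 = 2, h0(x2) = sin(2*pi*x2).
rng = np.random.default_rng(2)
n = 1000
x2 = rng.uniform(0.0, 1.0, n)
x1 = x2 + rng.normal(size=n)                   # x1 is correlated with x2
y = 2.0 * x1 + np.sin(2 * np.pi * x2) + 0.5 * rng.normal(size=n)

beta_hat, h_coef = partially_linear_sieve_ls(y, x1, x2, k=6)
```

The parametric and nonparametric parts are estimated in a single regression, which is the simultaneous estimation the method of sieves allows.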


Page 13: Handbook 76 - Chen

General Setup

• Let $\Theta$ be an infinite-dimensional parameter space, endowed with a pseudo-metric $d$.¹

• A typical S-NP econometric model specifies a population criterion $Q : \Theta \to \mathbb{R}$ that is uniquely maximised at $\theta_0 \in \Theta$, the "true" parameter value.

• The choice of $Q$ and the existence of $\theta_0$ are suggested by identification of the econometric model.

• The true $\theta_0 \in \Theta$ is unknown but related to the joint probability measure $P_0(z_1, \ldots, z_n)$ from which the sample $\{Z_t\}_{t=1}^n$ is available.

• $\hat{Q}_n : \Theta \to \mathbb{R}$ is the empirical criterion. For each $\theta \in \Theta$, $\hat{Q}_n(\theta)$ is a measurable function of the data, so $\hat{Q}_n$ is a random function.

• $\hat{Q}_n$ converges to $Q$ in some sense as $n \to \infty$.

¹ Pseudo-metric space $(X, d)$: $d : X \times X \to \mathbb{R}$ satisfies symmetry and the triangle inequality, with $d(x, x) = 0$ and $d(x, y) \ge 0$, but $d(x, y) = 0$ need not imply $x = y$.


Page 14: Handbook 76 - Chen

General Setup

• Generally, estimate $\theta_0$ by maximising $\hat{Q}_n$ over $\Theta$. Assuming it exists, the maximiser $\arg\sup_{\theta \in \Theta} \hat{Q}_n(\theta)$ is called the extremum estimate.

• When $\Theta$ is infinite-dimensional and not compact with respect to $d$, the maximiser of $\hat{Q}_n$ over $\Theta$ may not be well defined; even if it exists, it may be difficult to compute and may have undesirable large sample properties.

• Intuitively, the difficulties arise because optimisation over an infinite-dimensional non-compact space is not a well-posed problem.

• In an ∞-dimensional metric space $(H, d)$, a compact set is $d$-closed and totally bounded. A set is totally bounded if, for every $\varepsilon > 0$, there exist finitely many open balls of radius $\varepsilon$ covering the set.


Page 15: Handbook 76 - Chen

Ill-Posed and Well-Posed Problems

• The optimisation problem is well posed if, for every sequence $\{\theta_k\}$ in $\Theta$ such that $Q(\theta_0) - Q(\theta_k) \to 0$, we have $d(\theta_0, \theta_k) \to 0$.

• Naturally, the problem is ill posed if there exists $\{\theta_k\}$ in $\Theta$ with $Q(\theta_0) - Q(\theta_k) \to 0$ yet $d(\theta_0, \theta_k) \not\to 0$.

• For a given S-NP model, suppose $Q$ and $\Theta$ are such that $Q$ is uniquely maximised at $\theta_0 \in \Theta$. Then whether the problem is well posed depends on the choice of $d$: different metrics on an ∞-dimensional $\Theta$ may not be equivalent, whereas in a finite-dimensional space all norms are equivalent.

• In particular, a standard norm $\|\theta_0 - \theta\|_s$ on $\Theta$ is likely not to be continuous in $Q(\theta_0) - Q(\theta)$, so the problem is ill posed under $\|\cdot\|_s$. Nevertheless, there is typically a weaker norm $\|\theta_0 - \theta\|_w$ that is continuous in $Q(\theta_0) - Q(\theta)$, so the problem is well posed under $\|\cdot\|_w$.
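
A toy example (not taken from the chapter) may help fix the definitions. Take $\Theta$ to be square-summable sequences and suppose $Q(\theta) = -\sum_j \lambda_j (\theta_j - \theta_{0j})^2$ with $\lambda_j = j^{-2}$. Moving $\theta_0$ by one unit in coordinate $k$ changes the criterion by only $k^{-2}$, so the criterion gap vanishes as $k$ grows while the usual $\ell^2$ distance stays at 1. Under the weaker, $\lambda$-weighted norm the distance is $1/k$ and also vanishes. The sketch below checks this numerically on a truncation to `J` coordinates; all names are hypothetical.

```python
import math

J = 1000                                        # truncation level (assumption)
eigen = [j ** -2.0 for j in range(1, J + 1)]    # lambda_j = j^(-2)

def criterion_gap(delta, eigen):
    """Q(theta_0) - Q(theta_0 + delta) for the quadratic criterion above."""
    return sum(lam * d * d for lam, d in zip(eigen, delta))

def strong_norm(delta):
    """Usual l2 norm of delta."""
    return math.sqrt(sum(d * d for d in delta))

def weak_norm(delta, eigen):
    """Lambda-weighted norm: sqrt(sum_j lambda_j * delta_j^2)."""
    return math.sqrt(sum(lam * d * d for lam, d in zip(eigen, delta)))

def unit_direction(k):
    """Sequence with 1 in coordinate k and 0 elsewhere."""
    return [1.0 if j == k else 0.0 for j in range(1, J + 1)]

# Rows of (k, criterion gap, strong distance, weak distance).
rows = []
for k in (1, 10, 100, 1000):
    delta = unit_direction(k)
    rows.append((k, criterion_gap(delta, eigen),
                 strong_norm(delta), weak_norm(delta, eigen)))
```

Along this sequence the second column of `rows` shrinks towards zero, the third stays at one, and the fourth shrinks, matching the ill-posed and well-posed cases respectively.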


Page 16: Handbook 76 - Chen

Ill-Posed and Well-Posed Problems

• Whether the problem is ill or well posed, the method of sieves provides a general approach to resolving the difficulties of maximising $\hat{Q}_n$ over an ∞-dimensional parameter space: maximise $\hat{Q}_n$ over a sequence of approximating spaces $\Theta_n$, called sieves, which are less complex than $\Theta$ but dense in it.

• Sieves are typically compact and nondecreasing ($\Theta_n \subseteq \Theta_{n+1} \subseteq \Theta$), and such that for every $\theta \in \Theta$ there exists $\pi_n(\theta) \in \Theta_n$ with $d(\theta, \pi_n(\theta)) \to 0$ as $n \to \infty$. Think of $\pi_n$ as a projection mapping from $\Theta$ to $\Theta_n$.

• The approximate sieve extremum estimate $\hat{\theta}_n$ is defined as an approximate maximiser of $\hat{Q}_n$ over $\Theta_n$, i.e.

$$\hat{Q}_n(\hat{\theta}_n) \ge \sup_{\theta \in \Theta_n} \hat{Q}_n(\theta) - O_p(\eta_n), \qquad \eta_n = o(1).$$

• When $\eta_n = 0$, we have the exact sieve extremum estimator

$$\hat{\theta}_n = \arg\max_{\theta \in \Theta_n} \hat{Q}_n(\theta)$$
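
The approximation property of sieves can be checked numerically. The sketch below assumes a cosine sieve on $[0, 1]$, a root-mean-square distance on a grid in place of $d$, and the target $\theta(x) = e^x$, all hypothetical choices. Because the sieves are nested, the distance from $\theta$ to its least-squares projection cannot increase as terms are added.

```python
import numpy as np

def cosine_basis(x, k):
    """Columns cos(j*pi*x), j = 0, ..., k-1, on [0, 1]; nested in k."""
    return np.cos(np.pi * np.outer(x, np.arange(k)))

def projection_error(theta, x, k):
    """Root-mean-square distance between theta and its least-squares
    projection onto the k-term sieve, evaluated on the grid x."""
    B = cosine_basis(x, k)
    coef, *_ = np.linalg.lstsq(B, theta, rcond=None)
    return np.sqrt(np.mean((theta - B @ coef) ** 2))

grid = np.linspace(0.0, 1.0, 201)
theta = np.exp(grid)                           # hypothetical target function
errors = [projection_error(theta, grid, k) for k in range(1, 9)]
```

Here `errors[k - 1]` plays the role of $d(\theta, \pi_k(\theta))$ for the k-term sieve.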
