Consistent Lagrange Multiplier Type Specification Tests for Semiparametric Models∗
Ivan Korolev†
Job Market Paper
October 12, 2017 (the latest version is available here)
Abstract
This paper considers specification testing in semiparametric econometric models. It develops a consistent series-based specification test for semiparametric conditional mean models against nonparametric alternatives. Consistency is achieved by turning a conditional moment restriction into a growing number of unconditional moment restrictions using series methods. The test is simple to implement because it requires estimating only the restricted semiparametric model and because the asymptotic distribution of the test statistic is pivotal. The use of series methods in estimation of the null semiparametric model allows me to account for the estimation variance and obtain refined asymptotic results. The test remains valid even if other semiparametric methods are used to estimate the null model as long as they achieve suitable convergence rates. This includes popular kernel estimators for single index or partially linear models. The test demonstrates good size and power properties in simulations. I illustrate the use of my test with the Canadian gasoline demand example from Yatchew and No (2001) and find no evidence against the semiparametric specifications used in that paper.
∗I am grateful to my advisors Frank Wolak, Han Hong, and Peter Reiss for their support and guidance, as well as to Svetlana Bryzgalova, Brad Larsen, and Joe Romano for very helpful conversations. I also thank Chris Bruegge, Liran Einav, Guido Imbens, Gordon Leslie, Jessie Li, Onder Polat, and Paulo Somaini for useful comments. I thank Adonis Yatchew for permission to use the Canadian household gasoline consumption dataset from Yatchew and No (2001). I gratefully acknowledge the financial support from the Stanford Graduate Fellowship Fund as a Koret Fellow and from the Stanford Institute for Economic Policy Research as a B.F. Haley and E.S. Shaw Fellow. All remaining errors are mine.
†Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA, 94305. E-mail: [email protected]. Website: http://web.stanford.edu/~ikorolev/
Applied economists often want to achieve two conflicting goals in their work. On the
one hand, they wish to use the most flexible specification possible, so that their results are
not driven by functional form assumptions. On the other hand, they wish to have a model
that is consistent with the restrictions imposed by economic theory and can be used for valid
counterfactual analysis.
While parametric models are often too restrictive and may not capture heterogeneity in
the data well, nonparametric models may violate restrictions imposed by economic theory
and suffer from the curse of dimensionality, i.e. become imprecise if the dimensionality
of regressors is high. Because of this, and because economic theory usually specifies one
portion of the model but leaves the other unrestricted, semiparametric models are especially
attractive for empirical work in economics. For instance, semiparametric models have been
used in estimation of demand functions (Hausman and Newey (1995), Schmalensee and Stoker
(1999), Yatchew and No (2001)), production functions (Olley and Pakes (1996), Levinsohn
and Petrin (2003)), Engel curves (Gong et al. (2005)), the labor force participation equation
(Martins (2001)), and the relationship between land access and poverty (Finan et al. (2005)).
Because many semiparametric models are restricted versions of fully nonparametric mod-
els, it is important to check the validity of implied restrictions. If semiparametric models
are correctly specified, then using them, as opposed to nonparametric models, typically leads
to more efficient estimates and may increase the range of counterfactual questions that can
be answered using the model at hand. On the other hand, if semiparametric models are
misspecified, then the semiparametric estimates are likely to be misleading and may result
in incorrect policy implications.
In this paper I develop a new specification test which determines whether a semiparamet-
ric conditional mean model that the researcher has estimated provides a statistically valid
description of the data as compared to a general nonparametric model. The test statistic
is based on a quadratic form in the semiparametric model residuals. When the errors are
homoskedastic, this quadratic form can be computed as nR² from the regression of the semiparametric residuals on the series approximating functions. Thus, the proposed test is simple
to implement and avoids kernel smoothing in high dimensions. Moreover, the asymptotic dis-
tribution of the test statistic is pivotal, i.e. does not depend on the unknown parameters, so
that calculating asymptotically exact critical values for the test is straightforward and does
not require the use of resampling methods.
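As a sketch of this construction under homoskedasticity: the data generating process, the linear null model, and the choice of series terms below are all illustrative assumptions of mine, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.0, 1.0, (n, 2))
# Toy data generating process (an assumption for illustration only).
y = x[:, 0] + np.sin(2.0 * x[:, 1]) + rng.normal(0.0, 1.0, n)

# Estimate the restricted model (here simply linear in x) and keep its residuals.
W = np.column_stack([np.ones(n), x])
resid = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]

# Series approximating functions under the alternative: powers and an interaction.
P = np.column_stack([np.ones(n), x, x**2, x[:, :1] * x[:, 1:]])

# nR^2 from regressing the restricted residuals on the series terms.
fitted = P @ np.linalg.lstsq(P, resid, rcond=None)[0]
r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum(resid**2)
n_r2 = n * r2
```

Under the null, this quadratic form is then compared with suitably centered and scaled chi-squared critical values, as discussed below.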
The proposed test uses series methods to turn a conditional moment restriction into a
growing number of unconditional moment restrictions. I show that if the series functions
can approximate the nonparametric alternatives that are allowed as the sample size grows,
the test is consistent. My assumptions and proofs make precise what is required of the
approximation and its behavior as the number of series terms and the sample size grow.
These arguments differ from standard parametric arguments, where the number of regressors in the model is fixed.
My asymptotic theory allows both the number of parameters under the null as well as
the number of restrictions to grow with the sample size. By doing so, I show that the
parametric Lagrange Multiplier test can be extended to semiparametric models and serve
as a consistent model specification test for these models. Because series methods have a
projection interpretation, using series methods to nest the null model in the alternative and
estimate the restricted model makes it possible to directly account for the estimation variance
and obtain refined asymptotic results. This refinement, which can be viewed as a degrees of
freedom correction, allows me to derive the asymptotic distribution of the test statistic under
fairly weak rate conditions and leads to very good finite sample performance of the test in
simulations.
Though this refinement is unique to series estimation methods, the proposed test, with a
slight modification, remains valid even if other semiparametric methods, such as kernels or
local polynomials, are used to estimate the null model. Thus, the test applies to a wide class
of semiparametric models, including single index models or partially linear models estimated
using the two-step method proposed in Robinson (1988). Because the degrees of freedom
correction is not available in that case, I have to impose more restrictive rate conditions
on the convergence rates of semiparametric estimators, as well as an additional high level
assumption which may be difficult to verify in practice. Moreover, even though the test
statistics for series estimators and for other semiparametric estimators are asymptotically
equivalent, my simulations show that the test based on the latter is undersized and low-
powered in finite samples.
Intuitively, while the test based on series estimation methods uses the projection property
to directly account for the form of the semiparametric residuals, the test based on general
estimation methods only requires the semiparametric residuals to be close to the true errors.
However, in finite samples, there may be substantial difference between the semiparametric
residuals and the true errors, which the latter approach fails to capture. As a result, even
though both approaches are asymptotically valid, the former one yields an accurate approx-
imation of the finite sample distribution of the test statistic, while the latter one does not
work nearly as well.
Specification tests have long played an important role in theoretical econometrics. Several
papers have studied specification testing when the null model contains a nonparametric
component. Early work on specification testing in semiparametric models required certain
ad hoc modifications, such as sample splitting in Yatchew (1992) and Whang and Andrews
(1993) or randomization in Gozalo (1993). Fan and Li (1996) solve this problem and develop
a kernel-based specification test which can be used to test a semiparametric null hypothesis
against a general nonparametric alternative, but their test requires high-dimensional kernel
smoothing and cannot be implemented with standard econometric software. Lavergne and
Vuong (2000) refine the test of Fan and Li (1996), but they only consider significance testing
in nonparametric models. Kernel-based specification tests are also developed in Chen and
Fan (1999), Delgado and Manteiga (2001), Ait-Sahalia et al. (2001), and Bravo (2012). In
all these papers, the asymptotic distribution of the test statistic is quite complicated and
requires either estimating several nuisance parameters or using the bootstrap, which can be
computationally costly.
I circumvent this problem by relying on series methods to construct the test statistic.
Because the number of series terms grows with the sample size, the usual asymptotic results
for the parametric Lagrange Multiplier test no longer apply. However, it is possible to normalize
the test statistic so that the resulting normalized statistic is asymptotically standard normal.
Therefore, the quantiles of the standard normal distribution can be used as asymptotically
exact critical values for the test.
The proposed test is based on a quadratic form in the restricted model residuals. Thus,
it can be viewed as a nonparametric generalization of the conventional Lagrange Multiplier
test, classical treatments of which include Breusch and Pagan (1980), Engle (1982), and En-
gle (1984). This generalization is not novel in itself, as Hall (1990) and McCulloch and Percy
(2013) also consider Lagrange Multiplier specification tests for parametric against nonpara-
metric models. However, they only consider null hypotheses with fully specified parametric
distributions, while I allow for semiparametric conditional mean models. Moreover, their
asymptotic analysis treats the number of series terms in the alternative model as fixed, and
as a result their tests fail to achieve consistency. In contrast, I develop an asymptotic the-
ory for the case when the number of series terms grows with the sample size and obtain a
consistent specification test.
My work is closely related to the literature on series-based specification tests, such as
de Jong and Bierens (1994), Hong and White (1995), Koenker and Machado (1999), Donald
et al. (2003), and Sun and Li (2006). These papers extend the Conditional Moment test
of Newey (1985) by considering a growing number of unconditional restrictions and thus
achieve consistency. However, they only consider parametric null hypotheses, while my test
can handle a broad class of semiparametric models. Moreover, these papers do not explicitly
develop the degrees of freedom correction. In contrast, I develop this correction for the
case when the semiparametric null model is nested in the nonparametric alternative and
estimated using series methods, and I show that it plays a crucial role in the semiparametric
case, because it allows me to weaken the rate conditions and improves the finite sample
performance of the test.
Hong and White (1995), among others, investigate the behavior of the test statistic for
the parametric null hypotheses under the global alternative and under local alternatives. I
repeat this analysis for semiparametric null hypotheses and reach similar conclusions: the
test statistic diverges to infinity at a rate faster than the parametric rate n^{1/2} under the global alternative, but the test can only detect local alternatives which approach zero slower than the parametric rate n^{−1/2}. Moreover, both rates are asymptotically the same as in the
parametric case.1
To my knowledge, there are two studies that develop series-based specification tests that
allow for semiparametric null hypotheses. Gao et al. (2002) consider only additive models
and do not explicitly develop a consistent test against a general nonparametric alternative.
Their test is based on the estimates from the unrestricted model and can be viewed as a Wald
type test for the significance of variables in nonparametric additive conditional mean models. In
contrast, my test is based on the residuals from the restricted semiparametric model and is
consistent against a broad class of fully nonparametric alternatives for the conditional mean
function.
Li et al. (2003) use the approach that was first put forth in Bierens (1982) and Bierens
(1990) and develop a series-based specification test based on weighting the moments, rather
than considering an increasing number of series-based unconditional moments. Their test
can detect local alternatives which approach zero at the parametric rate n^{−1/2}, but the
asymptotic distribution of their test statistic depends on nuisance parameters, and it is
difficult to obtain appropriate critical values. They propose using a residual-based wild
bootstrap to approximate the critical values. In contrast, my test statistic is asymptotically
standard normal under the null, so calculating the critical values is straightforward.1

1 By saying "asymptotically the same," I mean that the exact rates in the parametric and semiparametric cases are different but the ratio of the two rates goes to 1 as n → ∞.
Another attractive feature of the proposed test is that the alternative model does not
have to be fully nonparametric. Because series methods make it easy to impose restrictions
on nonparametric models, the proposed test can be used to test a more restricted semipara-
metric model against a broader semiparametric alternative instead of a fully nonparametric
alternative. For instance, a researcher may be willing to compare a partially linear model Y_i = X′_{1i}β + g(X_{2i}) + ε_i with a varying coefficient model Y_i = X′_{1i}β(X_{2i}) + g(X_{2i}) + ε_i or an additive model Y_i = h(X_{1i}) + g(X_{2i}) + ε_i. The proposed test can be modified to handle such
comparisons by considering only the unconditional moments based on the series terms that
are present under the alternative.
Restricting the class of alternatives will result in the loss of consistency against a general
nonparametric alternative, because the semiparametric class of alternatives will be unable
to detect certain deviations from the null. However, this will also increase the test power if
the true model does lie in the conjectured semiparametric class. It is possible to use my test
with several alternatives simultaneously, including semiparametric alternatives to improve
the power in certain directions but also including a general nonparametric alternative to
achieve consistency. The Bonferroni correction can then be used to control the test size. I
show in simulations that this approach leads to higher power when the null hypothesis is false
but the true model is still semiparametric without disturbing the size of the test or losing
consistency against a general class of alternatives.
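The Bonferroni step itself is elementary; a minimal sketch, where the individual p-values (one per alternative) are placeholders of mine:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject the null if the test against any individual alternative rejects
    at the adjusted level alpha / (number of alternatives)."""
    return any(p < alpha / len(p_values) for p in p_values)

# Placeholder p-values: one from a semiparametric (e.g. additive) alternative,
# one from the general nonparametric alternative.
print(bonferroni_reject([0.012, 0.400]))  # True, since 0.012 < 0.05 / 2
```

The overall size is controlled at alpha because each of the individual tests is run at level alpha divided by the number of alternatives.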
Finally, similar to the overidentifying restrictions test in GMM models or other omnibus
specification tests, the proposed test is silent on how to proceed if the null is rejected. In this
respect, it is clearly a test of a particular model specification but not a comprehensive model
selection procedure. I plan to study model selection methods for semiparametric and non-
parametric models, such as series-based Bayesian Information Criterion or upward/downward
testing procedures based on the proposed test, in future work.
The remainder of the paper is organized as follows. Section 2 presents a motivating
example from industrial organization. Section 3 introduces the model and describes how to
construct the series-based specification test for semiparametric models. Section 4 develops
the asymptotic theory for the proposed test when series methods are used in estimation.
Section 5 extends the asymptotic theory to the case when other semiparametric methods,
such as kernels or local polynomials, are used in estimation. Section 6 studies behavior
of the proposed test in simulations. Section 7 applies the proposed test to the Canadian
household gasoline consumption example from Yatchew and No (2001) and shows that the
semiparametric specifications used in that paper are not rejected. Section 8 concludes.
Appendix A collects all tables and figures. Appendix B provides an intuitive derivation
of the proposed test as well as a step-by-step description of how to implement the proposed
test. Appendix C contains proofs of technical results.
2 Motivating Example
Suppose that a researcher has cross-section data on total costs TC_i, output Q_i, firm characteristics Z_i, and factor prices p_{Li} and p_{Ki} for firms in a given industry, and wants to estimate a cost function. Any cost function C(Q, p_L, p_K) has to satisfy certain properties,
such as monotonicity, concavity, and homogeneity of degree 1 in factor prices. Because
nonparametric estimation of the cost function may lead to violation of these restrictions, the
researcher may choose a theory-based parametric functional form of the cost function.
If the researcher assumes a Cobb-Douglas production function for firm i, Q_i = A_i L_i^α K_i^β, under certain assumptions on the unobservables (for details, see Reiss and Wolak (2007)) she
where d_0 is square integrable on X, E[d_0(X_i)] = 0, and E[f(X_i, θ*, h*)d_0(X_i)] = 0. The null hypothesis corresponds to d_0 = 0. I impose the following assumptions.
First, I assume that the unknown function d(x) can be approximated by a finite series
expansion sufficiently well.
Assumption 12. There exists α_d > 0 such that

sup_{x∈X} |d_0(x) − P^{k_n}(x)′π| = O(k_n^{−α_d}).
Next, I apply the following result to obtain the convergence rates of series estimators of d_0(x). These estimators are infeasible, because in practice d_0(x) is unknown, but these convergence rates will be used in the proofs.

Lemma 5 (Li and Racine (2007), Theorem 15.1). Let d̂(x) = P^{k_n}(x)′π̂, π̂ = (P′P)^{−1}P′d, d_i = d_0(X_i), and d̂_i = d̂(X_i). Under Assumptions 1, 3, and 12, the following is true:

(a) sup_{x∈X} |d̂(x) − d_0(x)| = O_p(ζ(k_n)(√(k_n/n) + k_n^{−α_d}))

(b) (1/n) ∑_{i=1}^n (d̂_i − d_i)² = O_p(k_n/n + k_n^{−2α_d})

(c) ∫ (d̂(x) − d_0(x))² dF(x) = O_p(k_n/n + k_n^{−2α_d})
I modify Assumption 10 as follows:
Assumption 13. Let f̂(x) = f(x, θ̂, ĥ), f*(x) = f(x, θ*, h*), f̂_i = f̂(X_i), and f*_i = f*(X_i). For some η_n → 0 and ψ_n → 0 as n → ∞, the following conditions hold:

(a) sup_{x∈X} |f̂(x) − f*(x)| = O_p(η_n)

(b) (1/n) ∑_{i=1}^n (f̂_i − f*_i)² = O_p(ψ_n)

(c) ∫ (f̂(x) − f*(x))² dF(x) = O_p(ψ_n)
Assumption 13 requires the semiparametric estimators to converge to the pseudo-true
values fast enough. Next, I impose a high-level assumption on how the error term in the
model interacts with the estimation error. It parallels Assumption 11 and is discussed in
Remark A.2 in the Appendix.
Assumption 14. Assume that

∑_i ε_i(f̂_i − f_i) = o_p(k_n^{1/2}).
The next result gives the behavior of the test under local alternatives:

Theorem 8. Assume that Assumptions 1, 2, 3, 12, 13, and 14 are satisfied. Moreover, σ²(x) = σ², 0 < σ² < ∞, for all x ∈ X, and rate conditions 5.1–5.4 hold. Then

t_{k_n} = (ξ − k_n)/√(2k_n) →_d N(δ, 1),

where ξ is as in Equation 3.5 and δ = E[d_i²]/σ².
5.5 Asymptotic Theory: Summary
To summarize, there are two versions of the proposed test which are asymptotically equivalent. One relies on series estimation methods to define the number of restrictions imposed by
the semiparametric model on a general nonparametric model and uses the projection inter-
pretation of series estimators to derive the degrees of freedom correction. Another one uses
general estimation methods, without imposing the series structure and defining the number
of restrictions. While the former approach restricts the class of models to which the test
applies to the models that can be estimated by series, it allows me to obtain refined asymp-
totic results under mild rate conditions. In contrast, the latter approach is applicable to a
broader range of models, but it results in cruder asymptotic analysis and requires stronger
assumptions.
These two approaches differ because they cope differently with a key step in the proof: going from the semiparametric regression residuals ε̂ to the true errors ε. The series-based approach relies on the projection property of series estimators to eliminate the estimation variance and hence only has to deal with the approximation bias. Specifically, it uses the equality ε̂ = M_W ε + M_W R, applies a central limit theorem for U-statistics to the quadratic form in M_W ε, and bounds the remainder terms by requiring the approximation error R to be small.
The general approach does not impose any special structure on the model residuals and thus has to deal with both the bias and variance of semiparametric estimators. Specifically, it uses the equality ε̂ = ε + (g − ĝ), which can be written in the series form8 as ε̂ = ε + R + W′(β_1 − β̂_1), applies a central limit theorem for U-statistics to the quadratic form in ε, and bounds the remainder terms by requiring both the bias term R and the variance term W′(β_1 − β̂_1) to be small. This, in turn, requires the semiparametric estimates of the restricted model to converge to the true values very fast and leads to restrictive rate conditions.
6 Simulations
There are several variants of the test, depending on whether series methods are used in
estimation, what normalization is used, and what limiting distribution is used. In this section
I study the finite sample performance of different variants of the proposed tests in simulations8 that mimic the cost estimation example in Section 2. The Monte Carlo studies have several goals: first, to compare the tests based on the χ² and normal asymptotic approximations; second, to compare two normalizations of the test statistic; third, to investigate the test behavior with different sample sizes and with different numbers of series terms; finally, to study the use of multiple alternatives as a tool to improve the test power in particular directions.

8 Even though series methods do not have to be used in estimation, it is convenient to write the model in the series form to facilitate the comparison between the two approaches.
I assume that the researcher wants to estimate returns to scale (or their inverse δ(Z_i)) in the model

ln TC_i − ln p_{Li} = C_1(Z_i) + γ(Z_i)(ln p_{Ki} − ln p_{Li}) + δ(Z_i) ln Q_i + ε_i,

which I abbreviate using T̃C_i = ln TC_i − ln p_{Li}, p̃_i = ln p_{Ki} − ln p_{Li}, and Q̃_i = ln Q_i. From now on, I consider the rearranged model T̃C_i = C_1(Z_i) + γ(Z_i)p̃_i + δ(Z_i)Q̃_i + ε_i.
In the subsequent analysis, I will test the specification of the semiparametric conditional
mean model against the nonparametric alternative using the proposed specification test.
My test also applies to parametric null hypotheses, but I omit the results of testing the
parametric model against the nonparametric alternative for brevity. The varying coefficient
Figure 2 shows the estimates of δ(z), the inverse of returns to scale, from OLS, OLS with
interactions, and the varying coefficient model, as well as the true function δ(z). As we can
see from the figure, both ordinary OLS, and OLS with interactions, yield very misleading
estimates of δ(z) and thus returns to scale. The true δ(z) is greater than 1 and slowly
decreasing over most of its support, meaning that most firms in the sample have decreasing
returns to scale. The OLS results imply that it δ(z) is smaller than 1 and hence returns to
scale are increasing. The OLS with interactions results imply that δ(z) is increasing in R&D
spending, so not only the magnitude of returns to scale is incorrect, but also the pattern of
their dependence on R&D spending. If the researcher wants to evaluate a possible merger
of two firms, OLS or OLS with interactions estimates will likely lead to very misleading
counterfactuals. Thus, using a plausible model is important, and I will show next that the
proposed test helps determine if a given model is correctly specified.
Before I move on and discuss the behavior of the proposed test in finite samples, I need
to make two choices in order to implement the test. First, I need to choose the family of
basis functions; second, choose the number of series terms in the restricted and unrestricted
models. I use power series because of their simplicity, but I do not have any data-driven
methods to choose tuning parameters.
The choice of tuning parameters presents a big practical challenge in implementing the
proposed test, as well as many other specification tests.9 If one is interested only in estimating
the null or the alternative models, then certain data-driven methods, such as cross-validation,
can be used to select the number of series terms. For details, see, e.g., Section 15.2 in Li
and Racine (2007). However, it is not clear how these data-driven procedures may affect
the proposed test and whether using them would lead to any optimality results in testing.
I leave the choice of tuning parameters for the proposed test for future research, and in my
simulations choose tuning parameters according to the rate conditions imposed in Section 4.
However, the choices I make are still arbitrary, because I can multiply m_n or k_n by any constant and still satisfy the rate conditions, which are asymptotic in nature.
The number of terms in the parametric model is m_n^{OLS} = 4, the number of regressors
9 For a discussion of the choice of regularization parameters in the context of kernel-based tests, see the review by Sperlich (2014).
plus one. The number of terms in the series expansions of the unknown coefficient functions C_1(z), γ(z), and δ(z) in the varying coefficient model is l_n = ⌊2.5n^{0.12}⌋, which leads to m_n^{VCM} = 3l_n. In the nonparametric model, I include j_n = ⌊3n^{0.06}⌋ series terms in p̃_i and Q̃_i each, which leads to k_n = l_n j_n². The number of restrictions is given by r_n^{OLS} = k_n − m_n^{OLS} and r_n^{VCM} = k_n − m_n^{VCM}, correspondingly.
These choices lead to m_n = 15, r_n = 65, and k_n = 80 when n = 1,000, and to m_n = 18, r_n = 132, and k_n = 150 when n = 5,000. Because the behavior of the test statistic depends both on the sample size and the number of series terms, I also consider an intermediate setup with n = 5,000 but m_n = 15, r_n = 65, and k_n = 80 to separate these two effects. In other words, I first fix the number of series terms and increase the sample size, and then fix the sample size and increase the number of series terms. Throughout my analysis, I use B = 2,000 simulation draws.
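These counts follow mechanically from the formulas above; a quick arithmetic check (the helper name is mine):

```python
import math

def vcm_tuning(n):
    """Series-term counts implied by l_n = floor(2.5 n^0.12) and j_n = floor(3 n^0.06)."""
    l_n = math.floor(2.5 * n**0.12)  # terms per coefficient function C_1, gamma, delta
    j_n = math.floor(3.0 * n**0.06)  # terms in each of the two continuous regressors
    m_n = 3 * l_n                    # parameters in the varying coefficient null model
    k_n = l_n * j_n**2               # series terms in the nonparametric alternative
    return m_n, k_n - m_n, k_n       # (m_n, r_n, k_n)

print(vcm_tuning(1000))   # (15, 65, 80)
print(vcm_tuning(5000))   # (18, 132, 150)
```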
6.1 Homoskedastic Errors
First, I investigate the performance of the test when the errors are homoskedastic. I use
the ξ test statistic directly,

ξ = ε̂′P(σ̂²P′P)^{−1}P′ε̂ ∼ᵃ χ²_{τ_n},

as well as the t statistic,

t_{τ_n} = (ξ − τ_n)/√(2τ_n) ∼ᵃ N(0, 1),

with τ_n = k_n and τ_n = r_n.
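The two normalizations differ only in the centering constant; a small numeric sketch, where the realized value of ξ is a made-up example of mine:

```python
import math

def t_stat(xi, tau_n):
    """Studentized statistic (xi - tau_n) / sqrt(2 tau_n), compared with N(0, 1)."""
    return (xi - tau_n) / math.sqrt(2.0 * tau_n)

xi = 95.0            # hypothetical realized value of the quadratic form
k_n, r_n = 80, 65    # series terms and restrictions, as in Setup 1 below

t_k = t_stat(xi, k_n)   # normalization tau_n = k_n
t_r = t_stat(xi, r_n)   # degrees-of-freedom-corrected normalization tau_n = r_n
# With the one-sided 5% normal critical value 1.645, only t_r rejects here,
# illustrating how the tau_n = k_n version can fail to reject when tau_n = r_n does.
```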
I consider two sample sizes, n = 1,000 and n = 5,000. The errors are normally distributed: ε_i ∼ i.i.d. N(0, 2.25). With this choice of the distribution of ε_i, the semiparametric
part of the model explains about 70% of the dependent variable variance, while the errors
account for the remaining 30%. I repeated my analysis with centered exponential errors,
which have an asymmetric distribution, and found no substantial difference from the normal
case. The results for exponential errors are omitted for brevity.
Table 2 shows the size and power of the nominal 5% level test for three combinations of the sample size and the number of series terms: Setup 1 with (n = 1,000, k_n = 80), Setup 2 with (n = 5,000, k_n = 80), and Setup 3 with (n = 5,000, k_n = 150), and two normalizations of the test statistic: τ_n = k_n and τ_n = r_n. As we can see from the table, the test with
the former normalization is severely undersized, with the size being below 1% even with
n = 5,000 observations. In contrast, the size of the test with the latter normalization is
close to the nominal level of 5%. Moreover, in all three settings, the test based on the latter
normalization has better power for testing the semiparametric varying coefficient null model when the nonparametric model is true.
As far as the choice between the ξ test statistic and the normalized t test statistic is
concerned, the latter typically leads to slightly higher rejection probabilities, meaning that
the test based on the t statistic has marginally better power but is slightly oversized.
Finally, we can see that increasing the sample size from n = 1,000 to n = 5,000 while keeping the number of series terms constant at k_n = 80 brings the size of the test closer to the nominal level and greatly increases power, while increasing the number of series terms from k_n = 80 to k_n = 150 with the sample size of n = 5,000 has almost no effect on the size but reduces
the power of the test. This observation calls for a data-driven method to choose the number
of series terms. As my simulations show, including too many series terms when they are not
necessary can worsen the test performance; however, including too few series terms can make
it difficult to detect certain alternatives10, which will also lead to low power. Thus, having
a data-driven method that would help balance these two considerations and determine the
appropriate number of series terms to include is important, and I plan to study this question
in future work.
Next, I plot the simulated distribution of the test statistic in different Monte Carlo
settings. Figure 5 plots the distribution of the t test statistic under the null for the three
setups I study. As we can see, in all three settings the simulated distribution of the trn test
statistic is very close to the standard normal. In contrast, the simulated distribution of the
tkn test statistic is shifted to the left relative to the standard normal. This illustrates the
importance of the proper normalization, which explicitly accounts for the estimation variance,
and helps explain why the tests based on the normalization τn = kn are severely undersized.

10 The series-based test cannot detect alternatives which are orthogonal to all series terms
used to form the test statistic. The more series terms are used to construct the test, the fewer
such alternatives exist.
6.2 Heteroskedastic Errors
In this subsection I investigate the performance of the test when the errors are het-
eroskedastic. The form of heteroskedasticity is εi ∼ i.n.i.d. N(0, 0.015 exp(Qi+Zi)). Figure 6
illustrates this form of heteroskedasticity. The test statistic is based on

ξHC = ε̂′P(P′Σ̂P)⁻¹P′ε̂,

where Σ̂ = diag(ε̂i²), and is given by

tHC,τn = (ξHC − τn)/√(2τn),
for τn = rn. I do not report the results for τn = kn, because they are similar to those in the
previous section, i.e. the test based on tHC,kn is severely undersized and low-powered. Instead of
comparing the test statistics tHC,rn and tHC,kn , I compare the feasible test statistic tHC,rn
with the infeasible test statistic
tHC,rn,inf = (ξHC,inf − rn)/√(2rn),

where

ξHC,inf = ε̂′P(P′ΣP)⁻¹P′ε̂

and Σ = diag(E[εi²|Qi, pi, Zi]) = diag(0.015 exp(Qi + Zi)). I make this comparison in order
to understand whether the behavior of the test under heteroskedastic errors is driven by
heteroskedasticity itself or by using the estimated variance-covariance matrix instead of the
true one. Because of higher computational burden associated with the heteroskedasticity
robust test statistic, I reduce the number of simulation draws from B = 2, 000 to B = 1, 000.
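To make the comparison concrete, the feasible and infeasible heteroskedasticity-robust statistics can be computed in a few lines of numpy. This is a minimal sketch; the function name and arguments are hypothetical, and in the simulations sigma2_true corresponds to the known conditional variance 0.015 exp(Qi + Zi):

```python
import numpy as np

def hc_test_statistics(P, eps_hat, sigma2_true, r_n):
    """Compute the feasible and infeasible HC test statistics.

    P           : n x k_n matrix of series terms
    eps_hat     : residuals from the restricted semiparametric model
    sigma2_true : true conditional error variances (known only in simulations)
    r_n         : number of restrictions, used in the normalization tau_n = r_n
    """
    Pe = P.T @ eps_hat
    # Feasible statistic: Sigma_hat = diag(eps_hat_i^2)
    xi_f = Pe @ np.linalg.solve(P.T @ (P * eps_hat[:, None] ** 2), Pe)
    # Infeasible statistic: Sigma = diag(E[eps_i^2 | X_i])
    xi_i = Pe @ np.linalg.solve(P.T @ (P * sigma2_true[:, None]), Pe)
    # Normalized t statistics
    t_f = (xi_f - r_n) / np.sqrt(2 * r_n)
    t_i = (xi_i - r_n) / np.sqrt(2 * r_n)
    return t_f, t_i
```

The two statistics differ only in the middle matrix P′ΣP, which is where the variance-covariance matrix estimation enters.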
Table 3 shows the size and power of the nominal 5% level test for the three setups discussed
above and two test statistics, tHC,rn and tHC,rn,inf . Figure 7 plots the distribution of the t
test statistic under the null for these three setups.
As we can see, in the heteroskedastic case the feasible test is somewhat oversized, and the
size of the test based on the ξ test statistic is closer to the nominal level than the size of the
test based on the t test statistic. Interestingly, as we go from Setup 2 to Setup 3, i.e. increase
kn and rn while keeping n constant, the size of the test becomes closer to the nominal level,
but the simulated distribution of the t test statistic moves away from the standard normal.
As for the comparison between the feasible and infeasible test statistics, even though
their simulated distributions look different, with the infeasible test statistic distribution being
closer to the standard normal, the simulated size of these two tests is very close. Similarly
to the feasible test, the infeasible test is oversized in all three setups. Hence, it appears that
it is heteroskedasticity itself, and not the variance-covariance matrix estimation, that causes
the size distortion. At the same time, the variance-covariance matrix estimation noticeably
affects the finite-sample distribution of the test statistic, but has very little effect on its tail
behavior and, as a result, on the finite-sample rejection probabilities.
6.3 Test Behavior under Local Alternatives
In this section, I study the behavior of the proposed test under local alternatives. In-
stead of fixing the DGP and moving the model, as in Sections 4.4 and 5.4, I fix the model
and move the DGP. As discussed in Hong and White (1995), these two approaches lead to
similar conclusions, and while the former simplifies asymptotic theory, the latter is easier to
implement in simulations.
More specifically, I use the same setup as before, but the true DGP changes with the
sample size n:
TCi = C1(Zi) + γ(Zi)pi + δ(Zi)Qi + (rn^{1/4}/n^{1/2}) d0(pi, Qi) + εi,

d0(pi, Qi) = λ1|pi|^{1/2} + λ2|Qi|^{1/3},

(λ1, λ2) = (0.75, 0.75).
In other words, the true model is nonparametric, but it approaches the semiparametric
varying coefficient model at the rate rn^{1/4}/n^{1/2} as the sample size grows. Based on the result
of Theorem 4, the rejection probability should remain the same as the sample size changes.
I gradually increase the sample size from n = 500 to n = 25, 000 and compute the
simulated rejection probabilities for the tests based on the trn test statistic. Table 4 shows
the size and power of the nominal 5% level test for the semiparametric null hypothesis as
the sample size varies. Figure 8 plots the distribution of the t test statistic under local
alternatives for n = 1, 000 and n = 10, 000.
As we can see, the rejection probabilities for the test based on the trn statistic lie between
21% and 27% for all sample sizes considered, and the simulated distributions of the trn
statistic for n = 1, 000 and n = 10, 000 look very similar. These findings are consistent with
the theoretical results established in Section 4.4.
6.4 Multiple Alternatives and Bonferroni Correction
If a researcher wants to estimate a model that can be nested in an expanding set of
alternatives (e.g. a parametric model can be nested in a semiparametric partially linear
model or in a nonparametric model), it is possible to test it against multiple alternatives
simultaneously while using the Bonferroni correction.
Namely, if the null model is fully parametric:

H0^P : P(E[Yi|Xi] = X1i′β1 + X2i′β2) = 1 for some β1 ∈ R^{dx1}, β2 ∈ R^{dx2},

where Xi = (X1i′, X2i′)′, a researcher may consider a semiparametric varying coefficient
alternative and a fully nonparametric alternative simultaneously:

H1^{VC} : E[Yi|Xi] = X̄1i′β(X2i) for some β(·) : R^{dx2} → R^{dx1+1};

H1^{NP} : E[Yi|Xi] = g(Xi) for some g(·) : R^{dx} → R,

where X̄1i = (1, X1i′)′.
Intuitively, using the former alternative may improve the power of the test if the true model
turns out to be close to a varying coefficient one, while the latter ensures consistency against
a general nonparametric alternative. The test statistic should be modified accordingly, by
including in P kn(Xi) only those power series terms that are present under the alternative.
For example, alternative H1^{VC} does not allow higher powers of X1i to enter the model, so
they should be removed from Pkn(Xi) when constructing the test statistic for H0^P against
H1^{VC}.
Because now several hypotheses tests are done simultaneously, the Bonferroni correction
is needed to control size. Namely, the nominal significance level for each individual test
should be α/2 (or α/T , if there are T tests) instead of α. The resulting overall test rejects
the null if at least one individual test rejects the null at the α/2 level.
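A minimal sketch of this decision rule, assuming the individual normalized t statistics have already been computed (the helper name is hypothetical):

```python
from statistics import NormalDist

def bonferroni_reject(t_stats, alpha=0.05):
    """Reject the null if any individual one-sided test rejects at level alpha/T,
    where T is the number of alternatives considered."""
    T = len(t_stats)
    crit = NormalDist().inv_cdf(1 - alpha / T)  # N(0,1) critical value at alpha/T
    return any(t > crit for t in t_stats)

# With two alternatives and alpha = 0.05, each individual test uses the
# alpha/2 = 2.5% critical value of about 1.96:
# bonferroni_reject([0.5, 1.2])  -> False
# bonferroni_reject([0.5, 2.5])  -> True
```

The overall procedure controls size at level α because the probability that at least one of the T tests falsely rejects is at most T · (α/T) = α.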
Next, I simulate data from three data generating processes. The first DGP is fully para-
metric, while the latter two resemble the DGPs used before but make it more difficult to
distinguish between the parametric and semiparametric models.
Setup 1: n = 1, 000, kn = 80, rn = 65. Setup 2: n = 5, 000, kn = 150, rn = 132. P refers to parametric DGP, SP refers to semiparametric DGP, NP refers to nonparametric DGP. Entries in bold correspond to cases when H0 is true. Results are based on B = 2, 000 simulation draws.
Table 6: Partially Linear Model Estimates (Yatchew and No (2001) vs. My Estimates)
The table compares the estimates from Figure 2 in Yatchew and No (2001) and the semiparametric estimates I obtain using specification 7.2. I use ln = 4 power series terms in both PRICE and AGE, which yields mn = 25 parameters in the semiparametric model. Standard errors are shown in parentheses.
Table 7: Specification Tests for Yatchew and No (2001)

Model   t-statistic   Reject H0?
7.1     0.652         No
7.2     0.165         No
7.3     0.050         No
7.4     3.545         Yes
7.6     3.702         Yes
The table shows the values of the t test statistic for specifications 7.1–7.6. The critical value is based on the N(0, 1) distribution and equals 1.645 at the 5% significance level.
Figure 1: Varying Coefficient Functions
This figure shows the true coefficient functions C1(z), γ(z), and δ(z) for the varying coefficient semiparametric
model from equation 6.1 used in simulations.
Figure 2: Comparison of RTS Estimates
The long dashed line shows the true function δ(z) for equation 6.1. The short dashed line shows its OLS
estimate, the dash-dotted line shows its OLS with interactions estimate, and the solid line shows its semi-
parametric varying coefficient estimate. The figure is based on B = 1, 000 simulation draws with n = 1, 000
observations in each. The regressors are fixed across the simulation draws, only the errors are redrawn as
εi ∼ i.i.d. N(0, 2.25).
Figure 3: H0 in 3D
The left figure shows the dependence of TCi (on the z axis) on Pi (on the y axis) and Qi (on the x axis) under H0 in
equation 6.1. The right figure shows the dependence of TCi (on the z axis) on Pi (on the y axis) and Qi (on the x axis)
under H1 in equation 6.2. The figure is based on one realization of the regressor values and errors.
Figure 4: H0 and H1 in 2D
The left figure shows the dependence of TCi on pi conditional on fixed Qi and Zi. The right figure shows the
dependence of TCi on Qi conditional on fixed pi and Zi. The solid lines show the linear relationship which
holds under H0 in equation 6.1. The dashed lines show the nonlinear relationship which holds under H1 in
equation 6.2. The figure is based on one realization of the regressor values and errors.
Figure 5: Distribution of t under H0, Normal Errors
The solid line shows the simulated distribution of the trn test statistic, the dash-dotted line shows the
simulated distribution of the tkn test statistic, and the dashed line shows the standard normal distribution.
The results are based on B = 2, 000 simulation draws, εi ∼ i.i.d. N(0, 2.25). In the upper left figure
n = 1, 000, kn = 80, rn = 65; in the upper right figure n = 5, 000, kn = 80, rn = 65; in the bottom figure
n = 5, 000, kn = 150, rn = 132.
Figure 6: Form of Heteroskedastic Errors
The figure illustrates the form of heteroskedasticity εi ∼ i.n.i.d. N(0, 0.015 exp(Qi + Zi)). The coordinates of
the points in the scatter plot are given by (lnQi +Zi, εi). It is based on one realization of the regressor values
and errors.
Figure 7: Distribution of t under H0, Normal Heteroskedastic Errors
The solid line shows the simulated distribution of the feasible tHC,rn test statistic, the dash-dotted line shows
the simulated distribution of the infeasible tHC,rn,inf test statistic, and the dashed line shows the standard
normal distribution. The results are based on B = 1, 000 simulation draws, εi ∼ i.n.i.d. N(0, 0.015 exp(Qi +
Zi)). In the upper left figure n = 1, 000, kn = 80, rn = 65; in the upper right figure n = 5, 000, kn = 80,
rn = 65; in the bottom figure n = 5, 000, kn = 150, rn = 132.
Figure 8: Distribution of t under Local Alternatives
The solid line shows the simulated distribution of the trn test statistic under local alternatives for n = 10, 000,
kn = 175, rn = 154, the dash-dotted line shows the simulated distribution of the trn test statistic under local
alternatives for n = 1, 000, kn = 80, rn = 65, and the dashed line shows the standard normal distribution.
The results are based on B = 2, 000 simulation draws, εi ∼ i.i.d. N(0, 2.25).
Figure 9: Age and Price Effects
The figure shows the estimated nonparametric functions f(PRICE) and g(AGE) for specification 7.2.
B Some Practical Tips
B.1 Series Representation
In order to derive the series-based specification test, I write the restricted and unrestricted
models in a series form. For a variable z, let Qln(z) = (q1(z), ..., qln(z))′ be a sequence of
approximating functions of z. Then an unknown function h(z) can be approximated as
h(z) ≈ ∑_{j=1}^{ln} γj qj(z) = Qln(z)′γ.
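As an illustration, the coefficients γ can be obtained by least squares once the approximating functions are chosen. A numpy sketch with power series qj(z) = z^{j−1}, a choice made here purely for illustration:

```python
import numpy as np

def series_approx(z, h_values, l_n):
    """Least-squares power series approximation of h(z), with q_j(z) = z^(j-1)."""
    Q = np.vander(z, l_n, increasing=True)  # columns: 1, z, z^2, ..., z^(l_n - 1)
    gamma, *_ = np.linalg.lstsq(Q, h_values, rcond=None)
    return Q @ gamma  # fitted values Q^{l_n}(z)' gamma
```

For example, fitting h(z) = exp(z) on [0, 1] with ln = 6 terms already yields a very accurate approximation; smooth functions are approximated at a polynomial rate in ln.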
Let Wmn(x) be the sequence of functions which is used to estimate the restricted
semiparametric model. Namely, for the partially linear model, mn = dx1 + ln and
Wmn(x) = (x1′, Qln(x2)′)′. Then the semiparametric partially linear model can be written as

Yi = X1i′α + g(X2i) + εi = X1i′α + Qln(X2i)′γ + Ri + εi = Wmn(Xi)′β1 + ei,

where β1 = (α′, γ′)′, Ri = g(X2i) − Qln(X2i)′γ is the approximation error, and ei = εi + Ri.
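In this series form, estimating the partially linear model reduces to one OLS regression on Wmn(Xi). A numpy sketch for scalar X1 and X2 (variable names are illustrative):

```python
import numpy as np

def partially_linear_series(Y, X1, X2, l_n):
    """Series OLS for Y_i = X1_i * alpha + g(X2_i) + eps_i, with g approximated
    by the power series Q^{l_n}(X2)."""
    Q = np.vander(X2, l_n, increasing=True)  # Q^{l_n}(X2): 1, X2, ..., X2^(l_n-1)
    W = np.column_stack([X1, Q])             # W^{m_n}(X) = (x1, Q^{l_n}(x2)')'
    beta1, *_ = np.linalg.lstsq(W, Y, rcond=None)
    eps_hat = Y - W @ beta1                  # residuals used by the test statistic
    return beta1, eps_hat
```

The first entry of beta1 estimates α, and the remaining entries estimate the series coefficients γ of g(·).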
Next, let Trn(x) = (t1(x), ..., trn(x))′ be the sequence of approximating functions which
is used in addition to Wmn(x) to estimate the fully nonparametric model, so that P kn(x) =
(Wmn(x)′, T rn(x)′)′. Namely, for the partially linear model, T rn(x) may include powers of x1
and interactions between x1 and x2. The difference between T rn(x) and Wmn(x) is that the
former is present only in the unrestricted nonparametric model, while the latter is present in
both restricted and unrestricted models.
The unrestricted nonparametric model can be written as
Yi = Pkn(Xi)′β + Ri + εi = Wmn(Xi)′β1 + Trn(Xi)′β2 + Ri + εi,

where β = (β1′, β2′)′.
The null hypothesis that the conditional mean function is semiparametric corresponds
to rn restrictions β2 = 0. To test this hypothesis, the researcher first needs to estimate the
semiparametric model
Yi = Wmn(Xi)′β1 + ei,
obtain the estimates β̂1, compute the residuals ε̂i = Yi − Wmn(Xi)′β̂1, and then use the
following statistic as a basis for the test:

ξ = ε̂′P(σ̂²P′P)⁻¹P′ε̂,

where σ̂² = ε̂′ε̂/n.
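Given the residuals from the restricted model, the statistic takes a few lines of numpy. This is a sketch only; in practice τn would be set to rn, as argued in the simulations:

```python
import numpy as np

def lm_series_statistic(P, eps_hat, tau_n):
    """Compute xi = eps_hat' P (sigma2_hat P'P)^{-1} P' eps_hat and its
    normalized version t = (xi - tau_n) / sqrt(2 tau_n)."""
    n = len(eps_hat)
    sigma2_hat = eps_hat @ eps_hat / n          # sigma2_hat = eps_hat' eps_hat / n
    Pe = P.T @ eps_hat
    xi = Pe @ np.linalg.solve(sigma2_hat * (P.T @ P), Pe)
    t = (xi - tau_n) / np.sqrt(2 * tau_n)
    return xi, t
```

Note that ξ is a quadratic form in the projection of the residuals on the series terms, so it is always nonnegative.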
As I show below, this test statistic can be derived as a Lagrange Multiplier or Conditional
Moment test statistic for the semiparametric model written in the series form if the
dependence of the number of terms on the sample size is ignored, and the model is treated
as parametric. In parametric models with a fixed number of restrictions r, this test statistic
converges in distribution to χ²r under the null. In the present paper, the number of restrictions
rn grows with the sample size, so the usual asymptotic result does not hold. I develop an
asymptotic theory for the proposed test in Section 4.
B.2 Proposed Test as LM Test
As shown above, the unrestricted nonparametric model is given by
Yi = Pkn(Xi)′β + ei = Wmn(Xi)′β1 + Trn(Xi)′β2 + ei,

where β = (β1′, β2′)′. The semiparametric null model imposes the restriction that β2 = 0. If
one ignored the presence of approximation errors and the dependence of the number of series
terms on the sample size, this restriction could be tested using the Lagrange Multiplier test.
The (quasi-)log-likelihood for the nonparametric model is given by

Ln(β, σ²) = −(1/2) log 2π − (1/2) log σ² − (1/(2σ²n)) (Y − Pβ)′(Y − Pβ).

Then the score equals

Sn(β, σ²) = (∂Ln(β, σ²)/∂β′, ∂Ln(β, σ²)/∂σ²)′
= ((1/(σ²n)) P′(Y − Pβ), −1/(2σ²) + (1/(2σ⁴n)) (Y − Pβ)′(Y − Pβ))′,

which under the null hypothesis becomes

Sn(β, σ²) = ((1/(σ²n)) P′ε, 0)′,

where β = (β1′, 0rn′)′.

The information matrix evaluated at the true parameter values is block diagonal:

Fn(β, σ²) = [E[(1/(σ²n)) P′P], 0; 0, 1/(2σ⁴)].
Assuming that the information matrix equality holds, the ξ test statistic for the restricted
null model is constructed by evaluating the score and the Hessian at the restricted estimates:
As has been shown above, ||Ω⁻¹(n⁻¹T′ε̂)|| = Op(√(rn/n)). Hence,

n ||Ω⁻¹ n⁻¹T′ε̂||² (||Ω̂ − Ω|| + C ||Ω̂ − Ω||²) / √(2rn) = n Op(rn/n) op(1/√rn) / √(2rn)
= op(√rn) / √(2rn) = op(1),

provided that ||Ω̂ − Ω|| = op(1/√rn), which holds under rate conditions 4.1–4.2.
The result of Theorem 1 now follows from equations C.1, C.2, and C.3.
Proof of Lemma A.3. It has been shown in the proof of Theorem 1 that ||T̂′T̂/n − T′T/n|| = Op(ζ(kn)²kn/n). As long as σ̂² →p σ², this implies ||Ω̂ − Ω̃|| = Op(ζ(kn)²kn/n).
Next, due to homoskedasticity,

||Ω̃ − Ω|| = ||(σ̂² − σ²) ∑i TiTi′/n|| ≤ |σ̂² − σ²| ∑i ||Ti||²/n
= |n⁻¹ ∑i (εi² − σ²) + 2n⁻¹ ∑i εi(gi − ĝi) + n⁻¹ ∑i (gi − ĝi)²| ∑i ||Ti||²/n.
First, by Chebyshev’s inequality, n⁻¹ ∑i (εi² − σ²) = Op(n^{-1/2}).

Second, by Assumption 2, n⁻¹ ∑i (gi − ĝi)² = Op(mn/n + mn^{-2α}).
Finally, note that

n⁻¹ ∑i εi(gi − ĝi) = n⁻¹ ∑i εi(Wi′(β1 − β̂1) + Ri) = n⁻¹(β1 − β̂1)′W′ε + n⁻¹R′ε.
Using the result proved above, n⁻¹R′ε = Op(n^{-1/2} mn^{-α}).