Constrained Conditional Moment Restriction Models*

Victor Chernozhukov, M.I.T., [email protected]
Whitney K. Newey†, M.I.T., [email protected]
Andres Santos‡, U.C.L.A., [email protected]

First Draft: September, 2015. This Draft: July, 2020.

Abstract

Shape restrictions have played a central role in economics as both testable implications of classical theory and sufficient conditions for obtaining informative counterfactual predictions. In this paper we provide a general procedure for inference under shape restrictions in identified and partially identified models defined by conditional moment restrictions. Our test statistics and proposed inference methods are based on the minimum of the generalized method of moments (GMM) objective function with and without shape restrictions. The critical values are obtained by building a strong approximation to the statistic and then bootstrapping a conservatively relaxed form of the statistic. Sufficient conditions are provided, including strong approximation using Koltchinskii's coupling. Examples given in the paper include inference for linear instrumental variable (IV) models with linear inequality constraints, inference for the variability of quantile IV treatment effects, and inference for bounds on average equivalent variation in a demand model with general heterogeneity. We find in Monte Carlo examples that the critical values are conservatively accurate and that tests about objects of interest have good power relative to unrestricted GMM. We also give an empirical application to estimating returns to schooling when marginal returns are positive and declining in schooling.

Keywords: Shape restrictions, inference on functionals, conditional moment (in)equality restrictions, instrumental variables, nonparametric and semiparametric models, Banach space, Banach lattice, Koltchinskii coupling.

*We thank Riccardo D'amato for excellent research assistance. We are also indebted to the editor, three anonymous referees, and numerous seminar participants for their valuable comments. †Research supported by NSF Grant 1757140. ‡Research supported by NSF Grant SES-1426882.
1 Introduction
Shape restrictions have played a central role in economics as both testable implications
of classical theory and sufficient conditions for obtaining informative counterfactual pre-
dictions (Topkis, 1998). A long tradition in applied and theoretical econometrics has as
a result studied shape restrictions, their ability to aid in identification, estimation, and
inference, and the possibility of testing for their validity (Matzkin, 1994; Chetverikov
et al., 2018). The canonical example of this interplay between theory and practice is
undoubtedly consumer demand analysis, where theoretical predictions such as Slutsky
conditions have been extensively tested for and employed in estimation (Hausman and
Newey, 1995, 2016; Blundell et al., 2012; Dette et al., 2016). The empirical analysis
of shape restrictions, however, goes well beyond this important application with recent
examples including studies into the monotonicity of the state price density (Jackwerth,
2000; Aït-Sahalia and Duarte, 2003), the presence of ramp-up and start-up costs (Wolak,
2007; Reguant, 2014), and the existence of complementarities in demand (Gentzkow,
2007) and organizational design (Athey and Stern, 1998; Kretschmer et al., 2012).
In this paper, we provide a general procedure for inference under shape restrictions
in conditional moment restriction models in which the conditioning variables (i.e. in-
struments) may vary with equations. As shown by Ai and Chen (2007, 2012), the
models we study encompass parametric (Hansen, 1982), semiparametric (Ai and Chen,
2003), and nonparametric (Newey and Powell, 2003) specifications, as well as panel data
applications (Chamberlain, 1992) and the study of plug-in functionals. By incorporat-
ing nuisance parameters into the definition of the parameter space, these models also
encompass semi(non)-parametric conditional moment (in)equality models.
Shape restrictions are often equivalent to inequalities on parameters of interest and
on certain unknown functions. For example, Slutsky negative semi-definiteness and mono-
tonicity require that certain functions satisfy inequality restrictions. Inference with
inequality restrictions is difficult. Such restrictions lead to discontinuities in limiting
distributions when the inequality restrictions bind, which makes inference challenging
due to non-pivotal and potentially unreliable pointwise asymptotic approximations (An-
drews, 2000, 2001). We address these challenges by carefully building some constraint
slackness into a potentially regularized specification of the local parameter space that
accounts for the curvature present in nonlinear constraints.
Our test statistics and proposed inference methods are based on the minima
of the generalized method of moments (GMM) objective function with and without
inequality restrictions. We obtain a strong approximation to our test statistics and
propose bootstrap based critical values. The resulting tests remain valid in partially
identified settings and are shown to asymptotically control size uniformly over a class of
underlying distributions of the data. Inference is potentially conservative but powerful
in exploiting the large amount of information that inequality restrictions can provide
in many cases relevant for applications. While aspects of our analysis are specific to
the conditional moment restriction model, the role of the local parameter space is solely
dictated by the shape restrictions. As such, we expect the insights of the set up here to
be applicable to the study of shape restrictions in alternative models as well.
The inequalities associated with nonparametric shape restrictions necessitate con-
sideration of parameter spaces that are sufficiently general yet endowed with enough
structure to ensure a fruitful asymptotic analysis. An important insight of this paper is
that this simultaneous flexibility and structure is possessed by sets defined by inequality
restrictions on Abstract M (AM) spaces, an AM space being a Banach lattice whose
norm obeys a condition discussed in Section 3. We illustrate the general applicability
of our framework by applying our main results to: (i) Conduct inference on parameters
of interest estimated by GMM in the presence of parametric inequality restrictions; (ii)
Test shape restrictions on structural functions satisfying quantile conditional moment
restrictions; (iii) Conduct inference about partially identified sets of average equivalent
variation and other objects of interest in demand estimation with general heterogeneity
and smooth demand functions; and (iv) Impose the Slutsky restrictions to conduct in-
ference in a linear conditional moment restriction model. Additionally, while we do not
pursue further examples in detail for conciseness, we note our results may be applied to
conduct tests of homogeneity, supermodularity, and economies of scale or scope, as well
as inference on functionals of the identified set in certain partially identified models.
The literature on nonparametric shape restrictions in econometrics has classically
focused on testing whether conditional mean regressions satisfy the restrictions implied
by consumer demand theory; see, e.g., Lewbel (1995) and Haag et al. (2009), as well
as Lee et al. (2018) for a generalization to estimators allowing for Bahadur representa-
tions. The related problem of studying monotone conditional mean regressions has also
garnered widespread attention – recent advances on this problem include Chetverikov
(2019) and Chatterjee et al. (2013). Additional work concerning monotonicity con-
straints includes Beare and Schmidt (2014) who test the monotonicity of the pricing
kernel, Chetverikov and Wilhelm (2017) who study estimation of a nonparametric in-
strumental variable (IV) regression under monotonicity constraints, Armstrong (2015)
who develops minimax rate optimal one sided tests in a Gaussian regression discon-
tinuity (RD) design, Babii and Kumar (2019) who apply isotonic regression to RD,
and Freyberger and Horowitz (2015) who study a partially identified nonparametric IV
model with discrete regressors. Our results do not lend themselves computationally to
the construction of uniform confidence bands for shape restricted functions – a problem
that has been addressed in different contexts by Chernozhukov et al. (2009), Horowitz
and Lee (2017), and Freyberger and Reeves (2018). Following the original version of
this paper, Zhu (2019) and Fang and Seo (2019) have proposed inference methods for
convex restrictions which, while applicable to an important class of problems, rule out
inference on nonlinear functionals or tests of certain shape restrictions. The analysis in
our paper is also related to the moment inequalities literature (Canay and Shaikh, 2017;
Ho and Rosen, 2017). When specialized to these models, our results enable subvector
inference under conditions related to those in Bugni et al. (2017); see Torgovitsky
(2019) for an application of our procedure in this context. Our paper also contributes
to a literature studying semiparametric and nonparametric models under partial iden-
tification (Manski, 2003). Examples of such work include Chen et al. (2011a), Hong
(2017), Santos (2012), Tao (2014), and Chen et al. (2011b) who focus on inference on
functionals of the structural function but do not allow for shape constraints as we do.
This paper innovates relative to the previous literature in providing novel bootstrap
based, uniformly valid inference methods for general conditional moment restriction
models, including parametric and nonparametric IV models as special cases. Most of the
previous results are either for conditional means or the inference results are not uniformly
valid and are thus unsuitable for confidence interval construction. We also innovate in
providing inference results for functionals of the structural function while allowing for
shape constraints. The results of Chernozhukov et al. (2009) are complementary to our
analysis since our results do not provide feasible uniform confidence intervals but do lead
to, e.g., tests for monotonicity and for inference at a point whereas their methods would
be conservative. The results here are highly complementary to Chetverikov and Wilhelm
(2017) in providing inference for nonparametric IV under shape restrictions while they
showed that imposing monotonicity can greatly improve the convergence rate of the
estimator – an observation that additionally motivates our use of test statistics based
on shape constrained (instead of unconstrained) estimators.
The remainder of the paper is organized as follows. In Section 2 we preview our test
in the context of two specific examples. Section 3 contains our main theoretical results,
while Section 4 applies them to conduct inference in the heterogeneous demand model
of Hausman and Newey (2016). Finally, Section 5 contains a brief simulation study,
while Section 6 revisits the Angrist and Krueger (1991) study of the returns to education
by imposing shape restrictions. All mathematical derivations are included in a series of
appendices; see in particular Appendix S.6 for applications of our general results and
Appendix S.8 for coupling results based on Koltchinskii (1994).
2 Guide to Implementation
To fix ideas, we first describe our test in the context of two specific examples. We reserve
until later the full mathematical framework and focus here on implementation.
2.1 Linear Instrumental Variables
As perhaps the simplest possible example, we first consider a linear instrumental variable
model in which θ0(P ) ∈ Θ is identified through the moment conditions
EP [(Y −W ′θ0(P ))Z] = 0,
where Y ∈ R, W ∈ R^{dθ}, Z ∈ R^{dz}, and P denotes the distribution of V ≡ (Y, W, Z).
We are interested in testing whether θ0(P) belongs to a set R characterized by

R = {θ ∈ R^{dθ} : Fθ = b, Cθ ≤ c},    (1)

for F a d_b × dθ matrix, C a d_c × dθ matrix, b ∈ R^{d_b}, and c ∈ R^{d_c}.
We consider tests based on minimizing the norm of the weighted sample moments
as in Sargan (1958) and Hansen (1982). To this end, we define the criterion
Qn(θ) ≡ {(1/n ∑_{i=1}^n (Yi − Wi′θ)Zi)′ Σn′Σn (1/n ∑_{i=1}^n (Yi − Wi′θ)Zi)}^{1/2},

where Σn is consistent for (E[ZZ′U²])^{−1/2} for U ≡ Y − W′θ0(P) – e.g. Σn′Σn may be the
optimal weighting matrix of Hansen (1982). Our test statistic is then based on

In(R) = min_{θ∈R} √n Qn(θ)        In(Θ) = min_{θ∈R^{dθ}} √n Qn(θ).
Specifically, we consider tests that reject for large values of In(R)−In(Θ). In what follows
it will also be helpful to let θn and θ^u_n denote the minimizers of Qn over θ ∈ R and θ ∈ R^{dθ}
respectively – i.e. θn and θ^u_n are the constrained and unconstrained estimators.
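To fix notation in code, the statistics above can be computed directly on simulated data. The sketch below is our own minimal illustration (the data-generating process, the identity weighting matrix, and all names are invented for the example, not the authors' implementation); we minimize the smooth squared criterion, which has the same minimizers as Qn:

```python
import numpy as np
from scipy.optimize import minimize

def In_stat(Y, W, Z, Sigma, F=None, b=None, C=None, c=None):
    """sqrt(n) * min Qn over R (when constraints are given) or over R^{d_theta}."""
    n, d = W.shape

    def Qn_sq(theta):                      # squared criterion: smooth, same minimizer
        g = Z.T @ (Y - W @ theta) / n      # sample moments (1/n) sum (Y - W'theta) Z
        v = Sigma @ g
        return v @ v

    cons = []
    if F is not None:
        cons.append({"type": "eq", "fun": lambda t: F @ t - b})    # F theta = b
    if C is not None:
        cons.append({"type": "ineq", "fun": lambda t: c - C @ t})  # C theta <= c
    res = minimize(Qn_sq, np.zeros(d), constraints=cons, method="SLSQP")
    return np.sqrt(n * res.fun), res.x

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
W = Z + 0.5 * rng.normal(size=(n, 2))      # instruments correlated with regressors
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
Sigma = np.eye(2)                          # identity weighting for simplicity

In_Theta, th_u = In_stat(Y, W, Z, Sigma)                                   # In(Theta)
In_R, th_r = In_stat(Y, W, Z, Sigma, C=np.array([[0.0, 1.0]]), c=np.array([0.0]))
print(In_R - In_Theta)   # recentered statistic; small when the restriction holds
```

With a slack restriction such as the one imposed here, In(R) − In(Θ) is close to zero; under a violated restriction it grows with n.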
We construct a critical value for this choice of test statistic by using the multiplier
bootstrap. Specifically, let b ∈ {1, . . . , B} index a bootstrap draw, let {ω^b_i}_{i=1}^n be i.i.d.
independent of the data with ω^b_i ∼ N(0, 1), and for any θ ∈ R^{dθ} define

W^b_n(θ) ≡ (1/√n) ∑_{i=1}^n ω^b_i {(Yi − Wi′θ)Zi − (1/n) ∑_{j=1}^n (Yj − Wj′θ)Zj},
which is a simulated draw of the true (centered) moment functions. We also require an
estimator of the derivative of the moment conditions, and to this end we set

Dn[h] ≡ −(1/n) ∑_{i=1}^n Zi Wi′ h
for any h ∈ R^{dθ}. Finally, we additionally need to account for the variation in θn (recall
In(R) = √n Qn(θn)). The principal challenge in this regard is accounting for the possible
values that θn may take, and for this purpose we introduce the set

Vn(θn) ≡ {h ∈ R^{dθ} : Fh = 0 and Cjh ≤ √n max{0, −(rn + Cjθn − cj)} for all j},

where Cj is the jth row of C, cj the jth coordinate of c, and rn is a slackness parameter
whose choice we discuss shortly. The set Vn(θn) can be thought of as a local version of
the restricted parameter space, approximating the set of values h/√n that could equal
θn − θ0(P). Our bootstrap approximations to In(R) and In(Θ) are then given by
U^b_n(R) ≡ min_{h∈Vn(θn)} {(W^b_n(θn) + Dnh)′ Σn′Σn (W^b_n(θn) + Dnh)}^{1/2}    (2)

U^b_n(Θ) ≡ min_{h∈R^{dθ}} {(W^b_n(θ^u_n) + Dnh)′ Σn′Σn (W^b_n(θ^u_n) + Dnh)}^{1/2}    (3)
Thus, for a level α test, we employ the 1 − α quantile of U^b_n(R) − U^b_n(Θ) across the B
bootstrap draws as our critical value. The asymptotic validity of this test is formally
established in Appendix S.6.1, where we specialize our results to parametric GMM.
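The full procedure, including the bootstrap critical value from (2) and (3), can be sketched in a few lines under simplifying assumptions: identity weighting, rn = +∞ (so the local parameter space imposes Ch ≤ 0, the always-valid conservative choice discussed below), and an invented data-generating process. This illustrates the mechanics only and is not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, B, alpha = 400, 200, 0.05
Z = rng.normal(size=(n, 3))                        # three instruments
W = Z[:, :2] + 0.5 * rng.normal(size=(n, 2))       # two regressors
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
C, c = np.array([[0.0, 1.0]]), np.array([0.0])     # restriction: theta_2 <= 0

def argmin_norm(vec, dim, cons):                   # min_x ||vec(x)||_2
    res = minimize(lambda x: np.sum(vec(x) ** 2), np.zeros(dim),
                   constraints=cons, method="SLSQP")
    return res.x, np.sqrt(np.sum(vec(res.x) ** 2))

g = lambda t: Z.T @ (Y - W @ t) / n                # sample moment function
ineq = [{"type": "ineq", "fun": lambda t: c - C @ t}]
th_r, Q_r = argmin_norm(g, 2, ineq)                # constrained estimator
th_u, Q_u = argmin_norm(g, 2, [])                  # unconstrained estimator
stat = np.sqrt(n) * (Q_r - Q_u)                    # In(R) - In(Theta)

D = -Z.T @ W / n                                   # derivative of the moments
cons_h = [{"type": "ineq", "fun": lambda h: -C @ h}]   # r_n = +inf: C h <= 0
diffs = np.empty(B)
for b in range(B):
    w = rng.normal(size=n)                         # N(0,1) multipliers
    def Wb(theta):                                 # multiplier bootstrap moments
        m = (Y - W @ theta)[:, None] * Z
        return w @ (m - m.mean(axis=0)) / np.sqrt(n)
    _, Ub_R = argmin_norm(lambda h: Wb(th_r) + D @ h, 2, cons_h)
    _, Ub_T = argmin_norm(lambda h: Wb(th_u) + D @ h, 2, [])
    diffs[b] = Ub_R - Ub_T

crit = np.quantile(diffs, 1 - alpha)               # bootstrap critical value
print(stat <= crit)    # the restriction holds in this design: expect no rejection
```

Each bootstrap draw reuses the same multipliers ω^b_i for both the restricted and unrestricted problems, mirroring the pairing of U^b_n(R) and U^b_n(Θ) in the quantile of their difference.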
The critical value depends on the choice of rn. Our asymptotic theory will require
rn√n → ∞, with the choice rn = +∞ always being valid. Intuitively, rn is meant to
reflect the sampling uncertainty in C(θn − θ0(P)). Since the distribution of θn cannot
be uniformly consistently estimated, we suggest linking rn to the degree of sampling
uncertainty in C(θ^u_n − θ0(P)) instead. Specifically, we recommend setting rn to satisfy

P( max_{1≤j≤J} Cj(θ^u_n − θ^{u⋆}_n) ≤ rn | Data) = 1 − γn    (4)

where γn → 0 and θ^{u⋆}_n is a "bootstrap" analogue of θ^u_n. This approach changes the
problem of selecting rn into the problem of selecting γn. However, γn is more interpretable:
If we employed Vn(θ^u_n) in place of Vn(θn) in (2), then a Bonferroni bound implies the
asymptotic size is bounded by α + γn for fixed γn.¹ In simulations we find this bound
to be overly pessimistic and the test to have size at most α for a wide range of γn.
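Equation (4) translates directly into a bootstrap quantile computation. The sketch below is ours, with an invented design, illustrative rows of C, and a just-identified IV estimator so that θ^u_n and its nonparametric-bootstrap analogue θ^{u⋆}_n have closed forms; it picks the smallest rn compatible with a given γn:

```python
import numpy as np

rng = np.random.default_rng(2)
n, B, gamma_n = 400, 500, 0.05
Z = rng.normal(size=(n, 2))
W = Z + 0.5 * rng.normal(size=(n, 2))
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
C = np.array([[0.0, 1.0], [1.0, -1.0]])       # illustrative inequality rows

iv = lambda y, w, z: np.linalg.solve(z.T @ w, z.T @ y)   # just-identified IV
th_u = iv(Y, W, Z)

draws = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)          # nonparametric bootstrap resample
    th_star = iv(Y[idx], W[idx], Z[idx])      # bootstrap analogue of th_u
    draws[b] = np.max(C @ (th_u - th_star))   # max_j C_j (th_u - th_star)

r_n = np.quantile(draws, 1 - gamma_n)         # smallest r_n satisfying (4)
print(r_n)
```

Larger γn gives a smaller rn and hence a less conservative local parameter space; the Bonferroni bound α + γn caps the resulting size distortion.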
Remark 2.1. Our test may be employed to obtain confidence regions for a coordinate
of θ0(P ) while imposing restrictions of the form Cθ0(P ) ≤ c (e.g. sign restrictions or
monotonicity of x ↦ x′θ0(P)). Specifically, for θ_k the kth coordinate of θ ∈ R^{dθ} let

R_λ ≡ {θ ∈ R^{dθ} : θ_k = λ, Cθ ≤ c},
which is a special case of (1). The desired confidence region can then be obtained by
conducting test inversion in λ employing the described test.
¹While we may replace Vn(θn) with Vn(θ^u_n) in identified models, in partially identified models we employ Vn(θn) due to the identified set potentially not being a subset of R under the null hypothesis.
2.2 Quantile Treatment Effects
As a second introductory example we consider a nonparametric conditional moment
restriction model. Specifically, for an outcome Y ∈ R, treatment D ∈ [0, 1], instrument
Z ∈ R, and quantile τ ∈ (0, 1), we are interested in the function θ0(P ) solving
P (Y ≤ θ0(P )(D)|Z) = τ, (5)
where θ0(P )(D) denotes the value of θ0(P ) at the point D and P the distribution of
V ≡ (Y,D,Z). If D is randomly assigned, then we may set D = Z and interpret the
derivative ∇θ0(P) as the τth quantile treatment effect (QTE). Alternatively, if D ≠ Z,
then we obtain the IV model of QTE of Chernozhukov and Hansen (2005). In what
follows we assume θ0(P ) belongs to a parameter space Θ, such as the set of functions
with bounded first and second derivatives; see Appendix S.6.3 for a formal analysis.
We aim to obtain a confidence interval on the variation (in D) of the QTE while
imposing that the QTE be increasing in treatment intensity.² To this end, we set
R ≡ {θ ∈ Θ : ∫_0^1 (∇θ(u))² du − (∫_0^1 ∇θ(u) du)² = λ and ∇²θ(u) ≥ 0 for all u ∈ [0, 1]},
where λ is the hypothesized value for the variation of the QTE, and the positivity
constraint on ∇2θ ensures the QTE is increasing in treatment intensity. By conducting
test inversion (in λ) of the null hypothesis that θ0(P ) ∈ R we may then obtain the desired
confidence region. To construct a test statistic we employ a sequence of transformations
{q_k}_{k=1}^{kn} of Z, let q^{kn}_n(z) ≡ (q_1(z), . . . , q_{kn}(z))′, and define the criterion function

Qn(θ) ≡ {(1/n ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi))′ (1/n ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi))}^{1/2},
where we allow kn to increase with n to reflect the full content of the conditional moment
restriction in (5). Because Θ is infinite dimensional, we do not minimize Qn over Θ,
but instead employ a sieve Θn. Specifically, for {p_j}_{j=1}^{jn} a sequence of approximating
functions, p^{jn}(d) ≡ (p_1(d), . . . , p_{jn}(d))′, and Θn ≡ {p^{jn′}β : p^{jn′}β ∈ Θ} we define

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ)        In(Θ) ≡ inf_{θ∈Θn} √n Qn(θ),

and let θn and θ^u_n denote the corresponding constrained and unconstrained estimators.
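For concreteness, the criterion is cheap to evaluate at any sieve coefficient vector. The sketch below is our own toy version with a cubic monomial sieve for θ, polynomial transformations of Z, and D randomly assigned (D = Z); all of these choices are illustrative assumptions rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 500, 0.5
D = rng.uniform(size=n)                     # treatment in [0, 1]
Z = D                                       # randomly assigned: D = Z
Y = D ** 2 + 0.1 * rng.standard_normal(n)   # true median function is d^2

p = lambda d: np.column_stack([np.ones_like(d), d, d**2, d**3])  # sieve basis
q = lambda z: np.column_stack([np.ones_like(z), z, z**2])        # instrument terms

def Qn(beta):
    theta_d = p(D) @ beta                   # theta(D_i) on the sieve
    m = ((Y <= theta_d).astype(float) - tau)[:, None] * q(Z)
    g = m.mean(axis=0)                      # sample moments
    return np.sqrt(g @ g)

beta_true = np.array([0.0, 0.0, 1.0, 0.0])  # theta(d) = d^2
print(np.sqrt(n) * Qn(beta_true))           # small: moments nearly centered
print(np.sqrt(n) * Qn(beta_true + np.array([0.5, 0, 0, 0])))  # large: shifted up
```

In(R) and In(Θ) then follow by minimizing √n Qn over β, subject to the constraints defining R for the former.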
The statistics In(R) and In(R) − In(Θ) can both be employed as the basis for a test.
As we discuss in Section 3, however, our asymptotic approximation to In(R) − In(Θ)
imposes more stringent assumptions on the rate of convergence of θ^u_n than those needed to
²Inference on other functionals, such as the average (in D) QTE, is also feasible but for illustrative purposes we focus here on a nonlinear functional such as the variance.
approximate In(R) – a distinction that may be relevant in this nonparametric application
(particularly if D ≠ Z), but was unnecessary in the parametric model of Section 2.1.
To obtain a critical value we proceed in a conceptually similar manner to Section 2.1.
First, for b ∈ {1, . . . , B} indexing bootstrap draws of {ω^b_i}_{i=1}^n i.i.d. with ω^b_i ∼ N(0, 1), let

W^b_n(θ) ≡ (1/√n) ∑_{i=1}^n ω^b_i {(1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi) − (1/n) ∑_{j=1}^n (1{Yj ≤ θ(Dj)} − τ) q^{kn}_n(Zj)},
be our multiplier bootstrap analogue to the centered moments. Second, since Qn is not
everywhere differentiable, we approximate the derivative of the moments by
Dn(θ)[p^{jn′}β] ≡ (1/√n) ∑_{i=1}^n q^{kn}_n(Zi) (1{Yi ≤ θ(Di) + p^{jn}(Di)′β/√n} − 1{Yi ≤ θ(Di)}),
which is a numerical derivative with step size 1/√n. Third, to account for the possible
values that the restricted estimator θn may take we introduce the set
Vn(θn, ℓn) = {β ∈ R^{jn} : ∇²p^{jn}(d)′β ≥ √n min{0, rn − ∇²θn(d)} for all d ∈ [0, 1],

∫_0^1 (∇θn(u) + ∇p^{jn}(u)′β/√n)² du − (∫_0^1 ∇(θn(u) + p^{jn}(u)′β/√n) du)² = λ, ‖p^{jn′}β‖_{2,∞} ≤ √n ℓn},

where the choices of rn and ℓn are discussed below and the norm ‖·‖_{2,∞} is defined as

‖p^{jn′}β‖_{2,∞} = max_{0≤α≤2} sup_{d∈[0,1]} |∇^α p^{jn}(d)′β|.
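The norm ‖·‖_{2,∞} can be approximated on a grid of treatment values. A small sketch of our own, for an illustrative cubic monomial sieve (an assumption, not the paper's choice of basis), using `numpy.polynomial`:

```python
import numpy as np

def norm_2_inf(beta, grid=None):
    """Grid approximation of ||p^{jn}'beta||_{2,inf} for the monomial sieve (1, d, d^2, d^3)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    poly = np.polynomial.Polynomial(beta)          # coefficients by increasing degree
    derivs = [poly, poly.deriv(1), poly.deriv(2)]  # derivative orders alpha = 0, 1, 2
    return max(np.max(np.abs(p(grid))) for p in derivs)

beta = np.array([0.0, 0.0, 1.0, 0.0])              # theta(d) = d^2
print(norm_2_inf(beta))                            # sup of |d^2|, |2d|, |2| on [0,1] -> 2.0
```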
In analogy to Section 2.1, Vn(θn, ℓn) represents possible local deviations from the
approximation to θ0(P) in Θn. Our bootstrap approximations to In(R) and In(Θ) are

U^b_n(R|ℓn) ≡ inf_{β∈Vn(θn,ℓn)} {(W^b_n(θn) + Dn(θn)[p^{jn′}β])′ (W^b_n(θn) + Dn(θn)[p^{jn′}β])}^{1/2}

U^b_n(Θ|ℓn) ≡ inf_{β:‖p^{jn′}β‖_{2,∞}≤√nℓn} {(W^b_n(θ^u_n) + Dn(θ^u_n)[p^{jn′}β])′ (W^b_n(θ^u_n) + Dn(θ^u_n)[p^{jn′}β])}^{1/2}.

Hence, a test based on In(R) rejects whenever In(R) exceeds the 1 − α quantile of
U^b_n(R|ℓn) across bootstrap replications, while a test based on In(R) − In(Θ) rejects
whenever In(R) − In(Θ) exceeds the 1 − α quantile of U^b_n(R|ℓn) − U^b_n(Θ|ℓn).
Our critical values depend on the choices of rn and ℓn. The slackness parameter rn
again measures sampling uncertainty in whether constraints "bind" and we thus set

P( max_{d∈[0,1]} {∇²θ^u_n(d) − ∇²θ^{u⋆}_n(d)} ≤ rn | Data) = 1 − γn

for θ^{u⋆}_n a "bootstrap" analogue to θ^u_n and γn → 0 as in Section 2.1. The bandwidth
ℓn regularizes the local parameter space, and its main role is to ensure that
ℓn × (rate of convergence) = o(n^{−1/2}) in settings in which the rate of convergence is
slower than the usually required o(n^{−1/4}). We thus suggest setting ℓn to satisfy

P( max_{d∈[0,1]} |∇²θ^u_n(d) − ∇²θ^{u⋆}_n(d)| ≤ 1/(√n ℓn) | Data) = 1 − γn.

In settings in which the rate of convergence is expected to be sufficiently fast, no
regularization is necessary and one may set ℓn = +∞; see Lemma 3.1 below.
3 General Theory
We next turn to developing a general inferential framework that encompasses the tests
discussed in Section 2 as special cases. The class of models we consider are those in
which the parameter of interest θ0 ∈ Θ satisfies J conditional moment restrictions

EP[ρℓ(X, θ0)|Zℓ] = 0 for 1 ≤ ℓ ≤ J

with ρℓ : R^{dx} × Θ → R, X ∈ R^{dx}, Zℓ ∈ R^{d_{zℓ}}, V ≡ (X, {Zℓ}_{ℓ=1}^J), and V ∼ P ∈ P.
In some of the applications that motivate us, such as Hausman and Newey (2016), the
parameter θ0 is not identified. It is therefore convenient to define the identified set

Θ0(P) ≡ {θ ∈ Θ : EP[ρℓ(X, θ)|Zℓ] = 0 for 1 ≤ ℓ ≤ J}
and employ it as the basis of our statistical analysis. Formally, for a set R of parameters
satisfying a conjectured restriction, we develop a test for the hypothesis
H0 : Θ0(P) ∩ R ≠ ∅        H1 : Θ0(P) ∩ R = ∅;    (6)
i.e. we devise a test of whether at least one element of the identified set satisfies the
posited constraints. In an identified model, a test of (6) is thus equivalent to a test of
whether θ0 satisfies the hypothesized constraint. We denote the set of distributions P
satisfying the null hypothesis by P0 ≡ {P ∈ P : Θ0(P) ∩ R ≠ ∅}.
The defining elements determining the generality of the hypothesis allowed for by
(6) are the choices of Θ and R. In imposing restrictions on both Θ and R we therefore
aim to allow for a general framework while simultaneously ensuring enough structure
for a fruitful asymptotic analysis. To this end, we require Θ to be a subset of a complete
normed vector space B (i.e. a Banach space) and consider sets R with the structure
R ≡ {θ ∈ B : ΥF(θ) = 0 and ΥG(θ) ≤ 0},    (7)

where ΥF : B → F and ΥG : B → G are known maps. Our first assumption formalizes
the restrictions that we impose on the parameter space Θ and the restriction set R.
Assumption 3.1. (i) Θ ⊆ B, where B is a Banach space with norm ‖·‖B; (ii)
ΥF : B → F and ΥG : B → G, where F is a Banach space with norm ‖·‖F and G is
an AM space with order unit 1G and norm ‖·‖G.
Assumption 3.1(i) allows us to address parametric, semiparametric, and nonpara-
metric models. Assumption 3.1(ii) similarly imposes that ΥF take values in a Banach
space F, while ΥG is required to take values in an AM space G – we provide an overview
of AM spaces in the supplemental appendix. Heuristically, the essential properties of G
are: (i) G is a vector space equipped with a partial order “≤”; (ii) The partial order and
the vector space operations interact in the same manner they do on R (e.g. if θ1 ≤ θ2,
then θ1 + θ3 ≤ θ2 + θ3); and (iii) The order unit 1G ∈ G is an element such that for any
θ ∈ G there exists a scalar λ > 0 satisfying |θ| ≤ λ1G (e.g. when G = R^d we may set
1G ≡ (1, . . . , 1)′ ∈ R^d). These properties of an AM space will prove instrumental in our
analysis. In particular, the order unit 1G will provide a crucial link between the partial
order "≤" and the norm ‖·‖G and, through smoothness of ΥG, will allow us to leverage a
rate of convergence in B to build a suitable sample analogue to the local parameter space.
3.1 The Test Statistic
We test the null hypothesis in (6) by employing sieve-GMM criterion based tests that
may be viewed as generalizations of the J-test of Sargan (1958) and Hansen (1982) or
the incremental J-test of Eichenbaum et al. (1988).
For each instrument Zℓ, we consider transformations {q_{k,n,ℓ}}_{k=1}^{k_{n,ℓ}} and let
q^{k_{n,ℓ}}_{n,ℓ}(zℓ) ≡ (q_{1,n,ℓ}(zℓ), . . . , q_{k_{n,ℓ},n,ℓ}(zℓ))′. Further setting Z ≡ (Z′_1, . . . , Z′_J)′, kn ≡ ∑_{ℓ=1}^J k_{n,ℓ},
q^{kn}_n(z) ≡ (q^{k_{n,1}}_{n,1}(z1)′, . . . , q^{k_{n,J}}_{n,J}(zJ)′)′, and ρ(x, θ) ≡ (ρ1(x, θ), . . . , ρJ(x, θ))′ we then let

(1/√n) ∑_{i=1}^n ρ(Xi, θ) ∗ q^{kn}_n(Zi) ≡ (1/√n) ∑_{i=1}^n (ρ1(Xi, θ) q^{k_{n,1}}_{n,1}(Zi,1)′, . . . , ρJ(Xi, θ) q^{k_{n,J}}_{n,J}(Zi,J)′)′;
i.e. for each θ we obtain the scaled sample averages of the product of each “residual”
ρℓ(X, θ) with the transformations of its respective instrument Zℓ. For a possibly data-
dependent kn × kn matrix Σn, we then construct a criterion function Qn : Θ → R+ by
Qn(θ) ≡ ‖(1/n) ∑_{i=1}^n ρ(Xi, θ) ∗ q^{kn}_n(Zi)‖_{Σn,p},

where for any a ∈ R^{kn} we let ‖a‖_{Σn,p} ≡ ‖Σna‖_p and ‖·‖_p be the p-norm on R^{kn} for any
p ≥ 2 – i.e. ‖a‖_p^p ≡ ∑_{i=1}^d |a^{(i)}|^p for any a ≡ (a^{(1)}, . . . , a^{(d)})′ ∈ R^d. We note that setting
p = 2 leads to the computationally most attractive test. However, we allow for p ≠ 2 as
higher values of p enable us to establish coupling results under weaker conditions.
Heuristically, √nQn should diverge to infinity when evaluated at any θ ∉ Θ0(P)
and remain "stable" when evaluated at a θ ∈ Θ0(P). Thus, examining the minimum
of √nQn over R should reveal whether there is a θ that simultaneously makes √nQn
"stable" (θ ∈ Θ0(P)) and satisfies the conjectured restriction (θ ∈ R). This intuition
leads to a generalization of the J-test that is based on the test statistic

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ),
where Θn ∩R is a finite dimensional subset of Θ ∩R that grows dense in Θ ∩R (Chen,
2007). Alternatively, we may recenter our test statistic by considering
In(R)− In(Θ).
Intuitively, relative to In(R), tests based on In(R)−In(Θ) can have higher power against
alternatives that satisfy Θ0(P) ≠ ∅ but lower power against alternatives under which
Θ0(P ) = ∅ (Chen and Santos, 2018). The theoretical study of both statistics, however,
is similar. Specifically, we will obtain coupling results that apply to statistics In(R) and
suitable bootstrap counterparts for general sets R. As a result, they will also apply to
R = Θ, which immediately lets us analyze the recentered statistic as well.
Before proceeding, we introduce notation and assumptions we employ throughout
our analysis. For any function f of V we let Gn,P f ≡ (1/√n) ∑_{i=1}^n {f(Vi) − EP[f(V)]}
denote the empirical process evaluated at f. We will often evaluate Gn,P on the class

Fn ≡ {ρℓ(·, θ) : θ ∈ Θn ∩ R and 1 ≤ ℓ ≤ J}.    (8)

The "size" of Fn plays a crucial role, and we control it through the bracketing integral

J_{[ ]}(δ, Fn, ‖·‖_{P,2}) ≡ ∫_0^δ √(1 + log N_{[ ]}(ε, Fn, ‖·‖_{P,2})) dε,    (9)

where N_{[ ]}(ε, Fn, ‖·‖_{P,2}) is the smallest number of ε-brackets (under ‖·‖_{P,2}) required to
cover Fn. It will also prove useful to denote the vector subspace generated by Θn ∩ R
by Bn. If there are multiple norms ‖·‖_{A1} and ‖·‖_{A2} on Bn, we further set

Sn(A1, A2) ≡ sup_{b∈Bn} ‖b‖_{A1}/‖b‖_{A2},    (10)
which we note depends on the sample size n only through the choice of sieve Θn ∩R.
The following assumptions introduce the basic structure we employ.
Assumption 3.2. (i) {Vi}_{i=1}^∞ is an i.i.d. sequence with Vi ∼ P ∈ P.

Assumption 3.3. (i) sup_{1≤ℓ≤J} sup_{1≤k≤k_{n,ℓ}} ‖q_{k,n,ℓ}‖∞ ≤ Bn with Bn ≥ 1; (ii) The eigenvalues
of EP[q^{k_{n,ℓ}}_{n,ℓ}(Zℓ) q^{k_{n,ℓ}}_{n,ℓ}(Zℓ)′] are bounded in ℓ, n, and P ∈ P; (iii) Fn has envelope Fn,
sup_{P∈P} ‖Fn‖_{P,2} < ∞, and sup_{P∈P} J_{[ ]}(‖Fn‖_{P,2}, Fn, ‖·‖_{P,2}) ≤ Jn for some Jn < ∞.

Assumption 3.4. (i) For each P ∈ P there is a Σn(P) > 0 with ‖Σn − Σn(P)‖_{o,p} =
oP(1) uniformly in P ∈ P; (ii) The matrices Σn(P) are invertible for all n and P ∈ P;
(iii) ‖Σn(P)‖_{o,p} and ‖Σn(P)^{−1}‖_{o,p} are uniformly bounded in n and P ∈ P.
Assumption 3.2 imposes that the sample {Vi}_{i=1}^n be i.i.d. with P belonging to a set of
distributions P over which our results will hold uniformly. By allowing Bn to depend on
n, Assumption 3.3(i) accommodates both transformations that are uniformly bounded
in n, such as trigonometric series, and those with diverging bound, such as normalized
B-splines. The bound on eigenvalues imposed in Assumption 3.3(ii) guarantees that
{q_{k,n,ℓ}}_{k=1}^{k_{n,ℓ}} are Bessel sequences uniformly in n, while Assumption 3.3(iii) controls the
“size” of the class Fn, which is crucial in studying the induced empirical process. We
note that Jn is allowed to diverge with the sample size and thus Assumption 3.3(iii)
accommodates non-compact parameter spaces Θ as in Chen and Pouzo (2012, 2015).
Alternatively, if the class F ≡ ∪_{n=1}^∞ Fn is restricted to be Donsker, then Assumptions
3.3(ii)-(iii) can hold with uniformly bounded Jn and ‖Fn‖P,2. Finally, Assumption 3.4
requires the weighting matrix Σn to converge to an invertible matrix Σn(P) – here,
‖·‖_{o,p} denotes the operator norm when R^{kn} is endowed with ‖·‖_p.
3.2 Strong Approximation
We begin our analysis by obtaining a strong approximation to statistics of the form
In(R). The results immediately apply to In(R)− In(Θ) as well by setting R = Θ.
3.2.1 Rate of Convergence
As a preliminary step, we first aim to characterize the rate of convergence of the (ap-
proximate) minimizers of Qn on Θn ∩R. Formally, for any sequence τn ↓ 0, we let
Θ^r_n ≡ {θ ∈ Θn ∩ R : Qn(θ) ≤ inf_{θ∈Θn∩R} Qn(θ) + τn},    (11)
which constitutes the set of exact (τn = 0) or near (τn > 0) minimizers of Qn. We
consider the general case with τn ↓ 0, as we employ both the set of exact and near
minimizers in characterizing and estimating the distribution of our test statistics.
Following the literature on set estimation (Chernozhukov et al., 2007; Beresteanu
and Molinari, 2008; Santos, 2011; Kaido and Santos, 2014), for any sets A and B we let
→d_H(A, B, ‖·‖E) ≡ sup_{a∈A} inf_{b∈B} ‖a − b‖E

d_H(A, B, ‖·‖E) ≡ max{→d_H(A, B, ‖·‖E), →d_H(B, A, ‖·‖E)},
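For finite collections of points these distances reduce to simple max–min computations; the toy sketch below (our illustration, with ‖·‖E taken to be the Euclidean norm) makes the asymmetry of the directed distance explicit:

```python
import numpy as np

def d_H_directed(A, B):
    """sup_{a in A} inf_{b in B} ||a - b||_2 for finite sets of points (rows)."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.min(axis=1).max()

def d_H(A, B):
    """Symmetric Hausdorff distance: max of the two directed distances."""
    return max(d_H_directed(A, B), d_H_directed(B, A))

A = np.array([[0.0], [1.0]])
B = np.array([[0.0], [1.0], [3.0]])
print(d_H_directed(A, B))   # -> 0.0: every point of A lies in B
print(d_H(A, B))            # -> 2.0: the point 3.0 in B is at distance 2 from A
```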
which constitute the directed Hausdorff distance and the Hausdorff distance under a
norm ‖ · ‖E that need not equal ‖ · ‖B. Heuristically, in order to obtain a strong approx-
imation we require a convergence rate under a metric for which the empirical process is
equicontinuous – a purpose for which ‖ · ‖B is often too strong with its use leading to
overly stringent assumptions. By introducing a potentially weaker norm ‖ · ‖E we are
thus able to obtain better rate conditions in some infinite dimensional problems.
For each θ ∈ Θ ∩ R, we further let Π^r_n θ denote its approximation on Θn ∩ R and set

Θ^r_{0n}(P) ≡ {Π^r_n θ : θ ∈ Θ0(P) ∩ R}.    (12)

It will also prove useful to define the population analogue to Qn(θ), which is given by

Qn,P(θ) ≡ ‖EP[ρ(X, θ) ∗ q^{kn}_n(Z)]‖_{Σn(P),p}.

Our next set of assumptions suffices for deriving a rate of convergence for Θ^r_n.
Assumption 3.5. For each P ∈ P0 there is a map Π^r_n : Θ ∩ R → Θn ∩ R such that
sup_{P∈P0} sup_{θ∈Θ0(P)∩R} {Qn,P(Π^r_n θ) − inf_{θ∈Θn∩R} Qn,P(θ)} = O(JnBn k_n^{1/p} √(log(1 + kn)/n)).

Assumption 3.6. (i) There is a norm ‖·‖E on Bn, sets Vn(P) ⊆ Θn ∩ R, and {νn}_{n=1}^∞
with ν_n^{−1} = O(1), such that Θ^r_n ⊆ Vn(P) with probability tending to one uniformly in
P ∈ P0 and for any θ ∈ Vn(P) and ηn ≡ JnBn k_n^{1/p} √(log(1 + kn)/n) it follows

ν_n^{−1} →d_H(θ, Θ^r_{0n}(P), ‖·‖E) ≤ Qn,P(θ) − sup_{θ0∈Θ^r_{0n}(P)} Qn,P(θ0) + O(ηn);

(ii) Σn satisfies sup_{θ∈Θ^r_{0n}(P)} Qn,P(θ) × ‖Σn − Σn(P)‖_{o,p} = OP(ηn) uniformly in P0.
Assumption 3.5 formalizes the sense in which Θ0(P) ∩ R must be approximated by
Θ^r_{0n}(P). Since Θ0(P) ∩ R minimizes Qn,P over the entire parameter space, Assumption
3.5 requires that Θ^r_{0n}(P) be "close" to minimizing Qn,P over the sieve. Assumption
3.6(i) introduces a local identification condition by requiring that on a set Vn(P), the
criterion Qn,P grow at a rate ν_n^{−1} as θ moves away from Θ^r_{0n}(P). The parameter ν_n^{−1},
which implicitly depends on kn and the choice of sieve Θn ∩ R, is conceptually related
to the sieve measure of ill-posedness (Blundell et al., 2007). The set Vn(P) may be taken
to equal the entire sieve in convex models, or it may be taken to equal a local neighborhood
of Θ^r_{0n}(P) after establishing the consistency of Θ^r_n; see Lemma S.3.1 in the supplemental
appendix. Finally, Assumption 3.6(ii) imposes a rate of convergence requirement on Σn.
The next theorem employs Assumptions 3.5 and 3.6 to obtain a rate of convergence.
Theorem 3.1. Let Assumptions 3.2, 3.3(i), 3.3(iii), 3.4, 3.5, and 3.6 hold, and let

R_n ≡ ν_n k_n^{1/p} √(log(1 + k_n)) J_n B_n / √n. (13)

Then uniformly in P ∈ P_0: (i) →d_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(R_n + ν_n τ_n); and (ii) d_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(ν_n τ_n) provided J_n B_n k_n^{1/p} √(log(1 + k_n)/n) = o(τ_n).
Theorem 3.1(i) implies that, with arbitrarily high probability, Θ^r_n is contained in a neighborhood of Θ^r_{0n}(P) that shrinks at an R_n + ν_n τ_n rate. We further note that in identified models Θ^r_{0n}(P) is a singleton and Theorem 3.1(i) reduces to consistency in the Hausdorff metric. For partially identified models, however, Theorem 3.1(ii) further requires that τ_n not tend to zero too fast in order to obtain Hausdorff consistency. It is also worth noting that, under assumptions on the Hausdorff distance between Θ^r_{0n}(P) and Θ_0(P) ∩ R, Theorem 3.1 and the triangle inequality can yield a rate of convergence of Θ^r_n to Θ_0(P) ∩ R. Heuristically, we focus on convergence to Θ^r_{0n}(P) (instead of Θ_0(P) ∩ R) because our strong approximation will rely on undersmoothing.
While we employ Theorem 3.1 in our forthcoming analysis, we emphasize that in
specific applications alternative results that are better suited for the particular structure
of the model may be available. In this regard, we note Assumptions 3.5 and 3.6 are not
needed in our analysis beyond their role in delivering a rate of convergence. In particular,
if an alternative rate is derived under different assumptions, then such a result can still be
combined with our analysis to establish the validity of the proposed inferential methods.
3.2.2 Local Approximation
We follow the literature by observing that it is possible to accommodate non-differentiable moment functions by requiring their conditional moments to be differentiable (Chen and Pouzo, 2009, 2012). To this end, for any 1 ≤ ℓ ≤ J we define m_{P,ℓ} : B → L²_P by

m_{P,ℓ}(θ)(Z) ≡ E_P[ρ_ℓ(X, θ)|Z].
Our strong approximation then relies on the following assumptions.
Assumption 3.11 imposes that Υ_G be Fréchet differentiable. The constant K_g, employed in the construction of V_n(θ, ℓ), may be interpreted as a bound on the second derivative of Υ_G and equals zero when Υ_G is linear.

Figure 1: Impact of inequality constraints for the restriction θ_0 ≤ 0. Left panel: true local parameter space (shaded region). Right panel: approximation G_n(θ_n) (shaded region).

Assumptions 3.12 and 3.13 mark
an important difference between hypotheses in which ΥF is linear and those in which
ΥF is nonlinear – note linear ΥF automatically satisfy Assumptions 3.12 and 3.13. This
distinction reflects that when ΥF is linear its impact on the local parameter space is
known and need not be estimated.3 Thus, while Assumptions 3.12(i)-(iii) impose conditions analogous to those required of Υ_G, Assumption 3.12(iv) additionally demands that ∇Υ_F(θ) possess a norm-bounded right inverse on (Θ^r_{0n}(P))^ε – the existence of a right inverse is equivalent to a classical rank condition.4 Finally, for nonlinear Υ_F, Assumption
3.13(ii) requires the existence of a local perturbation to any θ0 ∈ Θr0n(P ) that relaxes
“active” inequality constraints without a first order effect on the equality restrictions.
Figure 1 illustrates how we account for inequality constraints in the case in which
B is the set of continuous functions on R and we aim to test whether θ0(x) ≤ 0 for all
x ∈ R. Absent equality constraints, the local parameter space for θ0 consists of the set
of perturbations h/√n such that θ_0 + h/√n remains negative. For an estimator θ_n of θ_0, G_n(θ_n) then consists of the h/√n such that at any x either (i) θ_n(x) + h(x)/√n is not “too close” to zero (to account for sampling uncertainty) or (ii) h(x)/√n is negative (since all negative h/√n belong to the local parameter space for θ_0). As θ_n converges
to θ0, Gn(θn) is asymptotically contained in the local parameter space of θ0. Unlike
Figure 1, whenever ΥG is nonlinear we must further account for the curvature of ΥG
which motivates the presence of the term Kgrn‖h/√n‖B1G in (21).
3For linear Υ_F, the requirement Υ_F(θ + h/√n) = 0 is equivalent to Υ_F(h) = 0 for any θ ∈ R.
4Recall that for a linear map Γ : B_n → F_n, a right inverse is a map Γ⁻ : F_n → B_n such that ΓΓ⁻(h) = h for any h ∈ F_n. The right inverse Γ⁻ need not be unique if Γ is not bijective, in which case Assumption 3.12(iv) is satisfied as long as it holds for some right inverse of ∇Υ_F(θ) : B_n → F_n.

Figure 2: Impact of equality constraints. Left panel: set of θ satisfying a nonlinear equality restriction. Right panel: corresponding local parameter spaces (absent inequality constraints).

Figure 2 illustrates how regularizing with a ‖·‖_B-norm constraint allows us to account
for the impact of equality constraints on the local parameter space when B = R2 and
F = R. In this setting, the constraint ΥF (θ) = 0 and the local parameter space
Vn(θ,+∞) generate curves in R2. Since all curves Vn(θ,+∞) pass through zero, they
are all “close” in a neighborhood of the origin. However, for nonlinear ΥF the size of
the neighborhood of the origin in which Vn(θn,+∞) is “close” to Vn(θ0,+∞) crucially
depends on both the distance of θn to θ0 and the curvature of ΥF . The set Vn(θn, `n)
therefore accounts for the impact of equality constraints by restricting attention to the
expanding neighborhood of the origin in which Vn(θn,+∞) resembles Vn(θ0,+∞).
Remark 3.1. Whenever Υ_F and Υ_G are linear, controlling the norm ‖·‖_B is no longer necessary as the “curvature” of Υ_F and Υ_G is known. As a result, we may instead set

V_n(θ, ℓ_n) ≡ {h/√n ∈ B_n : h/√n ∈ G_n(θ), Υ_F(θ + h/√n) = 0 and ‖h/√n‖_E ≤ ℓ_n}. (23)

Controlling the norm ‖·‖_E, however, may still be necessary to ensure that D_n(θ)[h] is a consistent estimator for D_{n,P}(θ)[h] uniformly in h/√n ∈ V_n(θ, ℓ_n).
3.3.3 Bootstrap Coupling
We impose a final set of assumptions in order to couple our bootstrap statistic.
where the concentration parameter ϱ_n is smaller than the coupling rate (i.e. ϱ_n ≤ a_n^{−1}). It is well known that uniformly consistent estimation of an approximating distribution is not sufficient for establishing asymptotic size control (Romano and Shaikh, 2012). Intuitively, in order to achieve size control the approximating distribution must be suitably continuous at the quantile of interest uniformly in P ∈ P_0. Assumption 3.17 imposes precisely this requirement, allowing the modulus of continuity, captured here by the concentration parameter ϱ_n, to deteriorate with the sample size provided that ϱ_n ≤ a_n^{−1} – i.e. the loss of continuity must occur at a slower rate than the coupling rate of Theorems
3.2 and 3.3. We refer the reader to Chernozhukov et al. (2013, 2014) for further discussion
and motivation of conditions of this type, called anti-concentration conditions.5
The next result establishes the asymptotic validity of a test based on In(R).
Corollary 3.1. Let Assumption 3.17 hold and the conditions of Theorem 3.2(i) and Theorem 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n)), then it follows that:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) > c_n) ≤ α.
We note that if there are no inequality constraints, then it is possible to show that the test in Corollary 3.1 is similar and its asymptotic size equals the nominal level α whenever the conditions of Theorem 3.2(ii) are satisfied. The consistency of the test against any P ∈ P \ P_0 for which max_{1≤ℓ≤J} ‖E_P[ρ_ℓ(X, θ)|Z]‖_{P,2} is bounded away from zero uniformly in θ ∈ Θ ∩ R is also straightforward to establish under suitable conditions.
3.4.2 Critical Value: In(R)− In(Θ)
We next establish the asymptotic validity of a test based on In(R) − In(Θ). To this
end, we will apply Theorems 3.2 and 3.3 to both R as in (7) and R = Θ, which in turn
will require us to impose assumptions for both cases. Throughout, we therefore signify
parameters associated with setting R = Θ by a “u” superscript – e.g. F^u_n, Θ^u_{0n}(P), and V^u_n(θ, ℓ) are understood to be as in (8), (12), and (14) but with R = Θ. Employing this notation, we then note that the analogue to U_{n,P}(R|ℓ_n) (as in (15)) is given by

U_{n,P}(Θ|ℓ_n) ≡ inf_{θ_0∈Θ^u_{0n}(P)} inf_{h/√n∈V^u_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q^{k_n}_n + D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}.
Our next result establishes a strong approximation to the recentered statistic.
Corollary 3.2. Let the conditions of Theorem 3.2(i) be satisfied with R as in (7) and the conditions of Theorem 3.2(ii) be satisfied with R = Θ. Then, for any ℓ_n, ℓ^u_n ↓ 0 satisfying

k_n^{1/p} √(log(1 + k_n)) B_n × {sup_{P∈P} J_{[]}(ℓ_n^{κ_ρ}, F_n, ‖·‖_{P,2}) + sup_{P∈P} J_{[]}((ℓ^u_n)^{κ_ρ}, F^u_n, ‖·‖_{P,2})} = o(a_n),

K_m(ℓ^u_n)² × S^u_n(L, E) = o(a_n), and R^u_n = o(ℓ^u_n), it follows uniformly in P ∈ P_0 that

I_n(R) − I_n(Θ) ≤ U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
Corollary 3.2 follows immediately by applying Theorem 3.2(i) to couple In(R) and
Theorem 3.2(ii) to couple In(Θ). In particular, it is worth emphasizing that in coupling
In(Θ) we must rely on Theorem 3.2(ii) instead of Theorem 3.2(i) in order to ensure that
5Alternatively, Assumption 3.17 can be dispensed with by adding a fixed constant η > 0 to the critical value – i.e. using q_{1−α}(U_n(R|ℓ_n)) + η as the critical value (Andrews and Shi, 2013).
In(R) − In(Θ) is weakly smaller than its strong approximation. As a result, whenever
moments are nonlinear, Corollary 3.2 requires the rate of convergence of the uncon-
strained estimator to be sufficiently fast for Theorem 3.2(ii) to apply.
In order to obtain a bootstrap approximation, we define the following estimators:

Θ^u_n ≡ {θ ∈ Θ_n : Q_n(θ) ≤ inf_{θ̃∈Θ_n} Q_n(θ̃) + τ^u_n}    V^u_n(ℓ) ≡ {h/√n ∈ B^u_n : ‖h/√n‖_E ≤ ℓ};

i.e. Θ^u_n is simply the set estimator of Section 3.2.1 applied with R = Θ, while V^u_n(ℓ) is the local parameter space sample analogue of Section 3.3.2 applied with no equality or
inequality constraints. As a bootstrap approximation to Un,P (Θ|`un) we then employ
U_n(Θ|ℓ^u_n) ≡ inf_{θ∈Θ^u_n} inf_{h/√n∈V^u_n(ℓ^u_n)} ‖W_n ρ(·, θ) ∗ q^{k_n}_n + D_n(θ)[h]‖_{Σ_n,p}.
The next corollary obtains a coupling for our bootstrap approximation.
Corollary 3.3. Let the conditions of Theorem 3.3 be satisfied with R as in (7) and R = Θ, and suppose J^u_n B_n k_n^{1/p} √(log(1 + k_n)/n) = o(τ^u_n). Then, it follows that there are ℓ̃_n ≲ ℓ_n and ℓ̃^u_n ≲ ℓ^u_n such that uniformly in P ∈ P_0 we have

U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n) ≥ U⋆_{n,P}(R|ℓ̃_n) − U⋆_{n,P}(Θ|ℓ̃^u_n) + o_P(a_n).
Corollary 3.3 simply follows by applying Theorem 3.3 to U_n(R|ℓ_n) and coupling U_n(Θ|ℓ^u_n) to U⋆_{n,P}(Θ|ℓ̃^u_n). For the latter coupling, it is important that Θ^u_n be consistent for Θ^u_{0n}(P) in the Hausdorff metric instead of the directed Hausdorff metric. While both metrics are equivalent for identified models, and hence we may set τ^u_n = 0, in partially identified models we require that τ^u_n tend to zero sufficiently slowly, as in Theorem 3.1(ii). We further note that in identified models it is possible to employ either W_n ρ(·, θ_n) or W_n ρ(·, θ^u_n) in constructing both U_n(R|ℓ_n) and U_n(Θ|ℓ^u_n) – a change that results in an asymptotically equivalent coupling but ensures that the bootstrap statistic is positive.
The asymptotic validity of a test that rejects whenever In(R) − In(Θ) exceeds the
appropriate quantile of our bootstrap approximation immediately follows. In addition,
we note that it is always possible to set `un = +∞ – a choice that does not lead to a loss
of power under conditions related to those stated in Lemma 3.1.
Corollary 3.4. Let Assumption 3.17 hold with I_n(R) − I_n(Θ) instead of I_n(R), and the conditions of Corollaries 3.2 and 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n)), then:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) − I_n(Θ) > c_n) ≤ α.

Moreover, the same conclusion holds if we instead set c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|+∞)).
4 Heterogeneity and Demand Analysis
For our final example, we illustrate how to conduct inference in the heterogeneous de-
mand model of Hausman and Newey (2016). Specifically, for Y ∈ [0, 1] equal to the
expenditure share on a commodity, W ∈ W ⊆ Rdw a vector of prices, income, and
covariates, and η representing unobserved individual heterogeneity we suppose
Y = g(W, η) (26)
where g is a known function of (W, η). The requirement that g be known is not necessarily restrictive as η can in principle be infinite dimensional. For instance, Hausman and Newey (2016) show that if the expenditure share is actually given by Y = g̃(W, ε) for some unknown function g̃ and unobserved ε, then under appropriate conditions

Y = g̃(W, ε) = Σ_{j=1}^∞ ψ_j(W) β_j(ε), (27)

where {ψ_j}_{j=1}^∞ is a known basis with Σ_{j=1}^∞ ψ_j²(W) finite and {β_j(ε)}_{j=1}^∞ are unknown coefficients. Setting η = {β_j(ε)}_{j=1}^∞ and viewing η as a random variable in the sequence space ℓ² ≡ {{a_j}_{j=1}^∞ : Σ_j a_j² < ∞}, we then note that (27) reduces to (26) with g known.
If the covariates W are independent of η, then for any c ∈ R it follows that

P(Y ≤ c|W) = P(g(W, η) ≤ c|W) = ∫ 1{g(W, η) ≤ c} ν_P(dη), (28)
where νP denotes the unknown distribution of η. Result (28) restricts the possible values
of νP and hence the identified set for average exact consumer surplus, average share, and
other functionals of interest. Specifically, for Ψ(g, η) an object of interest for preferences denoted by η, such as equivalent variation, Hausman and Newey (2016) study

∫ Ψ(g, η) ν_P(dη), (29)

which is the average across individuals. By evaluating the set of values of (29) that can be generated by a distribution ν_P satisfying (28) at a grid {c_ℓ}_{ℓ=1}^J, Hausman and Newey (2016) provide estimates of the identified set for (29). We further note that bounds on the distribution of Ψ(g, η) under ν_P can be obtained by replacing Ψ(g, η) in (29) with an indicator that Ψ(g, η) be less than or equal to some real number.
In what follows, we apply our results to conduct inference on functionals as in (29). To this end, we let F_P(c|W) ≡ P(Y ≤ c|W) for a given grid {c_ℓ}_{ℓ=1}^J and set θ_0(P) = ({F_P(c_ℓ|W)}_{ℓ=1}^J, ν_P). To define B, we suppose η ∈ Ω for some known Hausdorff space Ω, set B to be the Borel σ-algebra on Ω, let M(B) be the space of regular signed Borel measures on Ω, and let ‖·‖_TV denote the total variation norm.
Assuming F_P(c_ℓ|·) ∈ C_B(W) for C_B(W) the space of continuous and bounded functions on W, we define B ≡ (⊗_{ℓ=1}^J C_B(W)) × M(B) and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖_B = Σ_{ℓ=1}^J ‖F(c_ℓ|·)‖_∞ + ‖ν‖_TV. As the parameter space Θ we then set

Θ ≡ {θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B : max_{1≤ℓ≤J} ‖F(c_ℓ|·)‖_∞ ≤ 2}, (30)

where the “2” norm bound is simply selected to ensure θ_0(P) is in the interior of Θ.
Letting X ≡ (Y, W) and Z_ℓ = W for every 1 ≤ ℓ ≤ J, we then define

ρ_ℓ(X, θ) = 1{Y ≤ c_ℓ} − F(c_ℓ|W), (31)

which yields conditional moment restrictions that identify F_P(c_ℓ|W) – ν_P, however, is potentially partially identified. Also let G = ℓ^∞(B), which is an AM space under ‖·‖_∞ with order unit 1_G satisfying 1_G(B) = 1 for any B ∈ B. Defining Υ_G : B → ℓ^∞(B) by
Υ_G(θ)(B) = −ν(B) (32)

for any set B ∈ B and θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B, then enables us to impose the restriction that ν be a positive measure through the inequality Υ_G(θ) ≤ 0. Finally, for a grid {w_l}_{l=1}^L we let Υ_F : B → R^{JL+2} be given by Υ_F(θ) = (Υ^{(e)}_F(θ), Υ^{(ν)}_F(θ), Υ^{(s)}_F(θ)), where

Υ^{(e)}_F(θ) = {F(c_ℓ|w_l) − ∫ 1{g(w_l, η) ≤ c_ℓ} ν(dη)}_{1≤ℓ≤J, 1≤l≤L} (33)

Υ^{(ν)}_F(θ) = ν(Ω) − 1 (34)

Υ^{(s)}_F(θ) = ∫ Ψ(g, η) ν(dη) − λ; (35)
i.e. (33) imposes that (28) hold, (34) demands that ν have total measure one, and (35) lets us impose that the average object of interest equal a hypothesized value λ. By conducting test inversion in λ we can obtain a confidence region for the functional in (29).
As in Hausman and Newey (2016), we can impose utility maximization by requiring
that Ω consist only of η such that g(·, η) satisfies the Slutsky conditions. One may
sample from Ω by drawing randomly from sets of η that satisfy Slutsky symmetry and
only keeping those where the compensated price effects matrix is negative semidefinite
on a grid. This is the procedure followed in Hausman and Newey (2016) for two goods.
Given a collection of orthogonal probability measures {μ_{s,n}}_{s=1}^{s_n} ⊂ M(B) we employ

M_n(B) ≡ {ν ∈ M(B) : ν = Σ_{s=1}^{s_n} π_s μ_{s,n} for some {π_s}_{s=1}^{s_n} ∈ R^{s_n}}
as a sieve for M(B). To guarantee that each consumer is rational we focus on discrete
distributions where each µs,n is a Dirac measure for a point in Ω satisfying the Slutsky
conditions. For Dirac μ_{s,n} it is also particularly (computationally) simple to impose the constraints Υ_G(θ) ≤ 0 (see (32)). As a sieve for {F_P(c_ℓ|·)}_{ℓ=1}^J, we employ approximating functions {p_{j,n}}_{j=1}^{j_n}. In particular, setting p^{j_n}_n(w) = (p_{1,n}(w), . . . , p_{j_n,n}(w))′, we let

Θ_n ≡ {({p^{j_n′}_n β_ℓ}_{ℓ=1}^J, ν) : ν ∈ M_n(B) and max_{1≤ℓ≤J} ‖p^{j_n′}_n β_ℓ‖_∞ ≤ 2}. (36)
Similarly, for a sequence {q_{k,n}}_{k=1}^{k_n} and k_n × k_n positive definite matrices {Σ_{ℓ,n}}_{ℓ=1}^J, we set q^{k_n}_n(w) ≡ (q_{1,n}(w), . . . , q_{k_n,n}(w))′ and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) define

Q_n(θ) ≡ { Σ_{ℓ=1}^J ‖ (1/n) Σ_{i=1}^n (1{Y_i ≤ c_ℓ} − F(c_ℓ|W_i)) q^{k_n}_n(W_i) ‖²_{Σ_{ℓ,n},2} }^{1/2}. (37)

The statistics I_n(R) and I_n(Θ) then equal the minima of √n Q_n over Θ_n ∩ R and Θ_n, respectively.
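For intuition, the criterion in (37) is straightforward to evaluate; below is a minimal sketch with identity weighting matrices and simulated placeholder data (the design is purely illustrative, not the paper's):

```python
import numpy as np

def Q_n(F_vals, Y, Q_mat, c_grid):
    """Criterion (37) with identity weighting: for each grid point c_l,
    average the instrumented residual (1{Y <= c_l} - F_l(W)) q(W) and
    aggregate the squared l2 norms across the grid."""
    n = len(Y)
    total = 0.0
    for l, c in enumerate(c_grid):
        resid = (Y <= c).astype(float) - F_vals[:, l]   # n-vector
        moment = resid @ Q_mat / n                      # k_n-vector
        total += moment @ moment
    return np.sqrt(total)

rng = np.random.default_rng(0)
n = 500
W = rng.uniform(size=n)
Y = rng.uniform(size=n)                      # hypothetical: Y uniform, independent of W
Q_mat = np.column_stack([np.ones(n), W])     # q^{k_n}(W) with k_n = 2
c_grid = [0.25, 0.5, 0.75]
# With Y uniform and independent of W, F_P(c|W) = c, so the criterion
# evaluated at the truth is small (of order n^{-1/2}).
F_true = np.tile(c_grid, (n, 1))
print(Q_n(F_true, Y, Q_mat, c_grid))
```

The statistics are then the minimized values of √n times this criterion over the sieve, with and without the restriction set R.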
Our next set of assumptions enable us to couple In(R) and In(R)− In(Θ).
Assumption 4.1. (i) {Y_i, W_i}_{i=1}^n is i.i.d. P with P ∈ P; (ii) sup_w ‖p^{j_n}_n(w)‖_2 ≲ √j_n; (iii) E_P[p^{j_n}_n(W) p^{j_n}_n(W)′] has eigenvalues bounded away from zero and infinity uniformly in P ∈ P and n; (iv) For each P ∈ P_0 and θ_0 ∈ Θ_0(P) ∩ R, there exists a Π^r_n θ_0 = ({F_n(c_ℓ|·)}_{ℓ=1}^J, ν_n) ∈ Θ_n ∩ R such that Σ_{ℓ=1}^J ‖E_P[(F_n(c_ℓ|W) − F_P(c_ℓ|W)) q^{k_n}_n(W)]‖_2 = O((n log(n))^{−1/2}) uniformly in P ∈ P_0 and θ_0 ∈ Θ_0(P) ∩ R.

Assumption 4.2. (i) max_{1≤k≤k_n} ‖q_{k,n}‖_∞ ≲ √k_n; (ii) E_P[q^{k_n}_n(W) q^{k_n}_n(W)′] has eigenvalues bounded uniformly in P ∈ P and n; (iii) E_P[q^{k_n}_n(W) p^{j_n}_n(W)′] has singular values bounded away from zero uniformly in P ∈ P and n; (iv) k_n² j_n^{3/2} log³(n) = o(n^{1/2}).

Assumption 4.3. For all 1 ≤ ℓ ≤ J: (i) ‖Σ_{ℓ,n} − Σ_{ℓ,n}(P)‖_{o,2} = o_P(1/(k_n √j_n log²(n))) uniformly in P ∈ P; (ii) Σ_{ℓ,n}(P) is invertible and ‖Σ_{ℓ,n}(P)‖_{o,2} and ‖Σ_{ℓ,n}(P)^{−1}‖_{o,2} are bounded uniformly in P ∈ P and n.
Assumptions 4.1(ii)-(iv) state the conditions on Θ_n, with Assumptions 4.1(ii)-(iii) being satisfied by standard choices such as B-splines or wavelets. Assumption 4.1(iv) is an asymptotic unbiasedness requirement – a condition that is eased by noting that no requirements are imposed on the approximating space for ν_P. The requirements on {q_{k,n}}_{k=1}^{k_n} are stated in Assumptions 4.2(i)-(iii) and are again satisfied by standard choices. Assumption 4.2(iv) states a rate condition that suffices for verifying the coupling requirements of Theorem 3.2. Assumption 4.3 imposes the requirements on the weighting matrices.
Our next result employs Theorem 3.2(ii) to obtain strong approximations.

Theorem 4.1. Let Assumptions 4.1, 4.2, and 4.3 hold, a_n = (log(n))^{−1/2}, and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖_E = max_{1≤ℓ≤J} ‖F(c_ℓ|·)‖_∞. If ℓ_n, ℓ^u_n ↓ 0 satisfy k_n √j_n log²(n) (ℓ_n ∨ ℓ^u_n) = o(1) and k_n j_n log(n)/√n = o(ℓ_n ∧ ℓ^u_n), then uniformly in P ∈ P_0:

I_n(R) = U_{n,P}(R|ℓ_n) + o_P(a_n)

I_n(R) − I_n(Θ) = U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
In order to conduct inference, we next aim to estimate the distributions of U_{n,P}(R|ℓ_n) and U_{n,P}(Θ|ℓ^u_n). To this end, we note that Θ^r_{0n}(P) is potentially non-singleton and we therefore employ a set estimator Θ^r_n (as in (11)) to estimate the distribution of U_{n,P}(R|ℓ_n). In contrast, since U_{n,P}(Θ|ℓ^u_n) only depends on θ_0(P) through its identified component {F_P(c_ℓ|·)}_{ℓ=1}^J, for the unconstrained problem we employ any minimizer θ^u_n of Q_n on Θ_n.
Qn on Θn. With regards to the local parameter space, we note that in this application
G_n(θ) = {({p^{j_n′}_n β_{ℓ,h}}_{ℓ=1}^J, ν_h) : ν_h(B) ≥ min{r_n − ν(B), 0} for all B ∈ B} (38)

for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν). Computationally, since any ν, ν_h ∈ M_n(B) has the structure ν = Σ_{s=1}^{s_n} π_s μ_{s,n} and ν_h = Σ_{s=1}^{s_n} π_{s,h} μ_{s,n}, it follows that the constraints in (38) reduce to π_{s,h} ≥ min{r_n − π_s, 0} for all 1 ≤ s ≤ s_n whenever {μ_{s,n}}_{s=1}^{s_n} are orthogonal. Furthermore, since the moments and restrictions are linear, we may let ℓ_n = +∞ and set

V_n(θ, +∞) = {({p^{j_n′}_n β_{ℓ,h}}_{ℓ=1}^J, ν_h) : h/√n ∈ G_n(θ), Υ_F(h) = 0}. (39)
Thus, the estimators of the strong approximations obtained in Theorem 4.1 equal

U_n(R|+∞) ≡ inf_{θ∈Θ^r_n} inf_{h/√n∈V_n(θ,+∞)} { Σ_{ℓ=1}^J ‖W_n ρ_ℓ(·, θ) q^{k_n}_n + D_{ℓ,n}[h]‖²_{Σ_{ℓ,n},2} }^{1/2}

U_n(Θ|+∞) ≡ inf_{h/√n} { Σ_{ℓ=1}^J ‖W_n ρ_ℓ(·, θ^u_n) q^{k_n}_n + D_{ℓ,n}[h]‖²_{Σ_{ℓ,n},2} }^{1/2}.
Before stating our final assumption, we need an auxiliary result. To this end, define
satisfies ‖F_n(c_ℓ|·) − F_P(c_ℓ|·)‖_∞ = o(1) uniformly in θ_0(P) ∈ Θ_0(P) ∩ R and P ∈ P_0; (v) k_n j_n log²(n) τ_n = o(1), and ζ_n(k_n j_n log(n)/√n + √j_n τ_n) = o(r_n).
The boundedness of Ψ(g, ·) on Ω ensures that Υ^{(s)}_F (as in (35)) is continuous, while Assumption 4.4(ii) allows us to apply Lemma 4.1. Assumption 4.4(iii) is a low-level sufficient condition for verifying the bootstrap coupling requirement of Assumption 3.14. These rate requirements could be improved under smoothness conditions on F_P(c_ℓ|·). Finally, Assumption 4.4(iv) imposes a mild requirement on the sieve, while Assumption 4.4(v) states conditions on τ_n and r_n – note that τ_n = 0 and r_n = +∞ are always valid.
Our final result obtains a coupling for our bootstrap approximations.
Theorem 4.2. Let the conditions of Theorem 4.1 hold and Assumption 4.4 be satisfied. Then, for any sequences ℓ_n, ℓ^u_n ↓ 0 satisfying k_n j_n log(n)/√n = o(ℓ_n ∧ ℓ^u_n) and k_n √j_n log²(n) (ℓ_n ∨ ℓ^u_n) = o(1), it follows uniformly in P ∈ P_0 that

U_n(R|+∞) ≥ U⋆_{n,P}(R|ℓ_n) + o_P(a_n)

U_n(R|+∞) − U_n(Θ|+∞) ≥ U⋆_{n,P}(R|ℓ_n) − U⋆_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
In particular, since the conditions on ℓ_n and ℓ^u_n imposed in Theorems 4.1 and 4.2 are the same, it follows that we may employ the quantiles of U_n(R|+∞) and U_n(R|+∞) − U_n(Θ|+∞) conditional on the data as critical values for I_n(R) and I_n(R) − I_n(Θ).
5 Simulation Evidence
We next study the finite sample performance of our inference procedure by revisiting
the simulation design in Chetverikov and Wilhelm (2017).
5.1 Identified Model
We first consider a nonparametric instrumental variable model in which, for some un-
known function θ0, the distribution of (Y,W,Z) ∈ R3 satisfies the restriction
Y = θ_0(W) + ε,    E[ε|Z] = 0; (41)
see Appendix S.6 for a formal study of this model. Following Chetverikov and Wilhelm (2017), we set θ_0(x) ≡ 0.2x + x², and for (ε̃, ζ, ν) independent standard normal random variables we let Z = Φ(ζ), W = Φ(0.3ζ + √(1 − 0.3²) ε̃), and ε = (0.3ε̃ + √(1 − 0.3²) ν)/2, for Φ the cumulative distribution function of a standard normal. All reported results are based on five thousand replications employing five hundred bootstrap draws each.
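The design can be sketched as follows; we write eps_t for the latent standard normal that enters both W and the regression error ε:

```python
import numpy as np
from scipy.stats import norm

def simulate_cw(n, rng):
    """Draw (Y, W, Z) from the Chetverikov-Wilhelm style design:
    theta_0(x) = 0.2 x + x^2 with endogeneity through a shared normal."""
    eps_t, zeta, nu = rng.normal(size=(3, n))   # latent standard normals
    Z = norm.cdf(zeta)
    W = norm.cdf(0.3 * zeta + np.sqrt(1 - 0.3**2) * eps_t)
    eps = (0.3 * eps_t + np.sqrt(1 - 0.3**2) * nu) / 2
    Y = 0.2 * W + W**2 + eps
    return Y, W, Z

rng = np.random.default_rng(0)
Y, W, Z = simulate_cw(5000, rng)
# W and Z lie in [0, 1]; the error is correlated with W (endogeneity)
# but mean independent of Z, so Z is a valid instrument.
```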
Our results enable us to conduct inference on functionals of θ0 while imposing shape
restrictions. In what follows we set ΥF to denote either the level or the derivative of θ0 at
the point w0 = 0.5 and employ ΥG to impose that θ be either monotonically increasing
or monotonically increasing and convex. We employ the test statistic In(R) − In(Θ)
with r = 2 and Σn an estimate of the optimal weighting matrix based on a first stage
unconstrained estimator. The implementation of the test is similar to that of the linear
model of Section 2.1, with the difference that we must select the sieve Θ_n = {p^{j_n′}_n β : β ∈ R^{j_n}} and q^{k_n}_n. We follow Chetverikov and Wilhelm (2017) in employing continuously differentiable piecewise quadratic splines with equally spaced knots for both p^{j_n}_n and q^{k_n}_n.
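A continuously differentiable piecewise quadratic spline basis of this kind can be built from truncated power functions; a minimal sketch (the knot locations below are hypothetical):

```python
import numpy as np

def quad_spline_basis(x, knots):
    """Truncated power basis for C^1 piecewise quadratic splines:
    1, x, x^2, and (x - t)_+^2 for each interior knot t."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2]
    cols += [np.clip(x - t, 0.0, None) ** 2 for t in knots]
    return np.column_stack(cols)

# Equally spaced interior knots on [0, 1].
knots = [0.25, 0.5, 0.75]
x = np.linspace(0.0, 1.0, 101)
P = quad_spline_basis(x, knots)   # j_n = 3 + #knots basis functions
print(P.shape)  # (101, 6)

# Each truncated term (x - t)_+^2 has value 0 and first derivative 0
# at its knot, so every basis function is continuously differentiable.
```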
In computing critical values we set `n = `un = +∞ since the model is linear and
τn = 0 since the model is identified – note these choices are also valid under partial
identification. We select r_n by proceeding as in Section 2.1. For p^{j_n′}_n β^u_n the minimizer of I_n(Θ) and p^{j_n′}_n β^{u⋆}_n its score bootstrap analogue (Kline and Santos, 2012), we set

P( max_{1≤j≤J} C′_j(β^{u⋆}_n − β^u_n) ≤ r_n | {V_i}_{i=1}^n ) = 1 − γ_n, (42)

where γ_n ∈ (0, 1) and the vectors C_j ∈ R^{j_n} depend on the shape restriction being
imposed. We emphasize that the sequence γn must tend to zero in order for rn to satisfy
our assumptions. Finally, we employ the minimizer of In(R) in obtaining bootstrap
draws for both Un(R|+∞) and Un(Θ|+∞); see discussion following Corollary 3.3.
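Schematically, r_n in (42) is a conditional quantile of a maximum over constraint directions, taken across bootstrap draws. The sketch below uses simulated normal draws as a stand-in for the score bootstrap, and the difference matrix C is a hypothetical choice rather than the paper's:

```python
import numpy as np

def select_rn(beta_hat, beta_boot, C, gamma_n):
    """r_n = (1 - gamma_n) quantile, across bootstrap draws, of
    max_j C_j'(beta* - beta_hat), as in (42)."""
    # beta_boot: (n_boot, j_n) array of bootstrap coefficient draws
    diffs = (beta_boot - beta_hat) @ C.T      # (n_boot, J)
    return np.quantile(diffs.max(axis=1), 1.0 - gamma_n)

rng = np.random.default_rng(0)
j_n, n_boot = 4, 500
beta_hat = np.zeros(j_n)
# Placeholder draws; in practice these come from the score bootstrap.
beta_boot = rng.normal(scale=0.1, size=(n_boot, j_n))
# Hypothetical constraint directions: consecutive coefficient differences.
C = np.array([[-1.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 0.0],
              [0.0, 0.0, -1.0, 1.0]])
r_n = select_rn(beta_hat, beta_boot, C, gamma_n=0.05)
```

Smaller γ_n (a higher quantile) yields a larger, more conservative r_n, consistent with the requirement that γ_n tend to zero.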
Table 1 reports empirical rejection probabilities under the null hypothesis for 5%-
level tests on the derivative and level of θ0 at w0 = 0.5 under different shape restrictions.
Simulations for additional values of (jn, kn) are included in the supplemental appendix.
With regards to rn, we examine the extreme possible values (0 and ∞) and choices
corresponding to (42) for different γn. Across all simulations, including those in the
supplemental appendix, we find that the rejection probability is below the nominal level
provided 1−γn ≥ 0.5 in (42). Overall, we find the general lack of sensitivity to different
choices of bandwidths to be reassuring for empirical practice.
In Figure 3 we report power curves for different 5%-level tests concerning the value
of θ0 and its derivative at w0 = 0.5. For conciseness, we focus on the sample sizes
n ∈ {1000, 5000} and r_n chosen as in (42) with 1 − γ_n = 0.95.

Table 1: Empirical rejection probabilities for 5%-level tests based on I_n(R) − I_n(Θ). Value of r_n set to a percentile corresponds to the choice of 1 − γ_n in (42).

The curves labeled “Mon” and “Mon+Conv” correspond to tests based on I_n(R) − I_n(Θ) with R imposing monotonicity, and monotonicity and convexity, respectively, while changing the conjectured value
of θ0 and its derivative at w0 = 0.5. The curve labeled “Unres.” corresponds to a
Wald test based on the unrestricted estimator. For all designs we find that imposing
shape restrictions can yield important power gains. The benefits of imposing shape
restrictions, however, depend on both the sampling uncertainty and how “close” the
shape restrictions are to binding (Chetverikov et al., 2018). Since our design is fixed with
n and θ0 is strictly increasing and convex, in our simulations we see the advantages of
imposing shape restrictions decrease with n as sample uncertainty decreases. Similarly,
since estimating the derivative is harder than estimating the level, we observe larger
power gains when imposing shape restrictions in the former problem.
5.2 Partially Identified Model
We next examine the performance of our test in a partially identified setting by dis-
cretizing the simulation design in Chetverikov and Wilhelm (2017). Concretely, we
generate (W,Z, ε) ∈ [0, 1]2 × R as in Section 5.1, divide [0, 1] into Sw and Sz equally
spaced segments, and generate dummy variables D_w and D_z for the segment to which W and Z belong – e.g. if (S_w, S_z) = (3, 2), then D_w(W) ≡ (1{W ∈ [0, 1/3]}, 1{W ∈ (1/3, 2/3]}, 1{W ∈ (2/3, 1]})′ and D_z(Z) ≡ (1{Z ∈ [0, 1/2]}, 1{Z ∈ (1/2, 1]})′. The outcome Y is generated according to (41) but employing D_w in place of W.
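The segment dummies can be generated by binning; a minimal sketch using right-closed segments matching the example above:

```python
import numpy as np

def segment_dummies(x, n_seg):
    """One-hot indicators for which of n_seg equal segments of [0, 1]
    contains each observation, with bins [0, 1/S], (1/S, 2/S], ..., 1]."""
    idx = np.ceil(np.asarray(x) * n_seg).astype(int) - 1
    idx = np.clip(idx, 0, n_seg - 1)        # x = 0 falls in the first bin
    return np.eye(n_seg)[idx]

W = np.array([0.1, 0.4, 0.9])
Dw = segment_dummies(W, 3)
print(Dw)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```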
The discretized design is characterized by S_z linear unconditional moment restrictions in S_w unknowns.

Figure 3: Rejection probabilities for 5%-level tests on the conjectured value of θ_0(0.5) (true value 0.35) and θ′_0(0.5) (true value 1.2). Panels: power curve for the level with (j_n, k_n) = (4, 6) and n = 5000, and power curves for the derivative with (j_n, k_n) ∈ {(4, 4), (4, 6)} and n ∈ {1000, 5000}; curves labeled “Mon”, “Mon+Conv”, and “Unres.”. Tests implemented with 1 − γ_n = 0.95 in (42).

For conciseness, we focus on imposing that θ_0 be monotonically increasing and convex while conducting inference on the value of θ_0 at the point
d_0 ≡ D_w(0.5) – e.g., if S_w = 3, then d_0 = (0, 1, 0)′. We note that θ_0(d_0) is generically not identified whenever S_w > S_z, but imposing a shape restriction on θ_0 partially identifies θ_0(d_0) in this design. Table 2 reports the relevant identified sets.
Table 3: Empirical rejection probabilities for 5%-level tests based on I_n(R) for different points in the null hypothesis. Lower and upper endpoints correspond to Table 2.
We test whether a value c belongs to the identified set for θ0(d0) by setting ΥF (θ) =
θ(d0)− c and employing ΥG to impose that θ be monotonically increasing and convex.
We base inference on In(R) with r = 2, Σn the sample analogue to E[DzD′z], all moment
restrictions (kn = Sz), and a saturated model for θ0 (jn = Sw). To compute critical
values we set `n = +∞ and τn = 0 – though note Θrn need not be a singleton when τn = 0
because j_n > k_n. We select r_n by modifying the approach employed in Section 5.1: for θ^L_n and θ^U_n the minimizer and maximizer of θ(d_0) over the set of θ that are monotonically increasing, convex, and minimize ‖Σ_{i=1}^n (Y_i − θ(D_{w,i})) D_{z,i}/n‖_∞, we set

P( max_{1≤j≤J} max{Υ_{G,j}(θ^{L⋆}_n − θ^L_n), Υ_{G,j}(θ^{U⋆}_n − θ^U_n)} ≤ r_n | {V_i}_{i=1}^n ) = 1 − γ_n, (43)
where ΥG,j(θ) denotes the jth coordinate of the vector ΥG(θ) and θL?n and θU?n are again
computed employing the score bootstrap. As in our previous analysis, γn must tend to
zero with n in order for rn to satisfy our assumptions.
Table 3 reports empirical rejection rates for testing whether c belongs to the identified
set, with the lower and upper endpoint columns corresponding to setting c to equal
the lower and upper endpoints in Table 2. All tests are conducted at a 5% nominal
level. Across designs, we find that setting rn = +∞ always delivers tests with rejection
probabilities below their nominal level. Setting rn according to (43) with 1− γn = 0.95
also delivers adequate size control, with the exception of n = 5000 and (Sw, Sz) = (4, 2)
where we see a modest over-rejection at the lower endpoint of the identified set. Overall,
the degree of sensitivity to the choice of rn is similar to that found in Section 5.1.
6 Returns to Education
As a final illustration of the empirical content of our results, we conclude by examining
the role that shape restrictions can play in the study of Angrist and Krueger (1991) on
returns to education. Specifically, for Y log-earnings and D years of education we posit
Y = g(D) +W ′γ + ε, (44)
where W is a vector of observable covariates consisting of a constant, age, age squared,
and dummy variables for year of birth, race, marital status, residence in a Standard
Metropolitan Statistical Area (SMSA), and census region of residence. The function g
is modeled as a piecewise quadratic spline with a knot placed at the median level of
education. Following Angrist and Krueger (1991), we address the possible endogeneity
of D by employing a vector of instruments Z consisting of W and a set of quarter of
birth dummy variables interacted with year of birth dummy variables.
We conduct our analysis employing the decennial census extracts studied by Angrist
and Krueger (1991). All statistical procedures are based on In(R) − In(Θ) and, as in
Angrist and Krueger (1991), applied separately to men born in the 1920-1929, 1930-1939,
and 1940-1949 cohorts. Since the model is linear and identification holds when the rank
condition is satisfied, we set `n = +∞, τn = 0, and implement our tests by following
the discussion in Section 2.1. All bootstrap critical values are based on five thousand
bootstrap draws, with both bootstrap statistics based on the constrained estimator.
Table: p-values of the test of the shape restriction, by imposed shape restriction and r_n, for the 1920-1929, 1930-1939, and 1940-1949 cohorts.
Table 5: 95% confidence regions for the marginal return to education at the median. Columns: (i) imposed shape restriction; (ii) choice of r_n, with r_n = 95% denoting 1 − γ_n = 0.95 in (4); (iii) shape-restricted 2SLS estimate; (iv) confidence region using shape restrictions; (v)-(vi) unrestricted 2SLS estimate and Wald confidence region.
In Table 5 we impose the conjectured shape restrictions while building 95%-level
confidence intervals for the marginal return to education at the median level of education.
By way of comparison, we also report the Wald confidence regions based on the
unrestricted two stage least squares (2SLS) estimator. Overall, we find little sensitivity
of our results to the choice of rn. In unreported results we found the confidence regions
corresponding to rn = +∞ to equal those obtained by setting 1 − γn = 0.95. We
further find that imposing shape restrictions considerably sharpens the confidence regions, with the impact on the analysis of the 1930-1939 cohort being particularly salient. With
the exception of the 1920-1929 cohort, we also find imposing both monotonicity and
concavity is more informative than only imposing monotonicity.
References
Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71 1795–1844.
Ai, C. and Chen, X. (2007). Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables. Journal of Econometrics, 141 5–43.
Ai, C. and Chen, X. (2012). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics, 170 442–457.
Aït-Sahalia, Y. and Duarte, J. (2003). Nonparametric option pricing under shape restrictions. Journal of Econometrics, 116 9–47.
Andrews, D. W. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, 68 399–405.
Andrews, D. W. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica 683–734.
Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities. Econometrica, 81 609–666.
Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica, 78 119–157.
Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106 979–1014.
Armstrong, T. (2015). Adaptive testing on a regression function at a point. The Annals of Statistics, 43 2086–2101.
Athey, S. and Stern, S. (1998). An empirical framework for testing theories about complementarity in organizational design. Tech. rep., National Bureau of Economic Research.
Babii, A. and Kumar, R. (2019). Isotonic regression discontinuity designs. Available at SSRN 3458127.
Beare, B. K. and Schmidt, L. D. (2014). An empirical test of pricing kernel monotonicity. Journal of Applied Econometrics (in press).
Beresteanu, A. and Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, 76 763–814.
Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75 1613–1669.
Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.
Bugni, F. A., Canay, I. A. and Shi, X. (2017). Inference for subvectors and other functions of partially identified parameters in moment inequality models. Quantitative Economics, 8 1–38.
Canay, I. A. and Shaikh, A. M. (2017). Practical and theoretical advances in inference for partially identified models. In Advances in Economics and Econometrics: Eleventh World Congress, vol. 2. Cambridge University Press, Cambridge, 271–306.
Chamberlain, G. (1992). Comment: Sequential moment restrictions in panel data. Journal of Business & Economic Statistics, 10 20–26.
Chatterjee, S., Guntuboyina, A. and Sen, B. (2013). Improved risk bounds in isotonic regression. arXiv preprint arXiv:1311.3765.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.
Chen, X. and Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152 46–60.
Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.
Chen, X. and Pouzo, D. (2015). Sieve Wald and QLR inferences on semi/nonparametric conditional moment models. Econometrica, 83 1013–1079.
Chen, X., Pouzo, D. and Tamer, E. (2011a). QLR inference on partially identified nonparametric conditional moment models. Working paper, Yale University.
Chen, X. and Santos, A. (2018). Overidentification in regular models. Econometrica, 86 1771–1817.
Chen, X., Tamer, E. and Torgovitsky, A. (2011b). Sensitivity analysis in semiparametric likelihood models. Working paper, Yale University.
Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162 47–70.
Chernozhukov, V., Fernandez-Val, I. and Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96 559–575.
Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73 245–261.
Chernozhukov, V., Hong, H. and Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 75 1243–1284.
Chernozhukov, V., Lee, S. and Rosen, A. M. (2013). Intersection bounds: Estimation and inference. Econometrica, 81 667–737.
Chetverikov, D. (2019). Testing regression monotonicity in econometric models. Econometric Theory, 35 729–776.
Chetverikov, D., Santos, A. and Shaikh, A. M. (2018). The econometrics of shape restrictions. Annual Review of Economics, 10 31–63.
Chetverikov, D. and Wilhelm, D. (2017). Nonparametric instrumental variable estimation under monotonicity. Econometrica, 85 1303–1320.
Dette, H., Hoderlein, S. and Neumeyer, N. (2016). Testing multivariate economic restrictions using quantiles: the example of Slutsky negative semidefiniteness. Journal of Econometrics, 191 129–144.
Eichenbaum, M. S., Hansen, L. P. and Singleton, K. J. (1988). A time series analysis of representative agent models of consumption and leisure choice under uncertainty. The Quarterly Journal of Economics, 103 51–78.
Fang, Z. and Seo, J. (2019). A general framework for inference on shape restrictions. arXiv preprint arXiv:1910.07689.
Freyberger, J. and Horowitz, J. L. (2015). Identification and shape restrictions in nonparametric instrumental variables estimation. Journal of Econometrics, 189 41–53.
Freyberger, J. and Reeves, B. (2018). Inference under shape restrictions. Available at SSRN 3011474.
Galichon, A. and Henry, M. (2009). A test of non-identifying restrictions and confidence regions for partially identified parameters. Journal of Econometrics, 152 186–196.
Gentzkow, M. (2007). Valuing new goods in a model with complementarity: Online newspapers. American Economic Review, 97 713–744.
Haag, B. R., Hoderlein, S. and Pendakur, K. (2009). Testing and imposing Slutsky symmetry in nonparametric demand systems. Journal of Econometrics, 153 33–50.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 891–916.
Hausman, J. A. and Newey, W. K. (1995). Nonparametric estimation of exact consumers surplus and deadweight loss. Econometrica 1445–1476.
Hausman, J. A. and Newey, W. K. (2016). Individual heterogeneity and average welfare. Econometrica, 84 1225–1248.
Ho, K. and Rosen, A. M. (2017). Partial identification in applied research: Benefits and challenges. In Advances in Economics and Econometrics: Eleventh World Congress, vol. 2. Cambridge University Press, 307.
Hong, S. (2017). Inference in semiparametric conditional moment models with partial identification. Journal of Econometrics, 196 156–179.
Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201 108–126.
Jackwerth, J. C. (2000). Recovering risk aversion from option prices and realized returns. Review of Financial Studies, 13 433–451.
Kaido, H. and Santos, A. (2014). Asymptotically efficient estimation of models defined by convex moment inequalities. Econometrica, 82 387–413.
Kline, P. and Santos, A. (2012). A score based approach to wild bootstrap inference. Journal of Econometric Methods, 1 23–41.
Koltchinskii, V. I. (1994). Komlos-Major-Tusnady approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.
Kretschmer, T., Miravete, E. J. and Pernías, J. C. (2012). Competitive pressure and the adoption of complementary innovations. The American Economic Review, 102 1540.
Ledoux, M. and Talagrand, M. (1988). Un critère sur les petites boules dans le théorème limite central. Probability Theory and Related Fields, 77 29–47.
Lee, S., Song, K. and Whang, Y.-J. (2018). Testing for a general class of functional inequalities. Econometric Theory, 34 1018–1064.
Lewbel, A. (1995). Consistent nonparametric hypothesis tests with an application to Slutsky symmetry. Journal of Econometrics, 67 379–401.
Linton, O., Song, K. and Whang, Y.-J. (2010). An improved bootstrap test of stochastic dominance. Journal of Econometrics, 154 186–202.
Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer-Verlag, New York.
Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.
Newey, W. K. and Powell, J. (2003). Instrumental variables estimation of nonparametric models. Econometrica, 71 1565–1578.
Reguant, M. (2014). Complementary bidding mechanisms and startup costs in electricity markets. The Review of Economic Studies, 81 1708–1742.
Romano, J. P. and Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling and the bootstrap. The Annals of Statistics, 40 2798–2822.
Santos, A. (2011). Instrumental variables methods for recovering continuous linear functionals. Journal of Econometrics, 161 129–146.
Santos, A. (2012). Inference in nonparametric instrumental variables with partial identification. Econometrica, 80 213–275.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26 393–415.
Tao, J. (2014). Inference for point and partially identified semi-nonparametric conditional moment models. Working paper, University of Washington.
Topkis, D. M. (1998). Supermodularity and Complementarity. Princeton University Press.
Torgovitsky, A. (2019). Nonparametric inference on state dependence in unemployment. Econometrica, 87 1475–1505.
Wolak, F. A. (2007). Quantifying the supply-side benefits from forward contracting in wholesale electricity markets. Journal of Applied Econometrics, 22 1179–1209.
Yurinskii, V. V. (1977). On the error of the Gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.
Zhai, A. (2018). A high-dimensional CLT in W2 distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.
Zhu, Y. (2019). Inference in non-parametric/semi-parametric moment equality models with shape restrictions. Working paper (October 23, 2019).
Table 6: Empirical rejection probabilities for 5%-level tests based on I_n(R) − I_n(Θ). Vectors p_n^{j_n}(x) and q_n^{k_n}(z) constructed from piecewise quadratic continuously differentiable splines. Value of r_n set to a percentile corresponds to the choice of 1 − γ_n in (42).
S.3 Rate of Convergence
Proof of Theorem 3.1: For conciseness, define η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n and let δ_n^{-1} ≡ ν_n(η_n + τ_n). In addition, we define the event A_n ≡ A_{n1} ∩ A_{n2} ∩ A_{n3}, where
Lemma S.3.3. Let {g_j}_{j=1}^J be a finite set of functions satisfying sup_{P∈P} ‖g_j‖_{P,∞} ≤ C for all 1 ≤ j ≤ J and some C < ∞. Defining the class of functions

G_n ≡ {f g_j : f ∈ F_n, 1 ≤ j ≤ J},

it then follows that N_{[]}(ε, G_n, ‖·‖_{P,2}) ≤ J × N_{[]}(ε/C, F_n, ‖·‖_{P,2}) for all P ∈ P.
Proof: First define g_j^+ ≡ g_j ∨ 0 and g_j^- ≡ g_j ∧ 0, where ∨ and ∧ denote the pointwise maximum and minimum. If {[f_{i,l,P}, f_{i,u,P}]}_i is a collection of brackets for F_n satisfying

∫ (f_{i,u,P} − f_{i,l,P})^2 dP ≤ ε^2    (S.28)

for all i, then it follows that the following collection of brackets covers the class G_n:

{[g_j^+ f_{i,l,P} + g_j^- f_{i,u,P}, g_j^- f_{i,l,P} + g_j^+ f_{i,u,P}]}_{i,j}.    (S.29)

Moreover, since |g_j| = g_j^+ − g_j^- by construction, we also obtain from (S.28) that

∫ (g_j^+ f_{i,u,P} + g_j^- f_{i,l,P} − g_j^+ f_{i,l,P} − g_j^- f_{i,u,P})^2 dP = ∫ (f_{i,u,P} − f_{i,l,P})^2 |g_j|^2 dP ≤ ε^2 C^2.    (S.30)

Since there are J × N_{[]}(ε, F_n, ‖·‖_{P,2}) brackets in (S.29), we conclude from (S.30) that

N_{[]}(ε, G_n, ‖·‖_{P,2}) ≤ J × N_{[]}(ε/C, F_n, ‖·‖_{P,2})

for all P ∈ P, which establishes the claim of the lemma.
Lemma S.3.4. If Assumption 3.4 holds, then there exists a constant B < ∞ such that

lim inf_{n→∞} inf_{P∈P} P(Σ_n^{-1} exists and max{‖Σ_n‖_{o,p}, ‖Σ_n^{-1}‖_{o,p}} < B) = 1.

Proof: First note that by Assumption 3.4(iii) there exists a B < ∞ such that

sup_{n≥1} sup_{P∈P} max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} < B/2.    (S.31)
Next, let I_n denote the k_n × k_n identity matrix and for each P ∈ P rewrite Σ_n as

Σ_n = Σ_n(P){I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n)}.    (S.32)

By Theorem 2.9 in Kress (1999), the matrix I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n) is invertible and the operator norm of its inverse is bounded by two whenever ‖Σ_n(P)^{-1}(Σ_n(P) − Σ_n)‖_{o,p} < 1/2. Since by Assumption 3.4(ii) and the equality in (S.32) it follows that Σ_n is invertible if and only if I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n) is invertible, we obtain that

P(Σ_n^{-1} exists and ‖{I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n)}^{-1}‖_{o,p} < 2)
   ≥ P(‖Σ_n(P)^{-1}(Σ_n − Σ_n(P))‖_{o,p} < 1/2) ≥ P(‖Σ_n − Σ_n(P)‖_{o,p} < 1/B),    (S.33)

where we employed ‖Σ_n(P)^{-1}(Σ_n − Σ_n(P))‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ_n − Σ_n(P)‖_{o,p} and (S.31). Hence, since ‖Σ_n(P)‖_{o,p} < B/2 for all P ∈ P and n, (S.32) and (S.33) yield

P(Σ_n^{-1} exists and ‖Σ_n^{-1}‖_{o,p} < B) ≥ P(‖Σ_n − Σ_n(P)‖_{o,p} < 1/B).    (S.34)

Finally, since ‖Σ_n‖_{o,p} ≤ B/2 + ‖Σ_n − Σ_n(P)‖_{o,p} by (S.31), result (S.34) implies that

lim inf_{n→∞} inf_{P∈P} P(Σ_n^{-1} exists and max{‖Σ_n‖_{o,p}, ‖Σ_n^{-1}‖_{o,p}} < B)
   ≥ lim inf_{n→∞} inf_{P∈P} P(‖Σ_n − Σ_n(P)‖_{o,p} < min{B/2, 1/B}) = 1,

where the equality, and hence the lemma, follows from Assumption 3.4(i).
Lemma S.3.5. If a ∈ R^d, then ‖a‖_p ≤ d^{(1/p − 1/p̃)_+} ‖a‖_{p̃} for any p, p̃ ∈ [2,∞].

Proof: The case p̃ ≤ p trivially follows from ‖a‖_p ≤ ‖a‖_{p̃} for all a ∈ R^d. For the case p̃ > p, let a = (a^{(1)}, …, a^{(d)})′ and note that by Hölder's inequality we obtain

‖a‖_p^p = Σ_{i=1}^d |a^{(i)}|^p × 1 ≤ {Σ_{i=1}^d (|a^{(i)}|^p)^{p̃/p}}^{p/p̃} {Σ_{i=1}^d 1^{p̃/(p̃−p)}}^{1−p/p̃} = {Σ_{i=1}^d |a^{(i)}|^{p̃}}^{p/p̃} d^{1−p/p̃}.    (S.35)

Thus, the claim of the lemma for p̃ > p follows from taking the 1/p power in (S.35).
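As a purely numerical sanity check (not part of the formal argument), the norm comparison in Lemma S.3.5 can be exercised at randomly drawn vectors and exponents:

```python
import numpy as np

# Check: for a in R^d and 2 <= p <= ptilde, the lemma's bound reads
# ||a||_p <= d^(1/p - 1/ptilde) * ||a||_ptilde  (exponent (1/p - 1/ptilde)_+ is positive here).
rng = np.random.default_rng(1)
violations = 0
for _ in range(1000):
    d = int(rng.integers(1, 20))
    a = rng.normal(size=d)
    p, ptilde = np.sort(rng.uniform(2.0, 50.0, size=2))  # ensures p <= ptilde
    lhs = np.linalg.norm(a, ord=p)
    rhs = d ** (1.0 / p - 1.0 / ptilde) * np.linalg.norm(a, ord=ptilde)
    violations += int(lhs > rhs * (1.0 + 1e-10))  # small multiplicative tolerance
```

The loop records zero violations, matching the Hölder argument in the proof.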
S.4 Strong Approximation
Proof of Theorem 3.2: Let Pf ≡ E_P[f(V)] and note by Assumption 3.4(iii) there is a C_0 < ∞ such that ‖Σ_n(P)‖_{o,p} ≤ C_0 for all P ∈ P_0. Therefore, Assumption 3.10(ii) and Lemma S.3.5 imply that for all P ∈ P_0, θ_0 ∈ Θ_{0n}^r(P), and h/
Lemma S.4.3. Let Assumptions 3.3(i), 3.3(iii), 3.4, and 3.10(ii)-(iii) hold. For any positive sequence δ_n it then follows that uniformly in P ∈ P_0 we have

inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n ∈ V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·,θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}
   = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n ∈ V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·,θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n,p} + o_P(a_n).
Proof: First note that by Assumption 3.4(iii) there exists a constant C_0 < ∞ such that max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} ≤ C_0 for all P ∈ P. Therefore, since ‖Σ_n a‖_p ≤ ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n}, and ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ_n − Σ_n(P)‖_{o,p} + 1 by the triangle inequality, it follows that
In particular, note Assumption S.5.1 holds with D_n(L,E) = S_n(L,E), D_n(B,E) = S_n(B,E), and θ̄_0 = θ_0. Hence, Assumption 3.16 implies Assumptions S.5.1 and S.5.2. In general, however, D_n(L,E) and D_n(B,E) can be smaller than S_n(L,E) and S_n(B,E), while the introduction of a θ̄_0 ≠ θ_0 eases requirements in partially identified models.
Our next theorem consists of two parts. The first part, which replaces Assumption
3.16 with S.5.1 and S.5.2, can by the preceding discussion be seen as a generalization of
Theorem 3.3. The second part shows that, under additional restrictions, it is possible
to replace the norm ‖·‖_B in the definition of V_n(θ, ℓ) (as in (22)) with the norm ‖·‖_E, an observation that is sometimes helpful in easing rate restrictions.
   ≲ ‖Σ_n(P)‖_{o,p} × K_m D_n(L,E) δ_n ℓ_n √n + o_P(a_n) = o_P(a_n),    (S.68)

where the second inequality follows from ‖h_n/√n‖_B ≤ ℓ_n due to h_n/√n ∈ V_n(θ_n, ℓ_n), Assumption 3.15(i), and result (S.66). In turn, the final result in (S.68) is implied by (S.63) and ‖Σ_n(P)‖_{o,p} being uniformly bounded due to Assumption 3.4(iii). Next, we note that since either Υ_G is affine (implying K_g = 0) or δ_n D_n(B,E) = o(1), and in addition we have δ_n D_n(B,E) = o(r_n) and lim sup_{n→∞} ℓ_n/r_n 1{K_g > 0} < 1/2 by Assumption S.5.2(iii), we can conclude that

r_n ≥ (M_g δ_n D_n(B,E) + K_g δ_n^2 D_n^2(B,E)) ∨ 2(ℓ_n + δ_n D_n(B,E)) 1{K_g > 0}

for n sufficiently large. Hence, Theorem S.7.1(i), Assumption 3.15(ii), and ‖h‖_E ≤ K_b ‖h‖_B for all h ∈ B_n and P ∈ P by Assumption 3.15(i), imply that there is an M < ∞ for which with probability tending to one uniformly in P ∈ P_0

inf_{h_0/√n ∈ V_n(θ_{0n}, 2K_b ℓ_n)} ‖h_n/√n − h_0/√n‖_B ≤ M ℓ_n (ℓ_n + δ_n D_n(B,E)) 1{K_f > 0}.

It follows from Assumption S.5.2(ii) and (S.62) that there is an h_{0n}/√n ∈ V_n(θ_{0n}, 2K_b ℓ_n) such that ‖h_{0n} − h_n‖_B = o_P(a_n) uniformly in P ∈ P_0, and hence Assumption 3.4(iii), Lemma S.5.3, and ‖h‖_E ≤ K_b ‖h‖_B by Assumption 3.15(i) yield
uniformly in P ∈ P_0. Next, note that by Assumption 3.4(iii) there exists a constant C_0 < ∞ such that ‖Σ_n(P)^{-1}‖_{o,p} ≤ C_0 for all n and P ∈ P. Thus, using that ‖Σ_n a‖_p ≤ ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n} and the triangle inequality we obtain
Hence, the corollary follows from V_n^u(θ, ℓ_n^u) ⊆ V_n^u(ℓ_n^u), (S.105), and (S.112).
S.6 Illustrative Examples
We next examine special cases of our general analysis and illustrate both how to implement our procedure and how to verify the assumptions in the main text.
S.6.1 Generalized Method of Moments
Our first example concerns the generalized method of moments (GMM) model of Hansen
(1982). For simplicity we assume the parameter of interest θ0(P ) ∈ Θ ⊆ Rdθ is identified
as the unique solution to the unconditional moment restriction
EP [ρ(X, θ0(P ))] = 0, (S.113)
where X ∈ R^{d_x} is distributed according to P ∈ P and ρ : R^{d_x} × Θ → R^J. This model maps into our general analysis by letting Z_ℓ = 1 for all 1 ≤ ℓ ≤ J. Moreover, since we have assumed θ_0(P) is identified, the hypothesis testing problem simplifies to

H_0 : θ_0(P) ∈ R        H_1 : θ_0(P) ∉ R.
The set R is, as in the main text, defined by equality and inequality restrictions. In
particular, for known functions Υ_F : R^{d_θ} → R^{d_F} and Υ_G : R^{d_θ} → R^{d_G} we set

R ≡ {θ ∈ R^{d_θ} : Υ_F(θ) = 0 and Υ_G(θ) ≤ 0}.    (S.114)
To verify Assumption 3.1, note R^d is a Banach space under any norm ‖·‖_p with 1 ≤ p ≤ ∞, so for concreteness we set B = R^{d_θ}, F = R^{d_F}, and ‖·‖_B = ‖·‖_F = ‖·‖_2. The space R^d is in addition a lattice under the standard pointwise partial order

a ≤ b if and only if a_i ≤ b_i for all 1 ≤ i ≤ d    (S.115)

for any a = (a_1, …, a_d)′ and b = (b_1, …, b_d)′ in R^d. The least upper bound a ∨ b and greatest lower bound a ∧ b are then given by the pointwise maximum and minimum; i.e.

a ∨ b = (max{a_1, b_1}, …, max{a_d, b_d})′        a ∧ b = (min{a_1, b_1}, …, min{a_d, b_d})′.

The vector (1, …, 1)′ is an order unit in R^d under the partial order in (S.115). As discussed in the appendix to the main paper, the order unit induces the norm

inf{λ > 0 : |a| ≤ λ(1, …, 1)′} = max_{1≤i≤d} |a_i|,

which corresponds to the usual ‖·‖_∞ norm on R^d. Hence, by setting G = R^{d_G}, ‖·‖_G = ‖·‖_∞, and 1_G = (1, …, 1)′ we verify the requirements of Assumption 3.1.
Since the parameter space Θ is finite dimensional and all moment restrictions are unconditional, we may set Θ_n = Θ and k_n = J for all n. We base our test statistic on quadratic forms in the moments (p = 2), which implies Q_n(θ) is given by

Q_n(θ) ≡ {(1/n Σ_{i=1}^n ρ(X_i, θ))′ Σ_n (1/n Σ_{i=1}^n ρ(X_i, θ))}^{1/2}.
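As a concrete illustration of this objective, the following sketch evaluates Q_n for a hypothetical linear IV moment ρ(X, θ) = Z(Y − Dθ), with the identity matrix standing in for the weighting matrix Σ_n; all names here are our own:

```python
import numpy as np

def gmm_objective(theta, moments, data, sigma):
    """Q_n(theta) = {rhobar(theta)' Sigma_n rhobar(theta)}^(1/2), where
    rhobar is the sample average of the moment function (the p = 2 case)."""
    rbar = moments(data, theta).mean(axis=0)  # (1/n) sum_i rho(X_i, theta)
    return float(np.sqrt(rbar @ sigma @ rbar))

# Hypothetical linear IV example: rho(X, theta) = Z * (Y - D * theta).
rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=(n, 2))
d = z @ np.array([1.0, 0.5]) + rng.normal(size=n)   # endogenous regressor
y = 1.5 * d + rng.normal(size=n)                    # true theta = 1.5

def iv_moments(data, theta):
    z, d, y = data
    return z * (y - d * theta)[:, None]

grid = np.linspace(0.0, 3.0, 61)
qvals = [gmm_objective(t, iv_moments, (z, d, y), np.eye(2)) for t in grid]
theta_hat = grid[int(np.argmin(qvals))]   # grid minimizer, close to 1.5
```

In practice Σ_n would be an estimate of an efficient weighting matrix rather than the identity; the structure of Q_n is unchanged.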
In what follows, we consider tests based on the un-centered statistic I_n(R) and the re-centered statistic I_n(R) − I_n(Θ). To this end, we impose the following assumptions.
Assumption S.6.1. (i) {X_i}_{i=1}^n is an i.i.d. sequence with X_i ∼ P ∈ P; (ii) For each P ∈ P_0 there exists a unique θ_0(P) ∈ Θ solving (S.113); (iii) Θ is convex and compact.
Assumption S.6.2. (i) ρ(x, ·) : Θ → R^J is twice differentiable for all x ∈ R^{d_x}; (ii) E_P[sup_{θ∈Θ} ‖ρ(X,θ)‖_2^3], E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}^2], and E_P[sup_{θ∈Θ} ‖∇_θ^2 ρ(X,θ)‖_{o,2}^{1+δ}] are finite and bounded uniformly in P ∈ P for some δ > 0.
Assumption S.6.3. (i) inf_{P∈P_0} inf_{θ∈Θ:‖θ−θ_0(P)‖_2≥ε} ‖E_P[ρ(X,θ)]‖_2 > 0 for all ε > 0; (ii) The singular values of E_P[∇_θρ(X, θ_0(P))] are bounded away from zero uniformly in P ∈ P_0.
Assumption S.6.4. (i) ‖Σn − Σ(P )‖o,2 = OP (n−1/2) uniformly in P ∈ P; (ii) Σ(P )
is invertible and ‖Σ(P )‖o,2 and ‖Σ(P )−1‖o,2 are bounded uniformly in P ∈ P.
28
In Assumption S.6.2 we focus on differentiable moments for simplicity. Assumption
S.6.3 essentially imposes strong identification of θ0(P ) and hence guarantees that θ0(P )
can be consistently estimated uniformly in P ∈ P0. Finally, Assumption S.6.4 states
the requirements on the weighting matrix Σn. The consistency rate in Assumption S.6.4
is imposed for simplicity since it is easily satisfied in this finite dimensional model.
In what follows, we set the local parameter spaces V_n(θ, ℓ) and V_n^u(θ, ℓ) to equal

V_n(θ, ℓ) = {h/√n ∈ R^{d_θ} : θ + h/√n ∈ Θ ∩ R and ‖h/√n‖_2 ≤ ℓ}
V_n^u(θ, ℓ) = {h/√n ∈ R^{d_θ} : θ + h/√n ∈ Θ and ‖h/√n‖_2 ≤ ℓ}.
We also denote the random variables to which I_n(R) and I_n(Θ) will be coupled by

U_{n,P}(R|ℓ_n) ≡ inf_{h/√n ∈ V_n(θ_0(P), ℓ_n)} ‖W_{n,P} ρ(·, θ_0(P)) + E_P[∇_θρ(X, θ_0(P))]h‖_{Σ(P),2}    (S.116)
U_{n,P}(Θ|ℓ_n) ≡ inf_{h/√n ∈ V_n^u(θ_0(P), ℓ_n)} ‖W_{n,P} ρ(·, θ_0(P)) + E_P[∇_θρ(X, θ_0(P))]h‖_{Σ(P),2}.    (S.117)

Our distributional approximations then follow immediately from Theorem 3.2(ii).
Theorem S.6.1. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, Υ_F and Υ_G be continuous, and set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}. Then: for any ℓ_n, ℓ_n^u ↓ 0 satisfying (ℓ_n ∨ ℓ_n^u) √(log(1/(ℓ_n ∨ ℓ_n^u))) = o(a_n) and n^{-1/2} = o(ℓ_n ∨ ℓ_n^u) we have uniformly in P ∈ P_0

I_n(R) = U_{n,P}(R|ℓ_n) + o_P(a_n)
I_n(R) − I_n(Θ) = U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ_n^u) + o_P(a_n).
The coupling rate of log^{1/2}(n)/n^{1/(10+5d_θ)} obtained in Theorem S.6.1 suffices for both the empirical process and bootstrap couplings; see Lemmas S.6.3 and S.6.4 below. While the rate is adequate for our purposes, it can potentially be improved, for example, under additional moment restrictions. Here, we rely on Yurinskii (1977) both to illustrate the diversity of coupling arguments that can be employed to verify Assumption 3.7, and to impose only the weak third-moment restriction of Assumption S.6.2(ii).
As in Theorem 3.2, the conclusion in Theorem S.6.1 in fact delivers a family (indexed
by `n) of approximations to the distribution of both In(R) and In(R)−In(Θ). Our next
goal is therefore to obtain bootstrap approximations to the distributional approxima-
tions obtained in Theorem S.6.1. To this end, we write ΥF (θ) = (ΥF,1(θ), . . . ,ΥF,dF (θ))′
and ΥG(θ) = (ΥG,1(θ), . . . ,ΥG,dG(θ))′, and introduce the following assumptions.
Assumption S.6.5. For some ε > 0 and B_ε ≡ ∪_{P∈P_0} {θ : ‖θ − θ_0(P)‖_2 ≤ ε}: (i) B_ε ⊆ Θ; (ii) Υ_F and Υ_G are twice differentiable on B_ε; (iii) ‖∇Υ_F(θ)‖_{o,2} and ‖∇Υ_G(θ)‖_{o,2} are bounded on B_ε; (iv) ‖∇^2Υ_{F,j}(θ)‖_{o,2} is bounded on B_ε for 1 ≤ j ≤ d_F; (v) ‖∇^2Υ_{G,j}(θ)‖_{o,2} is bounded on B_ε for 1 ≤ j ≤ d_G; (vi) ∇Υ_F(θ) has full row rank on B_ε.
Assumption S.6.6. Either (i) Υ_F : R^{d_θ} → R^{d_F} is linear, or (ii) there are an ε > 0 and K_d < ∞ such that the singular values of ∇Υ_F(θ_1)′ are bounded away from zero uniformly in θ_1 ∈ {θ : ‖θ − θ_0(P)‖_2 ≤ ε} and P ∈ P_0, and there is an h(P) ∈ N(∇Υ_F(θ_0(P))) with ‖h(P)‖_2 ≤ K_d satisfying Υ_{G,j}(θ_0(P)) + ∇Υ_{G,j}(θ_0(P))[h(P)] ≤ −ε for all 1 ≤ j ≤ d_G.
In order to describe the structure of the bootstrap procedure in this application, let θ_n ∈ Θ ∩ R and θ_n^u ∈ Θ be any estimators satisfying

Q_n(θ_n) ≤ inf_{θ∈Θ∩R} Q_n(θ) + o_P(n^{-1/2})        Q_n(θ_n^u) ≤ inf_{θ∈Θ} Q_n(θ) + o_P(n^{-1/2})

uniformly in P ∈ P; e.g. θ_n and θ_n^u may be set to equal any minimizers of Q_n over Θ ∩ R and Θ when such minimizers exist. Employing θ_n and θ_n^u we then obtain estimators for the distribution of the random variable W_{n,P} ρ(·, θ_0(P)) and for the derivative E_P[∇_θρ(X, θ_0(P))] by evaluating the following expressions at both θ = θ_n and θ = θ_n^u:

W_n ρ(·, θ) ≡ (1/√n) Σ_{i=1}^n ω_i {ρ(X_i, θ) − (1/n) Σ_{j=1}^n ρ(X_j, θ)}    (S.118)
D_n(θ) ≡ (1/n) Σ_{i=1}^n ∇_θρ(X_i, θ),    (S.119)

where {ω_i}_{i=1}^n is an i.i.d. sample of standard normal random variables independent of the data {X_i}_{i=1}^n. We note that since the moments are assumed differentiable, we employ an analytical derivative in (S.119) for the resulting computational simplicity.
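In code, one draw of the multiplier process in (S.118) and the derivative estimate in (S.119) look as follows (a sketch with hypothetical names: `rho_values` stacks the moment evaluations ρ(X_i, θ) row-wise, and `grad_values` would stack the derivatives ∇_θρ(X_i, θ)):

```python
import numpy as np

def multiplier_draw(rho_values, rng):
    """One realization of (S.118): i.i.d. N(0,1) multipliers applied to the
    demeaned moment evaluations, scaled by n^(-1/2)."""
    n = rho_values.shape[0]
    omega = rng.normal(size=n)                       # bootstrap weights
    centered = rho_values - rho_values.mean(axis=0)  # rho(X_i) minus sample mean
    return (omega[:, None] * centered).sum(axis=0) / np.sqrt(n)

def jacobian_estimate(grad_values):
    """(S.119): the sample average of the analytic derivatives."""
    return grad_values.mean(axis=0)

rng = np.random.default_rng(3)
rho_values = rng.normal(size=(500, 3)) @ np.diag([1.0, 2.0, 0.5])  # stand-in moments
draws = np.stack([multiplier_draw(rho_values, rng) for _ in range(2000)])
# Conditional on the data, the draws are mean zero with covariance equal to the
# sample covariance of rho_values, which is what makes the coupling work.
```

Repeating `multiplier_draw` across bootstrap replications while holding the data fixed delivers the conditional law used to form critical values.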
With regards to the local parameter space, we note that the construction of V_n(θ, ℓ) requires the bound K_g on the second derivative of Υ_G (as specified in Assumption 3.11). In particular, Assumption S.6.5(v) implies Assumption 3.11 is satisfied with

K_g ≡ max_{1≤j≤d_G} sup_{θ∈B_ε} ‖∇_θ^2 Υ_{G,j}(θ)‖_{o,2}

(see Lemma S.6.5). If an a priori bound on the second derivative is not available, then it is also possible to simply substitute K_g with the data-driven choice

K̂_g ≡ max_{1≤j≤d_G} sup_{θ∈Θ:‖θ−θ_n‖_2≤r_n} ‖∇_θ^2 Υ_{G,j}(θ)‖_{o,2}.

Given K_g (or its sample analogue K̂_g), we then note the set G_n(θ) in this application equals

G_n(θ) = {h/√n ∈ R^{d_θ} : Υ_{G,j}(θ + h/√n) ≤ max{Υ_{G,j}(θ) − K_g r_n ‖h/√n‖_2, −r_n} for 1 ≤ j ≤ d_G}.
Furthermore, in this parametric problem we may additionally specify the bandwidth ℓ_n to be infinite. Hence, the sample analogue to the local parameter space is given by

V_n(θ, +∞) = {h/√n ∈ R^{d_θ} : h/√n ∈ G_n(θ) and Υ_F(θ + h/√n) = 0}.
The approximations to the distributions of I_n(R) and I_n(Θ) are then given by the laws of U_n(R|+∞) and U_n(Θ|+∞) conditional on the data, where

U_n(R|+∞) ≡ inf_{h/√n ∈ V_n(θ_n, +∞)} ‖W_n ρ(·, θ_n) + D_n(θ_n)h‖_{Σ_n,2}
U_n(Θ|+∞) ≡ inf_{h ∈ R^{d_θ}} ‖W_n ρ(·, θ_n^u) + D_n(θ_n^u)h‖_{Σ_n,2}.
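In the special case where Υ_F is linear, the inequality constraints are absent, and the identity matrix stands in for Σ_n, both infima reduce to least-squares projections. The sketch below (with hypothetical inputs of our own) illustrates this; the restricted value is never smaller than the unrestricted one since the feasible set shrinks:

```python
import numpy as np

def u_unrestricted(w, d_mat, sigma):
    """U_n(Theta|+inf): inf over all h of the Sigma-weighted 2-norm of
    W + D h, computed as an ordinary least-squares projection."""
    sw, sd = sigma @ w, sigma @ d_mat
    h = np.linalg.lstsq(sd, -sw, rcond=None)[0]
    return float(np.linalg.norm(sw + sd @ h))

def u_linear_equality(w, d_mat, sigma, a_mat):
    """U_n(R|+inf) when R imposes only the linear equality A h = 0:
    parametrize h = N u with N a basis for the null space of A."""
    _, s, vt = np.linalg.svd(a_mat)
    rank = int((s > 1e-10).sum())
    null_basis = vt[rank:].T            # columns span the null space of A
    return u_unrestricted(w, d_mat @ null_basis, sigma)

rng = np.random.default_rng(4)
w = rng.normal(size=5)                  # stand-in for W_n rho(., theta_n)
d_mat = rng.normal(size=(5, 3))         # stand-in for D_n(theta_n)
a_mat = np.array([[1.0, -1.0, 0.0]])    # hypothetical equality restriction h_1 = h_2
u_free = u_unrestricted(w, d_mat, np.eye(5))
u_restr = u_linear_equality(w, d_mat, np.eye(5), a_mat)
```

With nonlinear Υ_F or active inequality constraints in G_n(θ), the inner minimization instead requires a constrained optimizer, but the logic of comparing restricted and unrestricted projections is unchanged.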
The validity of these approximations then follows from Theorem 3.3 and Corollary 3.3.
Theorem S.6.2. Let Assumptions S.6.1, S.6.2, S.6.3, S.6.4, S.6.5, and S.6.6 hold, set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}, and let n^{-1/2} = o(r_n). Then: for any sequences ℓ_n, ℓ_n^u ↓ 0 satisfying (ℓ_n ∨ ℓ_n^u)^2 √(log(1/(ℓ_n ∨ ℓ_n^u))) = o(a_n n^{-1/2}), ℓ_n = o(r_n), and n^{-1/2} = o(ℓ_n ∧ ℓ_n^u), it follows that uniformly in P ∈ P_0 we have

U_n(R|+∞) ≥ U_{n,P}^⋆(R|ℓ_n) + o_P(a_n)
U_n(R|+∞) − U_n(Θ|+∞) ≥ U_{n,P}^⋆(R|ℓ_n) − U_{n,P}^⋆(Θ|ℓ_n^u) + o_P(a_n).
Crucially, note that any sequences ℓ_n, ℓ_n^u satisfying the conditions of Theorem S.6.2 also satisfy the conditions of Theorem S.6.1. Therefore, Theorems S.6.2 and S.6.1 together establish the validity of employing the laws of U_n(R|+∞) and U_n(Θ|+∞) conditional on the data to approximate the laws of I_n(R) and I_n(Θ). In particular, for a level-α test we may compare the test statistic I_n(R) to the critical value
R as in (S.114). We thus avoid repeating the arguments, and verify only Assumptions
3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for R = Θ and R as in (S.114).
Next note that Lemma S.6.5 implies Assumptions 3.11, 3.12, and 3.13 are satisfied, while Lemma S.6.4 verifies Assumption 3.14 with a_n = log^{1/2}(n)/n^{1/(10+5d_θ)} for R = Θ, and hence also for R as in (S.114). Assumption 3.15(i) is immediate since ‖·‖_E = ‖·‖_B = ‖·‖_2 and hence K_b = 1, while Assumption 3.15(ii) is implied by Assumption S.6.5(i) and ‖θ_n − θ_0(P)‖_2 = o_P(1) uniformly in P ∈ P_0. Assumption 3.16(i) is immediate since S_n(B,E) = 1 and the choices of θ_n and θ_n^u correspond to setting τ_n = o(n^{-1/2}). Similarly, it follows from Lemma S.6.2, S_n(L,E) = 1, and n^{-1/2} = o(ℓ_n) that the condition ℓ_n^2 √(log(1/ℓ_n)) = o(a_n n^{-1/2}) verifies Assumption 3.16(ii). Moreover, since ℓ_n = o(r_n) and
By the mean value theorem, Θ being convex by Assumption S.6.1(iii), and ‖θ − Π_nθ‖_2 ≤ δ_n for every θ ∈ Θ due to δ_n-balls around {θ_k}_{k=1}^{N_n} covering Θ, it follows that

sup_{θ∈Θ} |(ρ(x,θ) − ρ(x,Π_nθ)) − E_P[(ρ(X,θ) − ρ(X,Π_nθ))]|
   ≤ {sup_{θ∈Θ} ‖∇_θρ(x,θ)‖_{o,2} + sup_{P∈P} E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}]} × δ_n.

Hence, setting G(x) ≡ 1 ∨ {sup_{θ∈Θ} ‖∇_θρ(x,θ)‖_{o,2} + sup_{P∈P} E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}]}, it follows that G(x)δ_n is an envelope for G_{n,P}, which by Assumption S.6.2(ii) satisfies sup_{P∈P} ‖Gδ_n‖_{P,2} ≲ δ_n. Further note that if [f_l, f_u] is a bracket containing a function f, then [f_l − E_P[f_u(X)], f_u − E_P[f_l(X)]] contains f − E_P[f(X)] and satisfies

‖f_u − f_l − E_P[f_l(X) − f_u(X)]‖_{P,2} ≤ 2‖f_u − f_l‖_{P,2}

by Jensen's inequality and the triangle inequality. Therefore, Lemma S.6.2 implies

sup_{P∈P} N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}) ≲ N_n × (1 ∨ ε^{-d_θ}),

and hence Theorem 2.14.2 in van der Vaart and Wellner (1996) together with G_{n,P} having envelope δ_nG with G ≥ 1 and sup_{P∈P} ‖G‖_{P,2} < ∞ and N_n ≲ δ_n^{-d_θ} yield
Therefore, the definitions of δ_n and ε_n, result (S.133), and Markov's inequality imply

lim sup_{n→∞} sup_{P∈P} P(sup_{g∈G_{n,P}} |G_{n,P} g| > ηε_n) ≲ lim sup_{n→∞} δ_n(1 + log(δ_n^{-d_θ}))^{1/2}/(ηε_n) = 0.    (S.134)
Similarly, since W_{n,P} is Gaussian and 0 ∈ G_{n,P}, Corollary 2.2.8 in van der Vaart and Wellner (1996) and packing numbers being bounded by bracketing numbers imply

sup_{P∈P} E_P[sup_{g∈G_{n,P}} |W_{n,P} g|] ≲ sup_{P∈P} ∫_0^∞ (log N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}))^{1/2} dε
   ≲ sup_{P∈P} ∫_0^{2δ_n‖G‖_{P,2}} (log N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}))^{1/2} dε ≲ δ_n(1 + log(δ_n^{-d_θ}))^{1/2},    (S.135)

where in the second inequality we employed that the bracket [−δ_nG, δ_nG] covers G_{n,P} due to δ_nG being an envelope for G_{n,P}, and the final inequality follows from the change of variables u = ε/(2δ_n‖G‖_{P,2}) and the same manipulations as in (S.133). Hence,
lim sup_{n→∞} sup_{P∈P} P(sup_{g∈G_{n,P}} |W_{n,P} g| > ηε_n) ≲ lim sup_{n→∞} δ_n(1 + log(δ_n^{-d_θ}))^{1/2}/(ηε_n) = 0,    (S.136)

by result (S.135) and Markov's inequality. To conclude, note that the definitions of W_{n,P} in (S.130) and (S.131), and of G_{n,P} in (S.132), and the triangle inequality yield

sup_{θ∈Θ} ‖G_{n,P} ρ(·,θ) − W_{n,P} ρ(·,θ)‖_2 ≤ ‖G_{n,P} r_{n,P} − N_{n,P}‖_2 + sup_{g∈G_{n,P}} √J |G_{n,P} g| + sup_{g∈G_{n,P}} √J |W_{n,P} g|.

Thus the lemma follows from (S.129), (S.134), and (S.136).
Lemma S.6.4. If Assumptions S.6.1(i), S.6.1(iii), and S.6.2 hold, then it follows that Assumption 3.14 is satisfied with R = Θ and a_n = log^{3/4}(n)/n^{1/(12+2d_θ)}.

Proof: We aim to establish the lemma by relying on Theorem S.9.1(i) in Section S.9. To this end set ζ_n = n^{-1/(2(6+d_θ))}, M_n = n^{1/(6+d_θ)}, and N_n ≡ N(ζ_n, Θ, ‖·‖_2). By Assumption S.6.2(ii) the function F(x) ≡ sup_{θ∈Θ} ‖ρ(x,θ)‖_2 is integrable, and for any θ ∈ Θ we let
verifies Assumption 3.11(iii). By identical arguments, but recalling F = R^{d_F} and ‖·‖_F = ‖·‖_2, it follows Assumptions S.6.5(iii)-(iv) imply Assumptions 3.12(i)-(iii) hold with

K_f ≡ √d_F sup_{θ∈B_ε} max_{1≤j≤d_F} ‖∇^2Υ_{F,j}(θ)‖_{o,2}        M_f ≡ sup_{θ∈B_ε} ‖∇Υ_F(θ)‖_{o,2}.    (S.142)

To conclude, note that since Assumption S.6.5(vi) implies the range of ∇Υ_F(θ) equals R^{d_F} for all θ ∈ B_ε, it follows that ∇Υ_F(θ) admits a right inverse. Moreover, if Υ_F is linear, then K_f = 0 and hence Assumption 3.12(iv) holds. On the other hand, if Υ_F is nonlinear, then note ∇Υ_F(θ)^− = ∇Υ_F(θ)′(∇Υ_F(θ)∇Υ_F(θ)′)^{-1} and therefore ‖∇Υ_F(θ)^−‖_{o,2} is bounded for all θ ∈ B_ε due to ‖∇Υ_F(θ)‖_{o,2} being bounded on B_ε by Assumption S.6.5(iii), and the smallest singular value of ∇Υ_F(θ)′ being bounded away from zero on B_ε by Assumption S.6.6(ii). It thus follows Assumption 3.12(iv) holds as well (after possibly increasing M_f in (S.142)). Finally, we note Assumption S.6.6 directly implies Assumption 3.13 and hence the lemma follows.
Lemma S.6.6. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, and set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}. For any ℓ_n with n^{-1/2} = o(ℓ_n), it follows uniformly in P ∈ P_0 that

U_n(R|+∞) = inf_{h/√n ∈ V_n(θ_n, ℓ_n)} ‖W_n ρ(·, θ_n) + D_n(θ_n)h‖_{Σ_n,2} + o_P(a_n)
U_n(Θ|+∞) = inf_{‖h/√n‖_2 ≤ ℓ_n} ‖W_n ρ(·, θ_n^u) + D_n(θ_n^u)h‖_{Σ_n,2} + o_P(a_n).
Proof: We establish the lemma by relying on Lemma 3.1. To this end note that in the proof of Theorem S.6.1, Assumptions 3.2, 3.3(i)(iii), and 3.4 were verified and θ_n and θ_n^u were shown to be consistent for θ_0(P) with ‖·‖_B = ‖·‖_E = ‖·‖_2, R_n = n^{-1/2}, and ν_n ≍ 1 for both R as in (S.114) and for R = Θ. Next, note Lemma S.6.4 verifies Assumption 3.14 holds with a_n = log^{1/2}(n)/n^{1/(10+5d_θ)} for R = Θ and hence also for R as in (S.114). Also observe that the mean value theorem and Θ being convex imply

|∂ρ_ℓ(x, θ_1)/∂θ_k − ∂ρ_ℓ(x, θ_2)/∂θ_k| ≤ max_{1≤ℓ≤J} sup_{θ∈Θ} ‖∇_θ^2 ρ_ℓ(x,θ)‖_{o,2} ‖θ_1 − θ_2‖_2    (S.143)

for any θ_1, θ_2 ∈ Θ, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ d_θ. Thus, Assumption S.6.2(ii) implies there exists a C_0 < ∞ such that for all P ∈ P and θ_1, θ_2 ∈ Θ it follows that

‖E_P[∇_θρ(X, θ_1) − ∇_θρ(X, θ_2)]‖_{o,2} ≤ C_0 ‖θ_1 − θ_2‖_2.

In particular, the function θ ↦ E_P[∇_θρ(X, θ)] is uniformly continuous in θ and P ∈ P, which implies by Assumption S.6.3(ii) that there is an ε_0 > 0 such that the smallest singular value of E_P[∇_θρ(X, θ)] is bounded away from zero on {θ ∈ Θ : ‖θ − θ_0(P)‖_2 ≤ ε_0 for some P ∈ P_0}. Since ν_n ≍ 1, p = 2, and ‖·‖_E = ‖·‖_2, the Lemma 3.1 requirement ‖h‖_E ≤ ν_n ‖D_{n,P}(θ)[h]‖_p for all θ ∈ A_n(P), P ∈ P_0, and h ∈ √n{(B_n ∩ R) − θ} holds with A_n(P) = (Θ_{0n}^r(P))^{ε_0} and R = Θ (and hence also for R as in (S.114)). Moreover, by uniform consistency of θ_n and θ_n^u it follows that θ_n, θ_n^u ∈ A_n(P) with probability tending to one uniformly in P ∈ P_0.
To conclude, define $\mathcal{F} \equiv \{\frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta) : \text{for some } \theta \in \Theta, 1 \le \ell \le \mathcal{J}, 1 \le k \le d_\theta\}$ and let $F(x) \equiv \max_{1\le\ell\le\mathcal{J}} \sup_{\theta\in\Theta} \|\nabla^2_\theta \rho_\ell(x,\theta)\|_{o,2}$. Then note that if $\varepsilon$-balls around $\{\theta_i\}_{i=1}^{N_\varepsilon}$ cover $\Theta$, then result (S.143) implies that the brackets $[\frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta_i) - \varepsilon F, \frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta_i) + \varepsilon F]$ cover $\mathcal{F}$. Writing these brackets as $\{[f_{l,k}, f_{u,k}]\}_{k=1}^{K_\varepsilon}$ for conciseness, further note that $K_\varepsilon = \mathcal{J} d_\theta N_\varepsilon < \infty$ since $N_\varepsilon < \infty$ due to $\Theta$ being compact, and $C_1 \equiv \sup_{P\in\mathbf{P}} \|F\|_{P,1} < \infty$
by Assumption S.6.2(ii). Moreover, by definition of $[f_{l,k}, f_{u,k}]$ it further follows that
$$E_P[f_{u,k}(X) - f_{l,k}(X)] = \|f_{u,k} - f_{l,k}\|_{P,1} \le 2\varepsilon C_1 \tag{S.144}$$
for all $P \in \mathbf{P}$. Hence, employing the bound $f(x) - E_P[f(X)] \le f_{u,k}(x) - E_P[f_{l,k}(X)]$ for $[f_{l,k}, f_{u,k}]$ the bracket containing $f$, we obtain from result (S.144) that
$$\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n \{f(X_i) - E_P[f(X)]\} \le \max_{1\le k\le K_\varepsilon} \Big|\frac{1}{n}\sum_{i=1}^n \{f_{u,k}(X_i) - E_P[f_{u,k}(X)]\}\Big| + 2\varepsilon C_1 = 2\varepsilon C_1 + o_P(1), \tag{S.145}$$
where the equality holds uniformly in P ∈ P by Assumption S.6.2(ii), Kε < ∞, and
Theorem 2.8.1 in van der Vaart and Wellner (1996). By identical arguments, we have
$$\inf_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n \{f(X_i) - E_P[f(X)]\} \ge -\max_{1\le k\le K_\varepsilon} \Big|\frac{1}{n}\sum_{i=1}^n \{f_{l,k}(X_i) - E_P[f_{l,k}(X)]\}\Big| - 2\varepsilon C_1 = -2\varepsilon C_1 + o_P(1), \tag{S.146}$$
uniformly in P ∈ P. We thus conclude from results (S.145) and (S.146) that F is
Glivenko-Cantelli uniformly in $P \in \mathbf{P}$. Since by Assumption S.6.5(i) there exists an $\varepsilon > 0$ such that $\{\theta : \|\theta - \theta_0(P)\|_2 \le \varepsilon\} \subseteq \Theta$ for all $P \in \mathbf{P}_0$, we obtain
$$\sup_{\theta: \|\theta-\theta_0(P)\|_2 \le \varepsilon} \ \sup_{h \in \mathbf{R}^{d_\theta}: \|\frac{h}{\sqrt{n}}\|_2 \ge \ell_n} \frac{\|\hat D_n(\theta)[h] - D_{n,P}(\theta)[h]\|_2}{\|h\|_2} \le \sup_{\theta\in\Theta} \Big\|\frac{1}{n}\sum_{i=1}^n \nabla_\theta \rho(X_i,\theta) - E_P[\nabla_\theta \rho(X,\theta)]\Big\|_{o,2} = o_P(1) \tag{S.147}$$
uniformly in $P \in \mathbf{P}$, and where the equality follows from $\mathcal{F}$ being Glivenko-Cantelli uniformly in $P \in \mathbf{P}$. Since $\nu_n \asymp 1$, result (S.147) verifies condition (24) in Lemma 3.1.
This concludes verifying the requirements of Lemma 3.1, and hence the present Lemma follows for any $\ell_n$ satisfying $S_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$, which in this application is equivalent to $n^{-1/2} = o(\ell_n)$ due to $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}} = \|\cdot\|_2$ and $\mathcal{R}_n \asymp n^{-1/2}$.
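The bracketing argument in (S.144)-(S.146) admits a simple numerical illustration. The sketch below uses a made-up 1-Lipschitz class $f_\theta(x) = |x - \theta|$ on $\Theta = [0,1]$ with constant envelope $F \equiv 1$ (standing in for the Lipschitz envelope of (S.143)); the sample design and class are illustrative assumptions, not objects from the paper.

```python
import random

random.seed(0)

# Hypothetical 1-Lipschitz class f_theta(x) = |x - theta|, theta in [0, 1]; the
# constant envelope F(x) = 1 plays the role of F in (S.143)-(S.144), so C1 = 1.
def f(theta, x):
    return abs(x - theta)

def pop_mean(theta):
    # E[|X - theta|] for X ~ Uniform[0, 1], available in closed form here.
    return (theta ** 2 + (1 - theta) ** 2) / 2

n, eps = 5000, 0.05
xs = [random.random() for _ in range(n)]

# eps-net of centers; the bracket for f_theta is [f_c - eps*F, f_c + eps*F].
centers = [eps * k for k in range(int(1 / eps) + 1)]

# Bound from (S.145): sup_f (P_n - P)f <= max_c {P_n f_c - P f_c} + 2*eps*C1.
bound = max(sum(f(c, x) for x in xs) / n - pop_mean(c) for c in centers) + 2 * eps

# Compare with a fine-grid approximation of the actual supremum.
grid = [j / 500 for j in range(501)]
actual = max(sum(f(t, x) for x in xs) / n - pop_mean(t) for t in grid)
assert actual <= bound
```

The inequality holds by construction: every $\theta$ lies within $\varepsilon$ of some center, so the bracket bound dominates the supremum, exactly as in (S.145).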
S.6.2 Consumer Demand
We base our next example on a long-standing literature aiming to replace paramet-
ric assumptions with shape restrictions implied by economic theory (Matzkin, 1994).
Specifically, suppose that quantity demanded by individual i, denoted Qi, satisfies
Qi = gP (Si, Yi) +W ′iγP + Ui
where Si ∈ R+ denotes price, Yi ∈ R+ denotes income, and Wi ∈ Rdw is a set of
covariates. In addition, we assume there is an instrument Zi yielding the restriction
EP [Q− gP (S, Y )−W ′γP |Z] = 0. (S.148)
For instance, under exogeneity of prices we may let Z = (S, Y,W ′)′ as in Blundell et al.
(2012). Alternatively, if there is a concern that prices are endogenous, then we may set
Z = (I, Y,W ′)′ for I an instrument for S, as in Blundell et al. (2017).
Our goal is to conduct inference on the level of demand at a particular price-income pair $(s_0, y_0)$ while imposing that the function $g_P$ satisfies the Slutsky restriction
$$\frac{\partial}{\partial s} g_P(s,y) + g_P(s,y) \frac{\partial}{\partial y} g_P(s,y) \le 0. \tag{S.149}$$
To map this problem into our framework, we assume that for some set $\Omega$, $(S,Y) \in \Omega \subseteq \mathbf{R}^2_+$ with probability one for all $P \in \mathbf{P}$ and impose that $g_P \in C^1_B(\Omega)$, where
$$C^m_B(\Omega) \equiv \{g: \Omega \to \mathbf{R} \text{ s.t. } \|g\|_{m,\infty} < \infty\} \qquad \|g\|_{m,\infty} \equiv \sup_{0\le|\alpha|\le m} \sup_{(s,y)\in\Omega} |\nabla^\alpha g(s,y)|.$$
Since the parameters in this model are $(g_P, \gamma_P)$ with $\gamma_P \in \mathbf{R}^{d_w}$, we set $\mathbf{B} = C^1_B(\Omega) \times \mathbf{R}^{d_w}$ and equip it with the norm $\|\theta\|_{\mathbf{B}} = \max\{\|g\|_{1,\infty}, \|\gamma\|_2\}$ for any $(g,\gamma) = \theta \in \mathbf{B}$. Note that in this application, $X = (Q, S, Y, W) \in \mathbf{X} \equiv \mathbf{R}_+ \times \Omega \times \mathbf{R}^{d_w}$ and
$$\rho(X,\theta) = Q - g(S,Y) - W'\gamma. \tag{S.150}$$
We further assume that θ0(P ) = (gP , γP ) is identified as the unique solution to (S.148).
In order to impose the Slutsky restriction in (S.149) we let $\mathbf{G} = C^0_B(\Omega)$ and $\|\cdot\|_{\mathbf{G}} = \|\cdot\|_\infty$, where with some abuse of notation we write $\|\cdot\|_\infty$ in place of $\|\cdot\|_{0,\infty}$. The space $C^0_B(\Omega)$ is a Banach lattice under the standard pointwise ordering given by
$$a \le b \text{ if and only if } a(s,y) \le b(s,y) \text{ for all } (s,y) \in \Omega. \tag{S.151}$$
Moreover, the constant function $c \in C^0_B(\Omega)$ satisfying $c(s,y) = 1$ for all $(s,y) \in \Omega$ is an order unit under the partial ordering in (S.151). Its induced norm on $C^0_B(\Omega)$ equals
$$\inf\{\lambda > 0 : |a| \le \lambda c\} = \sup_{(s,y)\in\Omega} |a(s,y)|,$$
which coincides with the norm $\|\cdot\|_\infty$ on $C^0_B(\Omega)$, and we therefore set $1_{\mathbf{G}} = c$. To encode
the Slutsky restriction in (S.149) we then let the map $\Upsilon_G: \mathbf{B} \to \mathbf{G}$ equal
$$\Upsilon_G(\theta)(s,y) = \frac{\partial}{\partial s} g(s,y) + g(s,y) \frac{\partial}{\partial y} g(s,y) \tag{S.152}$$
for any $\theta = (g,\gamma) \in \mathbf{B}$. Finally, to test whether the level of demand at a prescribed price $s_0$ and income $y_0$ equals a hypothesized value $c_0$, we set $\mathbf{F} = \mathbf{R}$, $\|\cdot\|_{\mathbf{F}} = |\cdot|$, and
$$\Upsilon_F(\theta) = g(s_0, y_0) - c_0 \tag{S.153}$$
for any $\theta = (g,\gamma) \in \mathbf{B}$. By setting $R = \{\theta \in \mathbf{B} : \Upsilon_G(\theta) \le 0 \text{ and } \Upsilon_F(\theta) = 0\}$ and conducting test inversion (over different values of $c_0$) of the null hypothesis
$$H_0: \theta_0(P) \in R \qquad H_1: \theta_0(P) \notin R$$
we may then obtain a confidence region for the level of demand at price $s_0$ and income $y_0$ that imposes the Slutsky restriction in (S.149).
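In practice, membership of a candidate $\theta = (g,\gamma)$ in the restricted set $R$ can be checked by evaluating $\Upsilon_G(\theta)$ on a grid over $\Omega$. The sketch below does so for a made-up demand function standing in for a sieve element $p_n^{j_n}(s,y)'\beta$; the functional form, grid, and support are illustrative assumptions only.

```python
# Hypothetical demand function g(s, y) = 5 - 0.8*s + 0.1*y (downward sloping in
# price), used as a stand-in for a sieve element; names are illustrative only.
def g(s, y):
    return 5.0 - 0.8 * s + 0.1 * y

def dg_ds(s, y):
    return -0.8

def dg_dy(s, y):
    return 0.1

def slutsky(s, y):
    # Upsilon_G(theta)(s, y) = dg/ds + g * dg/dy, cf. (S.149) and (S.152).
    return dg_ds(s, y) + g(s, y) * dg_dy(s, y)

# Check the restriction on a grid over a hypothetical support Omega = [1,2] x [10,20].
grid = [(1 + 0.1 * i, 10 + 1.0 * j) for i in range(11) for j in range(11)]
satisfies = all(slutsky(s, y) <= 0 for s, y in grid)
```

Here `satisfies` is `True`, so this candidate would be retained when minimizing the criterion over $\Theta_n \cap R$; a grid check of this kind is only an approximation to the functional constraint $\Upsilon_G(\theta) \le 0$ on all of $\Omega$.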
We set the parameter space to be a ball in $\mathbf{B}$ under $\|\cdot\|_{\mathbf{B}}$ by setting $\Theta$ to be
$$\Theta \equiv \{\theta \in \mathbf{B} : \|\theta\|_{\mathbf{B}} \le C_0\} \tag{S.154}$$
for some constant $C_0 < \infty$.
Similarly, for a sequence $\{q_{k,n}\}_{k=1}^{k_n}$ of transformations of the conditioning variable $Z$, we let $q_n^{k_n}(z) \equiv (q_{1,n}(z), \ldots, q_{k_n,n}(z))'$. We base our test statistic on the quadratic forms
$$Q_n(\theta) \equiv \Big\{\Big(\frac{1}{n}\sum_{i=1}^n (Q_i - g(S_i,Y_i) - W_i'\gamma) q_n^{k_n}(Z_i)\Big)' \hat\Sigma_n \Big(\frac{1}{n}\sum_{i=1}^n (Q_i - g(S_i,Y_i) - W_i'\gamma) q_n^{k_n}(Z_i)\Big)\Big\}^{\frac{1}{2}}$$
for some weighting matrix $\hat\Sigma_n$ and every $(g,\gamma) = \theta \in \Theta$. The constrained (i.e. $I_n(R)$) and unconstrained (i.e. $I_n(\Theta)$) statistics are then simply given by the minimum of $\sqrt{n}Q_n$ over $\theta \in \Theta_n \cap R$ and $\Theta_n$ respectively.
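The quadratic form above can be sketched directly. The snippet below evaluates a toy version of $Q_n(\theta)$ with a two-dimensional instrument vector and identity weighting $\hat\Sigma_n = I$; the linear-in-parameters "sieve", the instruments, and the simulated data are all illustrative assumptions made to keep the sketch short.

```python
import math
import random

random.seed(1)

# Illustrative low-dimensional stand-in for the sieve: g(s, y) = b0 + b1*s + b2*y,
# instruments q_n^{kn}(z) = (1, z)', and identity weighting; all made up for the sketch.
n = 200
data = [(random.gauss(0, 1), random.random(), random.random(),
         random.random(), random.gauss(0, 1)) for _ in range(n)]  # (Q, S, Y, W, Z)

def Qn(theta):
    b0, b1, b2, gamma = theta
    m = [0.0, 0.0]                        # kn = 2 sample moments
    for (q, s, y, w, z) in data:
        resid = q - (b0 + b1 * s + b2 * y) - w * gamma
        m[0] += resid / n                 # instrument q_{1,n}(z) = 1
        m[1] += resid * z / n             # instrument q_{2,n}(z) = z
    return math.sqrt(m[0] ** 2 + m[1] ** 2)   # {m' Sigma m}^{1/2} with Sigma = I

# I_n(Theta) is sqrt(n) times the minimum of Qn over the sieve; here we simply
# evaluate the criterion at one candidate theta rather than minimize.
value = math.sqrt(n) * Qn((0.0, 0.0, 0.0, 0.0))
```

Minimizing this criterion over $\Theta_n \cap R$ versus $\Theta_n$ would then produce the constrained and unconstrained statistics.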
The next assumptions suffice for obtaining a strong approximation. In their statement, the notation $\mathrm{sing}\{A\}$ denotes the smallest singular value of a matrix $A$.
Assumption S.6.7. (i) $\{X_i, Z_i\}_{i=1}^n$ is i.i.d. with $(X,Z)$ distributed according to $P \in \mathbf{P}$; (ii) For $\Theta$ as in (S.154) and each $P \in \mathbf{P}_0$ there exists a unique $\theta_0(P) \in \Theta$ satisfying (S.148); (iii) The support of $(Q,W)$ is bounded uniformly in $P \in \mathbf{P}$.
Assumption S.6.8. (i) $\sup_{(s,y)} \|p_n^{j_n}(s,y)\|_2 \lesssim \sqrt{j_n}$; (ii) $\sup_{(s,y)} \|\partial_a p_n^{j_n}(s,y)\|_2 \lesssim j_n^{3/2}$ for $a \in \{s,y\}$; (iii) The eigenvalues of $E_P[p_n^{j_n}(S,Y) p_n^{j_n}(S,Y)']$ are bounded away from zero and infinity uniformly in $P \in \mathbf{P}$ and $n$; (iv) For each $P \in \mathbf{P}_0$ there is a $\Pi_n\theta_0(P) = (g_n, \gamma) \in \Theta_n \cap R$ with $\sup_{P\in\mathbf{P}_0} \|E_P[(g_P(S,Y) - g_n(S,Y)) q_n^{k_n}(Z)]\|_2 = O((n\log(n))^{-1/2})$.
Assumption S.6.9. (i) $\max_{1\le k\le k_n} \|q_{k,n}\|_\infty \lesssim \sqrt{k_n}$; (ii) The largest eigenvalues of $E_P[q_n^{k_n}(Z) q_n^{k_n}(Z)']$ are bounded uniformly in $P \in \mathbf{P}$ and $n$; (iii) For every $n$, $s_n \equiv \inf_{P\in\mathbf{P}} \mathrm{sing}\{E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)', W')]\}$ is positive and $s_n = O(1)$; (iv) $j_n^3 k_n^3 \log^3(n) = o(n)$ and $k_n^2 j_n \log^{\frac{3}{2}}(1+k_n)/(s_n\sqrt{n})(1 \vee \sqrt{\log(s_n\sqrt{n}/k_n)}) = o((\log(n))^{-1/2})$.
Assumption S.6.10. (i) $\|\hat\Sigma_n - \Sigma_n(P)\|_{o,2} = O_P((k_n\sqrt{j_n}\log(n))^{-1})$ uniformly in $P \in \mathbf{P}$; (ii) $\Sigma_n(P)$ is invertible and $\|\Sigma_n(P)\|_{o,2}$ and $\|\Sigma_n(P)^{-1}\|_{o,2}$ are bounded in $P \in \mathbf{P}$ and $n$.
Assumption S.6.7(iii) requires $(Q,W)$ to be bounded, which enables us to apply the recent coupling results by Zhai (2018). Alternatively, Assumption S.6.7(iii) can be relaxed under appropriate tail conditions; see, e.g., the proof of Lemma S.6.4 for related arguments. Assumptions S.6.8(i)-(iii) are standard requirements on $\Theta_n$ that are satisfied, e.g., by tensor product wavelets or B-splines under appropriate conditions on the distribution of $(S,Y)$ (Newey, 1997; Chen, 2007; Belloni et al., 2015; Chen and Christensen, 2018). Assumption S.6.8(iv) pertains to the approximation requirements on the sieve; see Remarks S.6.1 and S.6.2 below. Assumption S.6.8(iv) can be relaxed for studying the re-centered statistic, but we impose it here for simplicity to analyze both $I_n(R)$ and $I_n(R) - I_n(\Theta)$. In turn, Assumption S.6.9(i)(ii) imposes standard requirements on the transformations $\{q_{k,n}\}_{k=1}^{k_n}$. Assumption S.6.9(iii)(iv) contains the required rate conditions, which are governed by $s_n$, a parameter that is proportional to $\nu_n^{-1}$ (as in Assumption 3.6) and is closely linked to the degree of ill-posedness; see Remark S.6.2 below. Finally, Assumption S.6.10 states the conditions on the weighting matrix $\hat\Sigma_n$.
In this application, we may set $\|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2$ for any $(g,\gamma) \in \Theta$. Since in addition any $\theta = (g,\gamma) \in \Theta_n \cap R$ has the structure $g = p_n^{j_n\prime}\beta$, we have
$$V_n(\theta,\ell) = \Big\{(p_n^{j_n\prime}\beta_h, \gamma_h) : \Big\|g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big\|_{1,\infty} \le C_0 \text{ and } \Big\|\gamma + \frac{\gamma_h}{\sqrt{n}}\Big\|_2 \le C_0 \tag{S.155}$$
$$p_n^{j_n}(s_0,y_0)'\beta_h = 0 \tag{S.156}$$
$$\frac{\partial}{\partial s}\Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big) + \Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big)\frac{\partial}{\partial y}\Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big) \le 0 \tag{S.157}$$
$$\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta_h\|_{P,2} + \|\gamma_h\|_2 \le \ell\sqrt{n}\Big\}, \tag{S.158}$$
where constraint (S.155) corresponds to (θ + h/√n) ∈ Θn, constraints (S.156) and
Assumptions S.6.11(i)(ii) suffice for verifying that Assumption 3.15(ii) is satisfied. These requirements may be dropped at the expense of modifying the estimators of the local parameter space to reflect the possible impact of $\Pi_n\theta_0(P)$ being "near" the boundary of $\Theta_n$. In turn, Assumption S.6.11(iii) imposes the rate conditions on $r_n$. Finally, Assumption S.6.11(iv) controls the "size" of the set of coefficients $\beta$ corresponding to elements $p_n^{j_n\prime}\beta \in \Theta_n$ and suffices for verifying the bootstrap coupling requirement of Assumption 3.14. For instance, for tensor product B-splines, $E_n \lesssim j_n^{1/4}\log(1+j_n)$ (see Lemma S.6.14), which implies a sufficient condition for Assumption S.6.11(iv) is that $k_n^4 j_n^4 \log^4(k_n) = o(n/\log^2(n))$. The rate requirements for a bootstrap coupling can be weakened, for instance, if the test statistic is based on the $\|\cdot\|_\infty$ norm of the moments (see Lemma S.6.10) or under additional smoothness assumptions (see Theorem S.9.1(ii)).
Our next result characterizes the properties of the proposed bootstrap statistics.
Theorem S.6.4. Let Assumptions S.6.7, S.6.8, S.6.9, S.6.10, S.6.11 hold, and $a_n = (\log(n))^{-1/2}$. Then: for any $\ell_n, \ell_n^u \downarrow 0$ satisfying $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n \wedge \ell_n^u)$, $\ell_n = o(r_n)$, and $k_n\sqrt{j_n}\log(1+k_n)(\ell_n \vee \ell_n^u)\sqrt{\log(\sqrt{j_n}/(\ell_n \vee \ell_n^u))} = o(a_n)$ it follows that uniformly in $P \in \mathbf{P}_0$ we have
$$U_n(R|{+\infty}) \ge U^\star_{n,P}(R|\ell_n) + o_P(a_n)$$
$$U_n(R|{+\infty}) - U_n(\Theta|{+\infty}) \ge U^\star_{n,P}(R|\ell_n) - U^\star_{n,P}(\Theta|\ell_n^u) + o_P(a_n).$$
We note that the existence of sequences $\ell_n, \ell_n^u$ satisfying the requirements of the theorem is implied by Assumptions S.6.9(iv) and S.6.11(iii). Importantly, any sequences $\ell_n, \ell_n^u$ satisfying the requirements of Theorem S.6.4 also satisfy the requirements of Theorem S.6.3. As a result, together Theorems S.6.3 and S.6.4 justify employing
$\log(n)$ for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153). We thus only verify Assumptions 3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153).
Next note that Lemma S.6.11 implies Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$ and $K_f = 0$, while Lemma S.6.10 and Assumption S.6.11(iv) imply Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-\frac{1}{2}}$. Further note that since $\sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2 \le 2(\|g\|_{1,\infty} \vee \|\gamma\|_2)$ for any $(g,\gamma) \in C^1_B(\Omega) \times \mathbf{R}^{d_w}$, it follows that Assumption 3.15(i) holds with $K_b = 2$. To verify Assumptions 3.15(ii) and 3.16, note that by the definitions of $\|\cdot\|_{\mathbf{B}}$ and $\|\cdot\|_{\mathbf{E}}$ in this application and the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ being bounded away from zero uniformly in $P \in \mathbf{P}$ by Assumption S.6.8(iii) we obtain
$$S_n(\mathbf{B},\mathbf{E}) = \sup_{(\beta,\gamma)} \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty} \vee \|\gamma\|_2}{\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta\|_{P,2} + \|\gamma\|_2} \le 1 \vee \sup_\beta \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta\|_{P,2}} \lesssim 1 \vee \sup_\beta \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\|\beta\|_2} \lesssim j_n^{3/2} \tag{S.164}$$
where the final inequality follows from Assumptions S.6.8(i)(ii). Thus, since setting $\hat\theta_n$ and $\hat\theta_n^u$ to be the minimizers of $Q_n$ corresponds to setting $\tau_n = 0$, Assumption 3.16(i) holds provided $\mathcal{R}_n S_n(\mathbf{B},\mathbf{E}) = o(1)$, which is satisfied by Assumption S.6.9(iv) and $j_n \le k_n$ due to Assumption S.6.9(iii). Furthermore, also note that $\mathcal{R}_n S_n(\mathbf{B},\mathbf{E}) = o(1)$ together with Assumption S.6.11(i)(ii) and setting $\Pi_n^u\theta_0(P) = \Pi_n^r\theta_0(P) = \Pi_n\theta_0(P)$ for all $P \in \mathbf{P}_0$ imply Assumption 3.15(ii) is satisfied as well. Moreover, since $B_n \asymp \sqrt{k_n}$ and $K_f = K_m = 0$, Lemma S.6.8 implies Assumption 3.16(ii) holds for any $\ell_n$ satisfying $k_n\sqrt{j_n}\log(1+k_n)\ell_n(1 + \sqrt{\log(\sqrt{j_n}/\ell_n)}) = o(a_n)$. Similarly, we obtain that Assumption 3.16(iii) is satisfied provided $\ell_n = o(r_n)$ (imposed in the Theorem) and $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(r_n)$ (implied by Assumption S.6.11(iii)). We have thus verified the conditions of Theorem 3.3 and Corollary 3.3, which together establish the claim of the Theorem.
Lemma S.6.7. If Assumptions S.6.7(ii)(iii), S.6.8, S.6.9(i)(iii)(iv), and S.6.10 hold, then Assumptions 3.5 and 3.6 are satisfied with both $R = \Theta$ and $R$ corresponding to
where the second inequality holds due to $\|\Sigma_n(P)^{-1}\|_{o,2}$ being bounded uniformly in $n$ and $P \in \mathbf{P}$ by Assumption S.6.10(ii), while the final inequality follows from (S.165) and the triangle inequality. Thus, we conclude from (S.166) that Assumption 3.6(i) holds with $\nu_n^{-1} \asymp s_n$ and $V_n(P) = \Theta_n \cap R$. Finally, note Assumption 3.6(ii) holds by result (S.165) and Assumption S.6.10(i), and hence the lemma follows.
Lemma S.6.8. Define the class $\mathcal{F}_n \equiv \{f : f(v) = (q - g(s,y) - w'\gamma) \text{ for some } (g,\gamma) \in \Theta_n\}$ and suppose that Assumptions S.6.7(iii) and S.6.8(i)(iii) hold. Then, it follows that $\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim (1 \vee (\sqrt{j_n}K/\varepsilon))^{j_n + d_w}$ for some $K < \infty$, and in addition $\sup_{P\in\mathbf{P}} J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \varepsilon\sqrt{j_n}(1 + \sqrt{\log(1 \vee (\sqrt{j_n}/\varepsilon))})$.
Proof: Define the classes $\mathcal{F}_{1n} \equiv \{f : f(v) = q - w'\gamma \text{ with } \|\gamma\|_2 \le C_0\}$ and $\mathcal{F}_{2n} \equiv \{p_n^{j_n\prime}\beta : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, and then note that by definition of $\mathcal{F}_n$ we have
$$\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \le \sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{1n}, \|\cdot\|_{P,2}\Big) \times \sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{2n}, \|\cdot\|_{P,2}\Big). \tag{S.167}$$
Next observe that since the support of $W$ is bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.7(iii), the Cauchy-Schwarz inequality, the covering numbers of $\{\gamma \in \mathbf{R}^{d_w} : \|\gamma\|_2 \le C_0\}$ being bounded (up to a multiplicative constant) by $1 \vee \varepsilon^{-d_w}$, and Theorem 2.7.11 in van der Vaart and Wellner (1996) imply that
$$\sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{1n}, \|\cdot\|_{P,2}\Big) \lesssim 1 \vee \varepsilon^{-d_w}. \tag{S.168}$$
Similarly, for any elements $p_n^{j_n\prime}\beta_1, p_n^{j_n\prime}\beta_2 \in \mathcal{F}_{2n}$, it follows from the Cauchy-Schwarz inequality that
hold, then Assumption 3.7 holds with $R = \Theta$ for any $a_n$ with $k_n^3 j_n^3 \log^2(n)/n = o(a_n)$.
Proof: We establish the claim of the lemma by relying on Lemma S.6.28. To this end, let $\bar j_n = (1 + d_w) + j_n$, set $r_n^{\bar j_n}(x) \equiv (q, w', p_n^{j_n}(x)')'$, and observe any $f \in \mathcal{F}_n$ can be written as $f = r_n^{\bar j_n\prime}\delta$ for some $\delta \in \mathbf{R}^{\bar j_n}$. Moreover, by Assumption S.6.8(iii) and the definition of $\Theta_n$, it follows that there exists an $M < \infty$ such that $\mathcal{F}_n \subseteq \{r_n^{\bar j_n\prime}\delta : \|\delta\|_2 \le M\}$, while Assumptions S.6.7(iii), S.6.8(i), and S.6.9(i) imply $\max_{1\le j\le \bar j_n} \|r_j\|_\infty \lesssim \sqrt{j_n}$ and $\max_{1\le k\le k_n} \|q_k\|_\infty \lesssim \sqrt{k_n}$. The claim of the lemma therefore follows from applying Lemma S.6.28 with $b_{1n} \asymp \sqrt{k_n}$, $b_{2n} \asymp \sqrt{j_n}$, and $C_n = M$.
Lemma S.6.10. Suppose Assumptions S.6.7(i)(iii), S.6.9(i)(ii), and S.6.8(i)(iii) hold and let $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$ and $E_n \equiv \int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon$. If $j_n^2 k_n^2 \log(1 + k_n j_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(\sqrt{\log(k_n)} + E_n) j_n^{3/4} k_n^{1/2} \log^{1/4}(1 + j_n k_n)/n^{1/4} = o(a_n)$.
Proof: Recall that in this application $X \equiv (Q, S, Y, W')'$ and, when $R = \Theta$, we have $\mathcal{F}_n \equiv \{f : f(x) = (q - g(s,y) - w'\gamma) \text{ for some } (g,\gamma) \in \Theta_n\}$. Therefore, letting $\bar{\mathcal{F}}_n \equiv \{f q_{k,n} : f \in \mathcal{F}_n \text{ and } 1 \le k \le k_n\}$ we obtain by direct calculation that
$$\sup_{f\in\mathcal{F}_n} \|\mathbb{W}_n f q_n^{k_n} - \mathbb{W}^\star_{n,P} f q_n^{k_n}\|_p \le k_n^{1/p} \sup_{f\in\bar{\mathcal{F}}_n} |\mathbb{W}_n f - \mathbb{W}^\star_{n,P} f|. \tag{S.170}$$
We establish the claim of the lemma by relying on (S.170) and applying Theorem S.9.1 to the class $\bar{\mathcal{F}}_n$. To this end, define $d_n = k_n(j_n + d_w + 1)$ and let $f_n^{d_n}(v) \equiv q_n^{k_n}(z) \otimes (p_n^{j_n}(s,y)', q, w')'$. Set $D_1 \equiv (Q, W', p_n^{j_n}(S,Y)')'$ and $D_2 = q_n^{k_n}(Z)$, and for $\mathrm{eig}\{D_1 D_1'\}$ the largest eigenvalue of $D_1 D_1'$, then note that we have
$$\sup_{P\in\mathbf{P}} \|\mathrm{eig}\{D_1 D_1'\}\|_{P,\infty} \le \sup_{P\in\mathbf{P}} \|\mathrm{trace}\{D_1 D_1'\}\|_{P,\infty} \lesssim j_n, \tag{S.171}$$
where the final inequality follows from Assumptions S.6.7(iii) and S.6.8(i). Hence, since $\mathrm{eig}\{E_P[q_n^{k_n}(Z) q_n^{k_n}(Z)']\}$ is bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.9(ii), result (S.171) and Lemma S.6.13 imply Assumption S.9.1(i) is satisfied with $C_n \asymp j_n$. Similarly, note that Assumptions S.6.7(iii), S.6.9(i), and S.6.8(i) imply Assumption S.9.1(ii) holds with $K_n = \sqrt{j_n k_n}$. Moreover, Assumption S.9.2(i) is immediate with $\mathbb{G}_{n,P}$ equal to the zero function and $J_{1n} = 0$. Finally, note that any $f \in \bar{\mathcal{F}}_n$ has the structure
$$f(v) = q_{k,n}(z)(q - p_n^{j_n}(s,y)'\beta - w'\gamma) \text{ for some } (p_n^{j_n\prime}\beta, \gamma) \in \Theta_n. \tag{S.172}$$
Therefore, for $\mathcal{B}_n$ as defined in Assumption S.9.2(ii), $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, and $\mathcal{G}_n \equiv \{\gamma \in \mathbf{R}^{d_w} : \|\gamma\|_2 \le C_0\}$, we can conclude that
$$N(\varepsilon, \mathcal{B}_n, \|\cdot\|_2) \le k_n \times N\Big(\frac{\varepsilon}{2}, \mathcal{G}_n, \|\cdot\|_2\Big) \times N\Big(\frac{\varepsilon}{2}, \mathcal{C}_n, \|\cdot\|_2\Big) \lesssim k_n \times \Big(\Big(\frac{C_0}{\varepsilon}\Big)^{d_w} \vee 1\Big) \times N\Big(\frac{\varepsilon}{2}, \mathcal{C}_n, \|\cdot\|_2\Big), \tag{S.173}$$
where in the second inequality we employed that $N(\varepsilon, \mathcal{G}_n, \|\cdot\|_2) \lesssim (C_0/\varepsilon)^{d_w} \vee 1$. Furthermore, note that Assumption S.6.8(iii) implies that $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{P,2}$ uniformly in $n$ and $P \in \mathbf{P}$, and hence since $\|p_n^{j_n\prime}\beta\|_{P,2} \le \|p_n^{j_n\prime}\beta\|_{1,\infty}$, the definition of $\Theta_n$ and (S.172) imply that the radius of $\mathcal{B}_n$ under $\|\cdot\|_2$ is uniformly bounded in $n$. Thus, the bound
in (S.173) yields that for some $M < \infty$ we must have
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{B}_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_0^M \sqrt{\log(k_n)}d\varepsilon + \int_0^{C_0} \sqrt{\log(C_0/\varepsilon)}d\varepsilon + \int_0^M \sqrt{\log(N(\varepsilon/2, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{\log(k_n)} + \int_0^\infty \sqrt{\log(N(u, \mathcal{C}_n, \|\cdot\|_2))}du,$$
where the final inequality follows from $N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2)$ being (weakly) larger than one for all $\varepsilon$ and the change of variables $u = \varepsilon/2$. Hence, Assumption S.9.2(ii) holds with $J_{2n} = \sqrt{\log(k_n)} + E_n$, and as a result Theorem S.9.1 implies uniformly in $P \in \mathbf{P}$ that
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\bar{\mathcal{F}}_n} = O_P\Big((\sqrt{\log(k_n)} + E_n)\Big(\frac{j_n^3 k_n^2 \log(1 + j_n k_n)}{n}\Big)^{1/4}\Big). \tag{S.174}$$
The claim of the lemma therefore follows from (S.170) and (S.174).
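The couplings above relate the empirical process $\mathbb{W}_n$ to a bootstrap analogue $\mathbb{W}^\star_{n,P}$. A common device for constructing such an analogue, shown here purely as an illustrative assumption (the paper's $\mathbb{W}^\star_{n,P}$ may be defined differently), is the Gaussian-multiplier bootstrap:

```python
import math
import random

random.seed(2)

n = 500
x = [random.random() for _ in range(n)]            # illustrative i.i.d. sample

def Wn(f, Ef):
    # Empirical process W_n f = n^{-1/2} sum_i {f(X_i) - E_P[f(X)]}.
    return sum(f(xi) - Ef for xi in x) / math.sqrt(n)

def Wstar(f, omegas):
    # Multiplier-bootstrap analogue: i.i.d. N(0,1) weights, centered at the
    # sample mean so the draw is mean zero conditional on the data.
    mean = sum(f(xi) for xi in x) / n
    return sum(w * (f(xi) - mean) for w, xi in zip(omegas, x)) / math.sqrt(n)

f = lambda t: t * t                                 # E_P[f(X)] = 1/3 for X ~ U[0, 1]
emp = Wn(f, 1.0 / 3.0)
boot = Wstar(f, [random.gauss(0, 1) for _ in range(n)])
```

Conditional on the data, repeated draws of `boot` behave approximately like mean-zero Gaussians whose variance mimics $\mathrm{Var}(f(X))$, which is what makes such constructions usable for simulating critical values.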
Lemma S.6.11. If $\mathbf{B} = C^1_B(\Omega) \times \mathbf{R}^{d_w}$ and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.152), (S.153), and (S.154), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$, $K_f = 0$, and for any $\theta = (g,\gamma)$ and $h = (g_h, \gamma_h)$, $\nabla\Upsilon_G(\theta)[h]$ equals
$$\nabla\Upsilon_G(\theta)[h](s,y) = \frac{\partial}{\partial s}g_h(s,y) + g(s,y)\frac{\partial}{\partial y}g_h(s,y) + g_h(s,y)\frac{\partial}{\partial y}g(s,y).$$
Proof: Recall that in this application $\mathbf{G} = C^0_B(\Omega)$ and $\|\theta\|_{\mathbf{B}} = \max\{\|g\|_{1,\infty}, \|\gamma\|_2\}$. Hence, for any $\theta_1 = (g_1,\gamma_1) \in \mathbf{B}$ and $\theta_2 = (g_2,\gamma_2) \in \mathbf{B}$ we obtain that
$$\|\Upsilon_G(\theta_1) - \Upsilon_G(\theta_2) - \nabla\Upsilon_G(\theta_1)[\theta_1 - \theta_2]\|_{\mathbf{G}} \le \sup_{(s,y)\in\Omega} |g_1(s,y) - g_2(s,y)| \times \sup_{(s,y)\in\Omega} \Big|\frac{\partial}{\partial y}(g_1(s,y) - g_2(s,y))\Big| \le \|g_1 - g_2\|^2_{1,\infty}, \tag{S.175}$$
which verifies Assumption 3.11(i) holds with $K_g = 2$. Similarly, we additionally conclude
$$\|\nabla\Upsilon_G(\theta_1) - \nabla\Upsilon_G(\theta_2)\|_o = \sup_{g_h: \|g_h\|_{1,\infty} \le 1} \sup_{(s,y)\in\Omega} \Big|(g_1(s,y) - g_2(s,y))\frac{\partial}{\partial y}g_h(s,y) + g_h(s,y)\frac{\partial}{\partial y}(g_1(s,y) - g_2(s,y))\Big| \le 2\|g_1 - g_2\|_{1,\infty}, \tag{S.176}$$
which verifies Assumption 3.11(ii) holds with $K_g = 2$ as well. Moreover, note that since any $\theta = (g,\gamma) \in \Theta$ satisfies $\|g\|_{1,\infty} \le C_0$, it follows that $\|g\|_{1,\infty} \le C_0 + \varepsilon$ for any $g \in \Theta^\varepsilon$. Thus, by identical arguments to those in (S.176) we obtain
$$\|\nabla\Upsilon_G(\theta)\|_o \le 2\|g\|_{1,\infty} \le 2(C_0 + \varepsilon), \tag{S.177}$$
which thus verifies Assumption 3.11(iii) holds with $M_g = 2(C_0 + \varepsilon)$.
Next note $\Upsilon_F: \mathbf{B} \to \mathbf{F}$ is affine and continuous, and hence $\nabla\Upsilon_F(\theta)[h] = g_h(s_0, y_0)$ for all $\theta, h \in \mathbf{B}$. Therefore, Assumptions 3.12(i)(ii) hold with $K_f = 0$, while
$$\sup_{g_h: \|g_h\|_{1,\infty} \le 1} |g_h(s_0, y_0)| \le 1 \tag{S.178}$$
implies Assumption 3.12(iii) is satisfied with $M_f = 1$. Since $\Upsilon_F$ being affine and $K_f = 0$ further imply that Assumptions 3.12(iv) and 3.13 hold, the lemma follows.
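The derivative formula in Lemma S.6.11 can be verified numerically by a finite difference of $\Upsilon_G$ along a direction $h$. The functions $g$ and $h$ below are made-up smooth examples chosen so all derivatives are available in closed form; they are illustrative assumptions, not objects from the paper.

```python
# Finite-difference check of the derivative formula in Lemma S.6.11.
def upsilon_g(gf, gf_s, gf_y, s, y):
    # Upsilon_G(theta)(s, y) = dg/ds + g * dg/dy, cf. (S.152).
    return gf_s(s, y) + gf(s, y) * gf_y(s, y)

g   = lambda s, y: s * y          # hypothetical g with known derivatives
g_s = lambda s, y: y
g_y = lambda s, y: s
h   = lambda s, y: s + y * y      # hypothetical direction h
h_s = lambda s, y: 1.0
h_y = lambda s, y: 2.0 * y

def grad_upsilon_g(s, y):
    # Lemma S.6.11: grad Upsilon_G(theta)[h] = dh/ds + g * dh/dy + h * dg/dy.
    return h_s(s, y) + g(s, y) * h_y(s, y) + h(s, y) * g_y(s, y)

s0, y0, t = 0.7, 1.3, 1e-6
gp   = lambda s, y: g(s, y) + t * h(s, y)      # perturbed function g + t*h
gp_s = lambda s, y: g_s(s, y) + t * h_s(s, y)
gp_y = lambda s, y: g_y(s, y) + t * h_y(s, y)

fd = (upsilon_g(gp, gp_s, gp_y, s0, y0) - upsilon_g(g, g_s, g_y, s0, y0)) / t
assert abs(fd - grad_upsilon_g(s0, y0)) < 1e-4
```

The residual is of order $t$ because $\Upsilon_G$ is quadratic in $g$, matching the second-order remainder bound in (S.175).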
Lemma S.6.12. Let Assumptions S.6.7, S.6.8, S.6.9, S.6.10 hold, and $a_n = (\log(n))^{-1/2}$. For any $\ell_n$ with $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n)$, it follows uniformly in $P \in \mathbf{P}_0$ that
$$U_n(R|{+\infty}) = \inf_{\frac{h}{\sqrt{n}} \in V_n(\hat\theta_n, \ell_n)} \|\mathbb{W}_n \rho(\cdot, \hat\theta_n) + \hat D_n[h]\|_{\hat\Sigma_n,2} + o_P(a_n)$$
$$U_n(\Theta|{+\infty}) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{B}} \le \ell_n} \|\mathbb{W}_n \rho(\cdot, \hat\theta_n^u) + \hat D_n[h]\|_{\hat\Sigma_n,2} + o_P(a_n).$$
Proof: We establish the lemma by applying Lemma 3.1. To this end, recall that in the proof of Theorem S.6.3, Assumptions 3.3(i)(iii) and 3.4 were verified to hold with $B_n \asymp \sqrt{k_n}$ and $J_n \asymp \sqrt{j_n \log(1+j_n)}$. Since the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ are bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.8(iii) and $\|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2$ for any $\theta = (g,\gamma)$, it also follows that for any $h = (p_n^{j_n\prime}\beta_h, \gamma_h) \in \mathbf{B}_n$ we have
$$\|h\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta_h\|_{P,2} + \|\gamma_h\|_2 \lesssim \|\beta_h\|_2 + \|\gamma_h\|_2 \lesssim \frac{1}{s_n}\|E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)'\beta_h + W'\gamma_h)]\|_2 = \frac{1}{s_n}\|D_{n,P}[h]\|_2,$$
where the second inequality holds by Assumption S.6.9(iii) and the final equality follows from the definition of $D_{n,P}[h]$. Hence, since $\nu_n \asymp 1/s_n$ by Lemma S.6.7 and $p = 2$, we conclude the Lemma 3.1 requirement that $\|h\|_{\mathbf{E}} \le \nu_n\|D_{n,P}(\theta)[h]\|_p$ for all $\theta \in A_n(P)$ holds with $A_n(P) = \Theta_n \cap R$ (for either $R = \Theta$ or $R$ corresponding to (S.152) and (S.153)). Next, define the $k_n \times (j_n + d_w)$ matrix $M_{i,n}$ to be given by
$$M_{i,n} \equiv \frac{1}{n}\{q_n^{k_n}(Z_i)(p_n^{j_n}(S_i,Y_i)' \ W_i') - E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)' \ W')]\}, \tag{S.179}$$
which satisfies $E_P[M_{i,n}] = 0$. Since $\|(p_n^{j_n\prime}\beta, \gamma)\|_{\mathbf{E}} \asymp \|\beta\|_2 + \|\gamma\|_2$ by Assumption S.6.8(iii), we then conclude from (S.179) that for some $C_1 < \infty$ we must have
$$\sup_{P\in\mathbf{P}} P\Big(\sup_{h\in\mathbf{B}_n} \frac{\|\hat D_n[h] - D_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} > s_n\Big) \le \sup_{P\in\mathbf{P}} P\Big(\Big\|\sum_{i=1}^n M_{i,n}\Big\|_o > C_1 s_n\Big) \le (j_n + k_n)\exp\Big\{-\frac{n s_n^2 C_2}{(k_n^2 \vee j_n) + s_n k_n\sqrt{j_n}}\Big\} = o(1), \tag{S.180}$$
where the final inequality follows by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$ (by Assumptions S.6.7(iii) and S.6.8(i)) and $b_{2n} = k_n$ (by Assumption S.6.9(i)), while the final equality results from $\log(k_n)k_n^2/(s_n^2 n) = o(1)$ by Assumption S.6.9(iv) and $k_n \ge j_n$ by Assumption S.6.9(iii). Hence, $\nu_n \asymp 1/s_n$ and (S.180) imply condition (24) in Lemma 3.1 holds. Finally, we note that by Assumption S.6.11(iv), we may apply Lemma S.6.10 with $p = 2$ to conclude that Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ as corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-\frac{1}{2}}$. This concludes verifying the requirements of Lemma 3.1, and therefore the present Lemma follows for any $\ell_n$ satisfying $S_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$, which in this application is equivalent to $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n)$ due to $S_n(\mathbf{B},\mathbf{E}) \lesssim j_n^{3/2}$ and $\mathcal{R}_n \asymp k_n\sqrt{j_n}\log(1+k_n)/(s_n\sqrt{n})$.
Lemma S.6.13. Let $D_1 \in \mathbf{R}^{d_1}$, $D_2 \in \mathbf{R}^{d_2}$ be distributed according to $Q$, and for any matrix $A$ let $\mathrm{eig}\{A\}$ denote its largest eigenvalue. Then it follows that
Since $\sum_{i=1}^{d_1} \|a_i\|_2^2 \le 1$ for all $\{a_i\}_{i=1}^{d_1} \in \mathcal{A}$, we additionally have the inequality
$$\sup_{\{a_i\}_{i=1}^{d_1} \in \mathcal{A}} \sum_{i=1}^{d_1} E_Q[(a_i' D_2)^2] \le \sup_{\{a_i\}_{i=1}^{d_1} \in \mathcal{A}} \sum_{i=1}^{d_1} \mathrm{eig}\{E_Q[D_2 D_2']\}\|a_i\|_2^2 = \mathrm{eig}\{E_Q[D_2 D_2']\},$$
and therefore the claim of the Lemma follows.
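The displayed eigenvalue inequality is easy to check numerically. The sketch below fixes a made-up $2 \times 2$ second-moment matrix standing in for $E_Q[D_2 D_2']$ and one admissible collection $\{a_i\}$; all numbers are illustrative assumptions.

```python
import math

# Illustrative check of the inequality in Lemma S.6.13: for a fixed second-moment
# matrix M (standing in for E_Q[D2 D2']) and any {a_i} with sum ||a_i||^2 <= 1,
# sum_i a_i' M a_i is at most the largest eigenvalue of M.
M = [[2.0, 0.5], [0.5, 1.0]]

def quad(a):
    # a' M a for a 2-vector a.
    return sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))

tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
lam_max = tr / 2 + math.sqrt(tr * tr / 4 - det)   # largest eigenvalue, 2x2 closed form

a1, a2 = [0.6, 0.0], [0.0, 0.8]                   # ||a1||^2 + ||a2||^2 = 1
total = quad(a1) + quad(a2)
assert total <= lam_max
```

The bound holds for any such collection since each $a_i' M a_i \le \mathrm{eig}\{M\}\|a_i\|_2^2$, which is exactly the step used in the display above.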
Lemma S.6.14. Let $\lambda$ be the Lebesgue measure, $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ be B-splines on $[0,1]$ of order $r \ge 3$ with no interior knot multiplicity, mesh ratio bounded in $n$, and $\|\cdot\|_{\lambda,2}$ normalized to have norm one. If $\{p_{j,n}\}_{j=1}^{j_n}$ is the tensor product of $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ and $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, then it follows that
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{j_{1n} \wedge j_{2n}}\log(j_n + 1).$$
Proof: We rely heavily on Chapter 5 in DeVore and Lorentz (1993), and note that $B_j$ corresponds to $N_j/\|N_j\|_{\lambda,2}$ in their notation. Throughout, for two sequences $a_n$ and $b_n$ we employ $a_n \asymp b_n$ to mean that there exist constants $0 < \underline{c} \le \bar{c} < \infty$ such that $\underline{c}a_n \le b_n \le \bar{c}a_n$ for all $n$. In what follows it will also prove convenient to index the elements of $\beta \in \mathbf{R}^{j_n}$ by $\beta_{b_1,b_2}$ with $1 \le b_1 \le j_{1n}$ and $1 \le b_2 \le j_{2n}$. Then note that the mesh ratios corresponding to $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ being bounded uniformly in $n$ and two applications of Theorem 5.4.2 in DeVore and Lorentz (1993) imply that
$$\|p_n^{j_n\prime}\beta\|_\infty = \sup_{u_1\in[0,1]} \sup_{u_2\in[0,1]} \Big|\sum_{b_2=1}^{j_{2n}} B^{(2)}_{b_2,n}(u_2) \sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sup_{u_1\in[0,1]} \max_{1\le b_2\le j_{2n}} \sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sqrt{j_{1n}j_{2n}}\|\beta\|_\infty \tag{S.181}$$
uniformly in $\beta \in \mathbf{R}^{j_n}$. By similar arguments we also obtain uniformly in $\beta \in \mathbf{R}^{j_n}$ that
$$\sup_{u_1\in(0,1)} \sup_{u_2\in[0,1]} \Big|\sum_{b_2=1}^{j_{2n}} B^{(2)}_{b_2,n}(u_2) \sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_1}B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sup_{u_1\in(0,1)} \max_{1\le b_2\le j_{2n}} \sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_1}B^{(1)}_{b_1,n}(u_1)\Big| \asymp \max_{1\le b_2\le j_{2n}} \max_{2\le b_1\le j_{1n}} \sqrt{j_{2n}}j_{1n}^{3/2}|\beta_{b_1,b_2} - \beta_{b_1-1,b_2}|, \tag{S.182}$$
where the second result follows by employing equation (3.11) and Theorem 5.4.2 in Chapter 5 of DeVore and Lorentz (1993) and the mesh ratio of $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ being bounded.
Since by identical arguments we can also derive the symmetric (to (S.182)) relationship
$$\sup_{u_1\in[0,1]} \sup_{u_2\in(0,1)} \Big|\sum_{b_1=1}^{j_{1n}} B^{(1)}_{b_1,n}(u_1) \sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_2}B^{(2)}_{b_2,n}(u_2)\Big| \asymp \max_{1\le b_1\le j_{1n}} \max_{2\le b_2\le j_{2n}} \sqrt{j_{1n}}j_{2n}^{3/2}|\beta_{b_1,b_2} - \beta_{b_1,b_2-1}|, \tag{S.183}$$
it follows from results (S.181), (S.182), and (S.183) that there is an $M_0 < \infty$ such that
$$\max_{1\le b_1\le j_{1n}} \max_{1\le b_2\le j_{2n}} |\beta_{b_1,b_2}| \le M_0/\sqrt{j_n}$$
$$\max_{1\le b_2\le j_{2n}} \max_{2\le b_1\le j_{1n}} |\beta_{b_1,b_2} - \beta_{b_1-1,b_2}| \le M_0/(j_{1n}\sqrt{j_n})$$
$$\max_{1\le b_1\le j_{1n}} \max_{2\le b_2\le j_{2n}} |\beta_{b_1,b_2} - \beta_{b_1,b_2-1}| \le M_0/(j_{2n}\sqrt{j_n}) \tag{S.184}$$
for all $\beta \in \mathcal{C}_n$. Hence, in order to establish the claim of the lemma, it suffices to bound the covering numbers for the set defined by (S.184).
We proceed by combining two bounds, one for "small" $\varepsilon$ and one for "large" $\varepsilon$. First, assume without loss of generality $j_{1n} \ge j_{2n}$, let $c_n \equiv \lceil\log(j_{1n}+1)\rceil$, and define the sets
$$\Big\{\beta \in \mathbf{R}^{j_n} : \frac{\varepsilon}{3\sqrt{j_n}}k_1 \le \beta_{b_1,b_2} \le \frac{\varepsilon}{3\sqrt{j_n}}(k_1+1) \text{ for all } b_1 = mc_n + 1 \text{ with } 0 \le m \le \lceil j_{1n}/c_n\rceil - 1$$
$$\frac{\varepsilon}{3c_n\sqrt{j_n}}k_2 \le \beta_{b_1,b_2} - \beta_{b_1-1,b_2} \le \frac{\varepsilon}{3c_n\sqrt{j_n}}(k_2+1) \text{ otherwise}\Big\} \tag{S.185}$$
where $k_1, k_2$ are non-zero integers, i.e. the sets (in $\mathbf{R}^{j_n}$) defined in (S.185) consist of "chains" along the $b_1$ dimension that reset every $c_n$ integers. To compute the diameter
of the sets in (S.185), note that since all "chains" have the same structure
$$\sup\{\|\beta - \tilde\beta\|_2^2 \text{ s.t. } \beta, \tilde\beta \text{ satisfying (S.185)}\} \le \sup\Big\{j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil \sum_{b_1=1}^{c_n}(\beta_{b_1,j_{2n}} - \tilde\beta_{b_1,j_{2n}})^2 \text{ s.t. } \beta, \tilde\beta \text{ satisfying (S.185)}\Big\} \le j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil \sum_{b_1=1}^{c_n} \frac{\varepsilon^2}{9j_n}\Big\{1 + \frac{2(b_1-1)}{c_n}\Big\}^2, \tag{S.186}$$
where the final inequality follows from (S.185). Since $\lceil j_{1n}/c_n\rceil c_n \le (j_{1n} + c_n) \le 2j_{1n}$ due to $\lceil j_{1n}/c_n\rceil \le 1 + j_{1n}/c_n$ and $c_n \le j_{1n}$, it follows from $j_n = j_{1n}j_{2n}$ that every set in (S.185) is contained in a ball of radius $\varepsilon$. Moreover, by (S.184) the total number of sets
with the structure in (S.185) needed to cover the set $\mathcal{C}_n$ is bounded by
$$\Big(\Big\lceil\frac{6M_0}{\varepsilon}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil}\Big(\Big\lceil\frac{6M_0 c_n}{\varepsilon j_{1n}}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil c_n}. \tag{S.187}$$
Next, we employ again the bound $\lceil j_{1n}/c_n\rceil c_n \le 2j_{1n}$ and $\lceil a\rceil \le 2a$ whenever $a \ge 1$, to obtain from (S.186) and (S.187) that whenever $\varepsilon \le 6M_0 c_n/j_{1n}$ we have
$$N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2) \le \Big(\frac{12M_0}{\varepsilon}\Big)^{\frac{2j_n}{c_n}}\Big(\frac{12M_0 c_n}{\varepsilon j_{1n}}\Big)^{2j_n} = \Big(\frac{12M_0 c_n}{\varepsilon j_{1n}}\Big(\frac{j_{1n}}{c_n}\Big)^{\frac{1}{c_n}}\Big)^{\frac{2j_n(c_n+1)}{c_n}} \le \Big(\frac{M_1\log(1+j_{1n})}{\varepsilon j_{1n}}\Big)^{4j_n}, \tag{S.188}$$
where the final inequality holds for some $M_1 < \infty$ due to $(c_n+1)/c_n \le 2$ and $(j_{1n}/c_n)^{1/c_n} \le j_{1n}^{\frac{1}{\log(1+j_{1n})}} = O(1)$ because $c_n = \lceil\log(1+j_{1n})\rceil$.
The bound in (S.188) is valid only for $\varepsilon \le 6M_0 c_n/j_{1n}$. To obtain a bound for $\varepsilon \ge 6M_0 c_n/j_{1n}$, let $\{Z_{b_1,b_2}\}_{b_1,b_2}$ be independent standard normal random variables. By Sudakov's inequality (see, e.g., Proposition A.2.5 in van der Vaart and Wellner (1996)), it then follows that for some $M_2 < \infty$ independent of $n$ we have that
$$\sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))} \le \frac{M_2}{\varepsilon}E\Big[\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2}\Big]. \tag{S.189}$$
Next, for notational convenience define $\Delta_{b_1}\beta_{b_1,b_2} = (\beta_{b_1,b_2} - \beta_{b_1-1,b_2})$ and $\Delta_{b_2}\beta_{b_1,b_2} = (\beta_{b_1,b_2} - \beta_{b_1,b_2-1})$, and then note that by (S.184) it follows that
$$\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2} = \sup_{\beta\in\mathcal{C}_n}\Big\{\sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \Delta_{b_1}\beta_{\bar b_1,b_2}\sum_{b_1=\bar b_1}^{j_{1n}} Z_{b_1,b_2} + \sum_{\bar b_2=2}^{j_{2n}} \Delta_{b_2}\beta_{1,\bar b_2}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=\bar b_2}^{j_{2n}} Z_{b_1,b_2} + \beta_{1,1}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} Z_{b_1,b_2}\Big\}$$
$$\le \sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \frac{M_0}{j_{1n}\sqrt{j_n}}\Big|\sum_{b_1=\bar b_1}^{j_{1n}} Z_{b_1,b_2}\Big| + \sum_{\bar b_2=2}^{j_{2n}} \frac{M_0}{j_{2n}\sqrt{j_n}}\Big|\sum_{b_1=1}^{j_{1n}}\sum_{b_2=\bar b_2}^{j_{2n}} Z_{b_1,b_2}\Big| + \frac{M_0}{\sqrt{j_n}}\Big|\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} Z_{b_1,b_2}\Big|.$$
Hence, employing that if $\mathcal{W} \sim N(0,\sigma^2)$ then $E[|\mathcal{W}|] \lesssim \sigma$, we can conclude that
$$E\Big[\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2}\Big] \lesssim \sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \frac{\sqrt{j_{1n}-\bar b_1}}{j_{1n}\sqrt{j_n}} + \sum_{\bar b_2=2}^{j_{2n}} \frac{\sqrt{j_{1n}(j_{2n}-\bar b_2)}}{j_{2n}\sqrt{j_n}} + 1 \le \frac{j_n\sqrt{j_{1n}}}{j_{1n}\sqrt{j_n}} + \frac{j_{2n}\sqrt{j_{1n}j_{2n}}}{j_{2n}\sqrt{j_n}} + 1 \le 3\sqrt{j_{2n}},$$
where in the final inequality we employed that $j_n = j_{1n}j_{2n}$. Hence, by (S.189) we have
$$\sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))} \lesssim \frac{\sqrt{j_{2n}}}{\varepsilon}. \tag{S.190}$$
To conclude the proof, we combine the bounds in (S.188) and (S.190). In particular, setting $\delta_n \equiv 6M_0\lceil\log(j_{1n}+1)\rceil/j_{1n}$ and observing that $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{\lambda,2} \le C_0$ for all $\beta \in \mathcal{C}_n$ allows us to conclude that for some $M_2 < \infty$ we must have
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_{\delta_n}^{M_2} \frac{\sqrt{j_{2n}}}{\varepsilon}d\varepsilon + \sqrt{j_n}\int_0^{\delta_n}\Big(\log\Big(\frac{M_1\log(j_{1n})}{\varepsilon j_{1n}}\Big)\Big)^{1/2}d\varepsilon \le \sqrt{j_{2n}}\log(1+j_{1n}) + \frac{\sqrt{j_n}\log(1+j_{1n})}{j_{1n}}\int_0^1\Big(\log\Big(\frac{1}{u}\Big)\Big)^{1/2}du \lesssim \sqrt{j_{2n}}\log(1+j_{1n}),$$
where the second inequality follows from the change of variables $u = \varepsilon/\delta_n$ and the final inequality employed that $j_n = j_{1n}j_{2n}$ and $j_{2n} \le j_{1n}$. Substituting $j_{2n} = j_{1n} \wedge j_{2n}$ and employing $j_{1n} \le j_n$ establishes the Lemma.
S.6.3 Quantile Treatment Effects
For our next example we formally study the nonparametric quantile treatment effect (QTE) application introduced in Section 2.2. Recall that in this context $\theta_0(P)$ is assumed to be the solution to the conditional moment restriction
$$P(Y \le \theta_0(P)(D)|Z) = \tau \tag{S.191}$$
where $Y \in \mathbf{R}$, $D \in [0,1]$, and $Z \in \mathbf{R}$. Thus, in this application $X = (Y,D,Z) \in \mathbf{X} \equiv \mathbf{R} \times [0,1] \times \mathbf{R}$ and the residual function $\rho: \mathbf{X} \times \mathbf{B} \to \mathbf{R}$ corresponding to (S.191) is
$$\rho(X,\theta) = 1\{Y \le \theta(D)\} - \tau. \tag{S.192}$$
Our analysis readily enables us to conduct inference on the QTE itself. However, in order to illustrate our conditions in a number of different settings, we focus here on a nonlinear functional of $\theta_0(P)$. In particular, we conduct inference on
$$\int_0^1 (\nabla\theta_0(P)(u))^2 du - \Big(\int_0^1 \nabla\theta_0(P)(u)du\Big)^2$$
while imposing that the QTE be increasing in treatment intensity (i.e. $d \mapsto \nabla\theta_0(P)(d)$ is increasing). To map this problem into our framework we define
$0$ for every $\varepsilon > 0$; (ii) There are $\varepsilon$ and $s_n > 0$ satisfying for all $P \in \mathbf{P}_0$ and $\|\theta - \Pi_n\theta_0(P)\|_{1,\infty} \le \varepsilon$, $s_n \le \mathrm{sing}\{E_P[f_{Y|D,Z}(\theta(D)|D,Z)q_n^{k_n}(Z)p_n^{j_n}(D)']\}$ and $s_n = O(1)$.
Assumption S.6.17(i) simply demands that the quadratic functions belong to $\mathbf{B}_n$, a condition we employ to verify Assumption 3.13. In turn, Assumption S.6.17(ii) implies that $\theta_0(P)$ and its approximation $\Pi_n\theta_0(P)$ belong to the interior of $\Theta$. Assumption S.6.17(iii) imposes a sufficient condition for verifying the bootstrap coupling requirement of Assumption 3.14. In particular, we establish Assumption 3.14 holds by applying the results in Appendix S.9 to a Haar basis expansion. While condition S.6.17(iii) suffices for verifying Assumption 3.14 in both the endogenous ($Z \ne D$) and exogenous ($Z = D$) settings, we note in both cases the rate condition of Assumption S.6.17(iii) could be improved.¹ Finally, Assumption S.6.17(iv), which is satisfied for instance by B-splines, ensures that $S_n(\mathbf{B},\mathbf{E}) \asymp j_n^{5/2}$, while Assumption S.6.17(v) imposes the rate requirements on $\ell_n$, $\ell_n^u$, and $r_n$. Intuitively, these rate requirements demand that $\ell_n \downarrow 0$ sufficiently fast and $r_n$ not tend to zero too quickly.
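The functional under study, $\int_0^1(\nabla\theta(u))^2 du - (\int_0^1 \nabla\theta(u)du)^2$, can be evaluated by simple quadrature for any candidate $\theta$. The sketch below uses a made-up $\theta$ with a closed-form answer: for $\nabla\theta(u) = 1 + u$ on $[0,1]$ the value equals $\mathrm{Var}(1+U) = 1/12$ with $U \sim$ Uniform$[0,1]$. Both $\theta$ and the grid size are illustrative choices.

```python
# Numerical sketch of the variance-of-the-QTE functional via trapezoid quadrature.
theta = lambda d: d + 0.5 * d * d        # hypothetical quantile function of D
dtheta = lambda d: 1.0 + d               # its derivative, the QTE d -> grad theta(d)

m = 10000
us = [i / m for i in range(m + 1)]

def trap(vals):
    # Composite trapezoid rule on [0, 1] with step 1/m.
    return sum((vals[i] + vals[i + 1]) / (2 * m) for i in range(m))

mean = trap([dtheta(u) for u in us])             # int grad theta = 3/2
second = trap([dtheta(u) ** 2 for u in us])      # int (grad theta)^2 = 7/3
variance = second - mean ** 2                    # = 1/12 up to quadrature error
assert abs(variance - 1.0 / 12.0) < 1e-4
```

Note also that the monotonicity restriction on the QTE is satisfied here since $d \mapsto 1 + d$ is increasing.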
The next theorem establishes the validity of the bootstrap procedure.
Theorem S.6.6. Let Assumptions S.6.12, S.6.13, S.6.14, S.6.15, S.6.16, and S.6.17 hold and $a_n = (\log(n))^{-1/2}$. Then, it follows that there are sequences $\tilde\ell_n$ and $\tilde\ell_n^u$ such that $\ell_n \asymp \tilde\ell_n$, $\ell_n^u \asymp \tilde\ell_n^u$ and uniformly in $P \in \mathbf{P}_0$ we have
$$U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde\ell_n) + o_P(a_n)$$
$$U_n(R|\ell_n) - U_n(\Theta|\ell_n^u) \ge U^\star_{n,P}(R|\tilde\ell_n) - U^\star_{n,P}(\Theta|\tilde\ell_n^u) + o_P(a_n).$$
¹For instance under endogeneity, a better rate could be obtained by conducting a basis expansion using the tensor product of a Haar basis for $(Y,D)$ and the functions $\{q_{k,n}\}_{k=1}^{k_n}$.
Thus, Theorems S.6.5 and S.6.6 imply that as a critical value for $I_n(R)$ we may employ
Hence, by the law of iterated expectations, a second application of the mean value theorem, and employing that $\|\theta\|_{1,\infty} \le C_0$ by definition of $\Theta$, we can conclude
$$\varpi^2(f,h,P) \le \sup_{\|(s_y,s_d)\|_2\le h} E_P[(1\{Y+s_y \le \theta(D+s_d)\} - 1\{Y \le \theta(D)\})^2 1\{D+s_d \in [0,1]\}] \lesssim \sup_{\|(s_y,s_d)\|_2\le h} E_P[|\theta(D+s_d) - s_y - \theta(D)| 1\{D+s_d \in [0,1]\}] \lesssim h. \tag{S.215}$$
The claim of the Lemma then follows from (S.212), (S.213), and (S.215).
Lemma S.6.18. Define the class $\mathcal{F}_n \equiv \{f : f(v) = 1\{y \le \theta(d)\} - \tau \text{ for some } \theta \in \Theta_n\}$ for $\Theta_n$ as in (S.196), and suppose that Assumptions S.6.12(iii) and S.6.13(ii) hold. For $\zeta_{j_n} \equiv \sup_{d\in[0,1]} \|p_n^{j_n}(d)\|_2$, it then follows that for all $\varepsilon \le 1$ and some $K < \infty$
$$\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \le \exp\Big\{\frac{K}{\varepsilon}\Big\} \wedge \Big(\frac{K\sqrt{\zeta_{j_n}}}{\varepsilon}\Big)^{2j_n},$$
and $\sup_{P\in\mathbf{P}} J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \sqrt{1 \wedge \varepsilon} \wedge \sqrt{j_n(\log(\zeta_{j_n}) + \log(1 \vee \varepsilon^{-1}))}(1 \wedge \varepsilon)$.
Proof: We first note that if θL(d) ≤ θ(d) ≤ θU (d), then it immediately follows that
where in the final inequality we employed Jensen's inequality and that $\theta_1(d) \vee \theta_2(d) - \theta_1(d) \wedge \theta_2(d) = |\theta_1(d) - \theta_2(d)|$. It thus follows Assumption 3.8 holds with $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$ and $\kappa_\rho = 1/2$. Moreover, Jensen's inequality and the mean value theorem imply for some $\bar\theta$ such that $\bar\theta(d)$ is a convex combination of $\theta_1(d)$ and $\theta_0(d)$ that
$$E_P[(P(Y \le \theta_1(D)|Z) - P(Y \le \theta_2(D)|Z) - \nabla m_P(\theta_0)[\theta_1 - \theta_2](Z))^2]$$
$$\le E_P[(\{f_{Y|DZ,P}(\bar\theta(D)|D,Z) - f_{Y|DZ,P}(\theta_0(D)|D,Z)\}\{\theta_1(D) - \theta_0(D)\})^2] \lesssim \|\theta_1 - \theta_0\|_\infty^2 \times \sup_{P\in\mathbf{P}} E_P[(\theta_1(D) - \theta_0(D))^2],$$
where the final inequality follows from $f_{Y|DZ,P}$ being Lipschitz uniformly in $(D,Z)$ and $P \in \mathbf{P}$. Hence, we may conclude Assumption 3.9(i) is satisfied with $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_\infty$ and $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$. Furthermore, once again employing Jensen's inequality and that $f_{Y|DZ,P}$ is Lipschitz uniformly in $(D,Z)$ and $P \in \mathbf{P}$ yields
$$E_P[(E_P[\{f_{Y|DZ,P}(\theta_1(D)|D,Z) - f_{Y|DZ,P}(\theta_0(D)|D,Z)\}h(D)|Z])^2] \lesssim \|\theta_1 - \theta_0\|_\infty^2 \times \sup_{P\in\mathbf{P}}\|h\|_{P,2}^2 \tag{S.225}$$
which implies Assumption 3.9(ii) is also satisfied under the stated choices of $\|\cdot\|_{\mathbf{L}}$ and $\|\cdot\|_{\mathbf{E}}$. Finally, we note Assumption 3.9(iii) is immediate due to Jensen's inequality and $f_{Y|DZ,P}$ being bounded uniformly in $(D,Z)$ and $P \in \mathbf{P}$.
Lemma S.6.21. If Assumption S.6.17(i) holds, $\mathbf{B} = C^2_B([0,1])$ and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.193) (with $\lambda \neq 0$), (S.194), and (S.195), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $\nabla\Upsilon_G(\theta)[h] = -\nabla^2 h$, and
$$\nabla\Upsilon_F(\theta)[h] = 2\int_0^1 \theta(u)h(u)du - 2\Big(\int_0^1 \theta(u)du\Big)\Big(\int_0^1 h(u)du\Big). \quad (S.226)$$
Proof: Note that since $\Upsilon_G$ is linear and continuous, it immediately follows that Assumptions 3.11(i) and 3.11(ii) hold with $\nabla\Upsilon_G = \Upsilon_G$ and $K_g = 0$. It further follows from $\nabla\Upsilon_G = \Upsilon_G$ and the definitions of the operator norm $\|\cdot\|_o$ and $\|\cdot\|_{m,\infty}$ that
$$\|\nabla\Upsilon_G(\theta)\|_o = \sup_{\|h\|_{2,\infty}=1} \|-\nabla^2 h\|_\infty \leq 1, \quad (S.227)$$
which implies Assumption 3.11(iii) holds with $M_g = 1$. Moreover, by direct calculation
$$|\Upsilon_F(\theta_1) - \Upsilon_F(\theta_2) - \nabla\Upsilon_F(\theta_1)[\theta_1 - \theta_2]| = \Big|\int_0^1 (\theta_1(u) - \theta_2(u))^2 du - \Big(\int_0^1 (\theta_1(u) - \theta_2(u))du\Big)^2\Big| \leq \|\theta_1 - \theta_2\|^2_{2,\infty}, \quad (S.228)$$
which implies $\Upsilon_F$ is indeed Fréchet differentiable and its derivative is equal to $\nabla\Upsilon_F$ as defined in (S.226). In addition, by (S.226) and Jensen's inequality we have
$$\|\nabla\Upsilon_F(\theta_1) - \nabla\Upsilon_F(\theta_2)\|_o = \sup_{\|h\|_{2,\infty}=1} 2\Big|\int_0^1 (\theta_1(u) - \theta_2(u))\Big(h(u) - \int_0^1 h(u)du\Big)du\Big| \leq 2\|\theta_1 - \theta_2\|_{2,\infty}, \quad (S.229)$$
which together with (S.228) implies Assumptions 3.12(i) and 3.12(ii) hold with $K_f = 2$.
Next, note that since $\lambda \neq 0$ it follows that $\mathbf{F}_n = \mathbf{R}$. For any $\theta \in \mathbf{B}_n$ such that $\Upsilon_F(\theta) \neq 0$, we then define $\nabla\Upsilon_F(\theta)^- : \mathbf{F}_n \to \mathbf{B}_n$ to be given (for any $c \in \mathbf{R}$) by
$$\nabla\Upsilon_F(\theta)^-[c](d) \equiv c \times \frac{\theta(d) - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}, \quad (S.230)$$
and note that since $\theta \in \mathbf{B}_n$ and the constant function is in $\mathbf{B}_n$ by Assumption S.6.17(i), it follows that $\nabla\Upsilon_F(\theta)^-[c] \in \mathbf{B}_n$. Moreover, by direct calculation we obtain
$$\nabla\Upsilon_F(\theta)\nabla\Upsilon_F(\theta)^-[c] = 2\int_0^1 \theta(u)\Big(c \times \frac{\theta(u) - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}\Big)du = c \times \frac{2\Upsilon_F(\theta)}{2\Upsilon_F(\theta)} = c, \quad (S.231)$$
which verifies $\nabla\Upsilon_F(\theta)^-$ is indeed the right inverse of $\nabla\Upsilon_F(\theta)$. In addition note that
$$\|\nabla\Upsilon_F(\theta)^-\|_o = \sup_{|c|=1} \Big\|c \times \frac{\theta - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}\Big\|_{2,\infty} \leq \frac{\|\theta\|_{2,\infty}}{\Upsilon_F(\theta)}, \quad (S.232)$$
and hence, since $\|\theta\|_{2,\infty} \leq C_0$ and $\Upsilon_F(\theta) = \lambda$ for any $\theta \in \Theta^r_{0n}(P)$, it follows that we may select an $\varepsilon > 0$ such that Assumption 3.12(iv) holds with $M_f = 4C_0/\lambda$.
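The right-inverse algebra in (S.230)-(S.231) is easy to verify numerically. The sketch below takes $\Upsilon_F(\theta) = \int_0^1\theta^2(u)\,du - (\int_0^1\theta(u)\,du)^2$ with an arbitrary smooth $\theta$ (an illustrative choice, not from the paper) and checks that $\nabla\Upsilon_F(\theta)\nabla\Upsilon_F(\theta)^-[c] = c$ up to quadrature error.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 10_001)
du = u[1] - u[0]
integ = lambda g: float(np.sum((g[:-1] + g[1:]) / 2) * du)   # trapezoid rule

theta = u**2 + 0.3 * np.sin(3 * u)                  # arbitrary smooth theta (illustrative)
Upsilon_F = integ(theta**2) - integ(theta) ** 2     # variance functional
dUpsilon_F = lambda h: 2 * integ(theta * h) - 2 * integ(theta) * integ(h)

c = 1.7
h_c = c * (theta - integ(theta)) / (2 * Upsilon_F)  # right-inverse candidate, as in (S.230)
```

The key simplification is that $h_c$ integrates to zero, so only the first term of $\nabla\Upsilon_F(\theta)[h_c]$ survives and equals $c$ exactly.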
Next, let $\bar\theta$ be the function $d \mapsto d^2$ and note that by Assumption S.6.17(i) it follows that $\bar\theta \in \mathbf{B}_n$. For any $\theta_0 \in \Theta^r_{0n}(P)$ we may then set $h_0$ to equal
$$h_0 \equiv \frac{2\lambda}{C_0}\bar\theta - \frac{\nabla\Upsilon_F(\theta_0)[\bar\theta]}{C_0}\theta_0, \quad (S.233)$$
which belongs to $\mathbf{B}_n$ since $\bar\theta, \theta_0 \in \mathbf{B}_n$. Further observe $\nabla\Upsilon_F(\theta_0)[\theta_0] = 2\Upsilon_F(\theta_0) = 2\lambda$ due to $\theta_0 \in R$, and hence by linearity of $\nabla\Upsilon_F(\theta_0)$ and (S.233) we can conclude that $h_0 \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$. In addition, it also follows from $\Upsilon_G = \nabla\Upsilon_G$ that
uniformly in $P \in \mathbf{P}_0$. To this end, we rely on Theorem S.5.1(ii) (for $E_n(R|\ell_n)$) and Lemma S.5.5 (for $E_n(\Theta|\ell^u_n)$). Also note that in the proof of Theorem 4.1 we showed Assumptions 4.1, 4.2, and 4.3 imply Assumptions 3.1-3.10 hold with $B_n \asymp \sqrt{k_n}$, $J_n \asymp \sqrt{j_n\log(1+j_n)}$, $\nu_n \asymp \sqrt{j_n}$, $\mathcal{R}_n \asymp k_nj_n\sqrt{\log(1+k_n)\log(1+j_n)}/\sqrt{n}$, $a_n = (\log(n))^{-1/2}$, $\kappa_\rho = 1$, $\|\theta\|_{\mathbf{L}} = \|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$, and $\|\theta\|_{\mathbf{B}} = \sum_{\ell=1}^{\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty + \|\nu\|_{TV}$ for $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35).
In order to apply Theorem S.5.1(ii), we note Assumption 4.4(i) and Lemma S.6.27 verify that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $M_g = 1$, and $K_f = 0$, while Assumption 4.4(iii) and Lemma S.6.30 verify Assumption 3.14, and Assumption 3.15(i) is immediate given the definitions of $\|\cdot\|_{\mathbf{E}}$ and $\|\cdot\|_{\mathbf{B}}$. Also note Assumption 3.15(ii) follows from Theorem 3.1, $\nu_n \asymp \sqrt{j_n}$, and $\tau_n\sqrt{j_n} = o(1)$ by Assumption 4.4(v) implying that $\vec{d}_H(\hat\Theta^r_n, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) = o_P(1)$ uniformly in $P \in \mathbf{P}_0$ and therefore
$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \hat\Theta^r_n, \|\cdot\|_{\mathbf{E}}) \leq \varepsilon\} \subseteq \Theta_n) \geq \liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) \leq 2\varepsilon\} \subseteq \Theta_n) = 1, \quad (S.249)$$
where the final equality holds for any $\varepsilon < 1/2$ by Assumption 4.4(iv), the definition of $\Theta_n$, and $\|F_P(c_\ell|\cdot)\|_\infty \leq 1$. Next, observe Lemma 4.1 and the definitions of $\|\cdot\|_{\mathbf{E}}$, $\|\cdot\|_{\mathbf{L}}$, and $\|\cdot\|_{\mathbf{B}}$ imply Assumption S.5.1 holds with $\mathcal{D}_n(\mathbf{B},\mathbf{E}) \asymp \zeta_n$ and $\mathcal{D}_n(\mathbf{L},\mathbf{E}) = 1$. Since $K_m = K_g = K_f = 0$ and $\Upsilon_F$ and $\Upsilon_G$ are affine, the only requirements imposed by Assumption S.5.2 are that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho} \vee (\nu_n\tau_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{P,2}) = o(a_n)$ and $(\mathcal{R}_n + \nu_n\tau_n)\mathcal{D}_n(\mathbf{B},\mathbf{E}) = o(r_n)$, which are implied by Assumption 4.4(v), Lemma S.6.24, and $k_n\sqrt{j_n}\log^2(n)\ell_n = o(1)$ by hypothesis. Hence, all the conditions of Theorem S.5.1(ii) hold, which implies there is a $\tilde\ell_n \asymp \ell_n$ such that uniformly in $P \in \mathbf{P}_0$
$$E_n(R|\ell_n) \geq U^\star_{n,P}(R|\tilde\ell_n) + o_P(a_n). \quad (S.250)$$
Finally, to apply Lemma S.5.5, note that in coupling $E_n(\Theta|\ell^u_n)$ we can set $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ and $\Upsilon_G(\theta) = \Upsilon_F(\theta) = 0$ for all $\theta \in \mathbf{B}$ (since $R = \Theta$). Hence, Assumptions 3.11, 3.12, 3.13, and 3.15(i) are immediate, while Assumption 3.14 is satisfied by Assumption 4.4(iii) and Lemma S.6.30. Further note that since $\Theta_0(P)$ is an equivalence class under $\|\cdot\|_{\mathbf{E}}$, when studying the unconstrained statistic we can treat the model as identified. As a result, we may set $\tau^u_n = 0$ and Assumption 3.15(ii) holds by arguments identical to (S.249). In order to apply Lemma S.5.5, it therefore only remains to verify Assumption 3.16 for the unconstrained problem. However, in this instance, the only condition imposed by Assumption 3.16 is that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}((\ell^u_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{P,2}) = o(a_n)$, which holds by Lemma S.6.24 and $k_n\sqrt{j_n}\log^2(n)\ell^u_n = o(1)$ by hypothesis. Thus, (S.250) and Lemma S.5.5 verify (S.248), which in turn establishes the theorem.
Lemma S.6.23. If Assumptions 4.1, 4.2(i)(iii), and 4.3 hold, then Assumptions 3.5 and 3.6 hold with $R = \Theta$ and $R$ corresponding to (32), (33)-(35), $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$, $V_n(P) = \Theta_n \cap R$, and $\nu_n^{-1} \asymp 1/\sqrt{j_n}$.
Proof: First note that Assumption 4.1(iv) and $\|\Sigma_{\ell,n}(P)\|_{o,2}$ being uniformly bounded in $P \in \mathbf{P}$ and $n$ by Assumption 4.3(ii) allow us to conclude that
$$\sup_{P\in\mathbf{P}_0} \sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) = O((n\log(n))^{-\frac{1}{2}}). \quad (S.251)$$
Furthermore, since the class $\mathcal{F}_n$ has envelope $3$ due to $\|F_\ell(c_\ell|\cdot)\|_\infty \leq 2$ by definition of $\Theta$ in (30), Lemma S.6.24 allows us to set $J_n \asymp \sqrt{j_n\log(1+j_n)}$. Since in addition Assumption 4.2(i) implies $B_n \asymp \sqrt{k_n}$ while Assumption 4.2(iii) implies $k_n \geq j_n$, we obtain that $\eta_n$ (as defined in Assumption 3.6) satisfies $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$. Hence, $Q_{n,P}(\theta) \geq 0$ for all $\theta \in \mathbf{B}$ and (S.251) imply Assumption 3.5 holds.
In order to verify Assumption 3.6(i), let $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$. Then note any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \Theta_n$ must be such that $F_\ell(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,\theta}$ for some $\beta_{\ell,\theta} \in \mathbf{R}^{j_n}$ and, similarly, $\Pi^r_n\theta_0(P) = (\{F_{\ell n}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_n)$ must satisfy $F_{\ell n}(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,n}$. The Cauchy-Schwarz inequality and Assumptions 4.1(ii)(iii) and 4.2(iii) then yield uniformly in $P \in \mathbf{P}$ that
$$\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}} \|\beta_{\ell,\theta} - \beta_{\ell,n}\|_2 \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}} \|E_P[q^{k_n}_n(W)p^{j_n}_n(W)'(\beta_{\ell,\theta} - \beta_{\ell,n})]\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}} \|E_P[(F_\ell(c_\ell|W) - F_{\ell n}(c_\ell|W))q^{k_n}_n(W)]\|^2_{\Sigma_{\ell,n}(P),2}\Big\}^{1/2}, \quad (S.252)$$
where the final inequality holds due to $\|\Sigma_{\ell,n}(P)^{-1}\|_{o,2}$ being uniformly bounded by Assumption 4.3(ii) and $\sum_{\ell=1}^{\mathcal{J}} |a^{(\ell)}| \leq \sqrt{\mathcal{J}}\|a\|_2$ for any $(a^{(1)}, \ldots, a^{(\mathcal{J})})' = a \in \mathbf{R}^{\mathcal{J}}$. Hence, combining (S.251), (S.252), and the law of iterated expectations yields
$$\frac{1}{\sqrt{j_n}}\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim Q_{n,P}(\theta) - \sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) + O((n\log(n))^{-1/2}). \quad (S.253)$$
Since inequality (S.253) holds for any $\theta \in \Theta_n$ and, as already shown, $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$, we conclude that Assumption 3.6(i) is satisfied with $\nu_n^{-1} = 1/\sqrt{j_n}$ and $V_n(P) = \Theta_n \cap R$. Finally, we note Assumption 3.6(ii) is implied by Assumption 4.3(i), result (S.251), and $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$.
Lemma S.6.24. Define the class $\mathcal{F}_n \equiv \{f : f(v) = (1\{y \leq c_\ell\} - p^{j_n}_n(w)'\beta) \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and suppose that Assumptions 4.1(ii)(iii) hold. Then, it
Moreover, since the eigenvalues of $E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)']$ are bounded uniformly in $P \in \mathbf{P}$ by hypothesis and $\sup_x \|r^{j_n}_n(x)\|_2 \lesssim b_{2n}$, it additionally follows that
$$\sup_{P\in\mathbf{P}} \Big\|\sum_{i=1}^n E_P[M_{i,n}M'_{i,n}]\Big\|_o \leq \sup_{P\in\mathbf{P}} \frac{2}{n}\big\|E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)'\|r^{j_n}_n(X)\|^2_2]\big\|_o \lesssim \frac{b^2_{2n}}{n}. \quad (S.260)$$
Identical arguments but relying on the eigenvalues of $E_P[r^{j_n}_n(X)r^{j_n}_n(X)']$ being bounded uniformly in $P \in \mathbf{P}$ and $\sup_x \|q^{k_n}_n(x)\|_2 \lesssim b_{1n}$ by hypothesis further yield that
$$\sup_{P\in\mathbf{P}} \Big\|\sum_{i=1}^n E_P[M'_{i,n}M_{i,n}]\Big\|_o \lesssim \frac{b^2_{1n}}{n}. \quad (S.261)$$
The claim of the Lemma then follows from results (S.259), (S.260), and (S.261) allowing us to apply Theorem 1.6 in Tropp (2012) with $\sigma^2 \asymp (b^2_{1n} \vee b^2_{2n})/n$ and $R \asymp b_{1n}b_{2n}/n$.
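The matrix Bernstein bound invoked above controls the operator norm of a centered sample cross-moment matrix at essentially a $1/\sqrt{n}$ rate. A minimal Monte Carlo sketch, using independent uniforms and monomial bases (illustrative assumptions only, chosen so the population cross-moment matrix is available in closed form):

```python
import numpy as np

rng = np.random.default_rng(1)
k, j = 8, 4
# population E[q(Z) r(X)'] for independent Z, X ~ Uniform[0,1] and monomial
# bases q(z) = (z^{k-1}, ..., z, 1)', r(x) = (x^{j-1}, ..., x, 1)'
M = 1.0 / np.outer(np.arange(k, 0, -1), np.arange(j, 0, -1))

def op_norm_dev(n):
    # operator norm of (1/n) sum_i q(Z_i) r(X_i)' minus its population counterpart
    Z, X = rng.uniform(size=n), rng.uniform(size=n)
    Mn = np.vander(Z, k).T @ np.vander(X, j) / n
    return float(np.linalg.norm(Mn - M, 2))

dev_small, dev_large = op_norm_dev(500), op_norm_dev(50_000)
```

Increasing the sample size by a factor of 100 shrinks the deviation by roughly a factor of 10, in line with the $\sqrt{n}$-scaling of the concentration bound.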
Lemma S.6.30. If Assumptions 4.1(i)-(iii), 4.2(i)(ii) hold, and $j^2_nk^2_n\log(1+j_nk_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(k^2_nj^5_n\log^3(1+k_nj_n)/n)^{1/4} = o(a_n)$.
Proof: Let $\mathcal{G}_n \equiv \{g : g(x) = 1\{y \leq c_\ell\} - p^{j_n\prime}_n\beta \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and $\mathcal{F}_n \equiv \{gq_{k,n} : g \in \mathcal{G}_n \text{ and } 1 \leq k \leq k_n\}$. Then note that when $R = \Theta$ we obtain
$$\sup_{f\in\mathcal{F}_n} \|\mathbb{W}_nfq^{k_n}_n - \mathbb{W}^\star_{n,P}fq^{k_n}_n\|_p \leq k_n^{1/p} \sup_{f\in\mathcal{F}_n} |\mathbb{W}_nf - \mathbb{W}^\star_{n,P}f|. \quad (S.262)$$
We will therefore establish the lemma by employing (S.262) and applying Theorem S.9.1(i) to the class $\mathcal{F}_n$. To this end, let $f^{d_n}_n(v) \equiv q^{k_n}_n(z) \otimes (p^{j_n}_n(w)', 1\{y \leq c_1\}, \ldots, 1\{y \leq c_{\mathcal{J}}\})'$ and note $d_n = k_n(j_n + \mathcal{J})$. Next observe that applying Lemma S.6.13 with $D_1 \equiv (p^{j_n}_n(W)', 1\{Y \leq c_1\}, \ldots, 1\{Y \leq c_{\mathcal{J}}\})'$ and $D_2 = q^{k_n}_n(Z)$ allows us to conclude
$$\sup_{P\in\mathbf{P}} \overline{\text{eig}}\{E_P[f^{d_n}_n(V)f^{d_n}_n(V)']\} \leq \sup_{P\in\mathbf{P}} \|\overline{\text{eig}}\{D_1D'_1\}\|_{P,\infty} \times \overline{\text{eig}}\{E_P[D_2D'_2]\} \lesssim j_n,$$
where the final inequality holds by Assumptions 4.1(ii) and 4.2(ii). Hence, it follows Assumption S.9.1(i) holds with $C_n = j_n$, while Assumption S.9.1(ii) is satisfied with $K_n = \sqrt{k_nj_n}$ by Assumptions 4.1(ii) and 4.2(i). Further note that by Assumption 4.1(iii) it follows that $\|\beta\|_2 \lesssim \sup_{P\in\mathbf{P}} \|p^{j_n\prime}_n\beta\|_{P,2} \leq \|p^{j_n\prime}_n\beta\|_\infty$. Hence, by definition of $\mathcal{F}_n$, there is a $C_0 < \infty$ such that any $f \in \mathcal{F}_n$ satisfies $f = f^{d_n\prime}_n\beta$ for some $\beta$ in
$$B_n \equiv \{\beta \in \mathbf{R}^{d_n} : \beta = e_k \otimes \gamma \text{ for some } \gamma \in \mathbf{R}^{j_n+1} \text{ with } \|\gamma\|_2 \leq C_0\},$$
where $e_k \in \mathbf{R}^{k_n}$ has its $k$th coordinate equal to one and all other coordinates equal to zero. In particular, it follows that Assumption S.9.2(i) is immediate with $\mathcal{G}_{n,P}$ consisting of the zero function and $J_{1n} = 0$. Moreover, setting $C_n \equiv \{\gamma \in \mathbf{R}^{1+j_n} : \|\gamma\|_2 \leq C_0\}$, we can then conclude from the definition of $B_n$ and $N(\varepsilon, C_n, \|\cdot\|_2) \lesssim 1 \vee (C_0/\varepsilon)^{j_n}$ that
$$\int_0^\infty \sqrt{\log(N(\varepsilon, B_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_0^{C_0} \sqrt{\log(k_n) + \log(N(\varepsilon, C_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{\log(k_n)} + \sqrt{j_n},$$
which verifies Assumption S.9.2(ii) is satisfied with $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$. Thus, applying Theorem S.9.1(i) with $K_n \asymp \sqrt{k_nj_n}$, $C_n \asymp j_n$, $d_n \lesssim k_nj_n$, $J_{1n} = 0$, and $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$ implies that uniformly in $P \in \mathbf{P}$ we have
$$\sup_{f\in\mathcal{F}_n} |\mathbb{W}_nf - \mathbb{W}^\star_{n,P}f| = O_P\Big(\Big\{\frac{k^2_nj^5_n\log^3(1+k_nj_n)}{n}\Big\}^{1/4}\Big) \quad (S.263)$$
provided that $j^2_nk^2_n\log(1+j_nk_n) = o(n)$. Since the latter condition is satisfied by hypothesis, the claim of the lemma then follows from (S.262) and (S.263).
Lemma S.6.31. Define $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ and let $V_n(\theta, \ell_n) = V_n(\theta, +\infty) \cap \{h/\sqrt{n} : \|h/\sqrt{n}\|_{\mathbf{E}} \leq \ell_n\}$ for $V_n(\theta, +\infty)$ as in (39). If Assumptions 4.1, 4.2(i)(iii)(iv), and 4.3 hold, then for any $a_n$ and $\ell_n$ satisfying $k^4_nj^5_n\log^3(1+k_nj_n)/n = o(a^4_n)$ and $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$ it follows uniformly in $P \in \mathbf{P}$ that
$$U_n(R|+\infty) = \inf_{\theta\in\hat\Theta^r_n} \inf_{\frac{h}{\sqrt{n}}\in V_n(\theta,\ell_n)} \Big\{\sum_{\ell=1}^{\mathcal{J}} \|\mathbb{W}_n\rho_\ell(\cdot,\theta)q^{k_n}_n + \hat{D}_{\ell,n}[h]\|^2_{\hat\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n)$$
$$U_n(\Theta|+\infty) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{E}}\leq\ell_n} \Big\{\sum_{\ell=1}^{\mathcal{J}} \|\mathbb{W}_n\rho_\ell(\cdot,\hat\theta^u_n)q^{k_n}_n + \hat{D}_{\ell,n}[h]\|^2_{\hat\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n).$$
Proof: We establish the claim by verifying the conditions of Lemma 3.1 under both $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35). To this end, recall that in the proof of Theorem 4.1 we argued (for both specifications of $R$) that Assumptions 3.3(i)(iii) hold with $B_n \asymp \sqrt{k_n}$ and $J_n \asymp \sqrt{j_n\log(1+j_n)}$ and that Assumption 3.4 is satisfied by Assumption 4.3. Further note Assumption 4.1(ii) and the Cauchy-Schwarz inequality imply for any $h = (\{p^{j_n\prime}_n\beta_{\ell,h}\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}_n$ that
$$\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}} \sqrt{j_n}\|\beta_{\ell,h}\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}} \|D_{\ell,n,P}[h]\|^2_2\Big\}^{1/2} = \sqrt{j_n}\|D_{n,P}[h]\|_2, \quad (S.264)$$
where the second inequality follows from $D_{\ell,n,P}[h] = -E_P[q^{k_n}_n(Z)p^{j_n}_n(W)'\beta_{\ell,h}]$ and the smallest singular values of $E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']$ being bounded away from zero uniformly in $P \in \mathbf{P}$ by Assumption 4.2(iii). Since $\nu_n \asymp \sqrt{j_n}$ by Lemma S.6.23 and the derivative $D_{n,P}(\theta)$ does not depend on $\theta$, we conclude $\|h\|_{\mathbf{E}} \leq \nu_n\|D_{n,P}[h]\|_2$ for all $h \in \mathbf{B}_n$, i.e., in verifying the conditions of Lemma 3.1 we may set $\mathcal{A}_n(P) = \Theta_n \cap R$. In order to verify condition (24) of Lemma 3.1 we note that since $\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}} \sqrt{j_n}\|\beta_{\ell,h}\|_2$ as shown in (S.264), the definitions of the operator norm $\|\cdot\|_{o,2}$, $\hat{D}_{\ell,n}$, and $D_{\ell,n,P}$ imply
$$\sup_{h\in\mathbf{B}_n} \frac{\|\hat{D}_n[h] - D_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} \lesssim \sqrt{j_n}\Big\|\frac{1}{n}\sum_{i=1}^n q^{k_n}_n(Z_i)p^{j_n}_n(W_i)' - E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']\Big\|_{o,2} = o_P(1), \quad (S.265)$$
where the final equality holds uniformly in $P \in \mathbf{P}$ by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$, $b_{2n} = k_n$ (by Assumptions 4.1(ii) and 4.2(i)) and employing that $k_n \geq j_n$ and $k^2_nj_n\log(k_n)/n = o(1)$ by Assumptions 4.2(iii)(iv). Finally, we note that Assumption 4.4(iii) implies $j^5_nk^4_n\log^5(1+j_nk_n) = o(n)$, and employing Lemma S.6.30 with $p = 2$ yields that Assumption 3.14 holds for $R = \Theta$, and hence also for $R$ corresponding to (32) and (33)-(35). The only condition of Lemma 3.1 that remains to be verified is that $\mathcal{S}_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$. To this end, we observe that since $V_n(\theta, \ell_n)$ is defined through the constraint $\|h\|_{\mathbf{E}} \leq \ell_n$ (instead of $\|\cdot\|_{\mathbf{B}} \leq \ell_n$), it suffices to verify $\mathcal{R}_n = o(\ell_n)$, i.e., for the purposes of this lemma we may set $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}}$. However, since as argued $J_n \asymp \sqrt{j_n\log(j_n)}$, $B_n \asymp \sqrt{k_n}$, and $\nu_n \asymp \sqrt{j_n}$, we have $\mathcal{R}_n \asymp k_nj_n\sqrt{\log(k_n)\log(j_n)}/\sqrt{n}$, and the requirement $\mathcal{R}_n = o(\ell_n)$ is implied by $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$. Thus, the claim of the lemma follows from Lemma 3.1.
S.7 Local Parameter Space
This section contains analytical results concerning the local parameter space and our
approximation to it. The main result of this section is Theorem S.7.1, which plays an
instrumental role in the proof of the results of Section 3.3 in the main text.
Theorem S.7.1. Let Assumptions 3.1, 3.11, 3.12, and 3.13 hold, and let $\{\ell_n, \delta_n, r_n\}_{n=1}^\infty$ satisfy

Assumption S.8.3. The classes of functions $\mathcal{F}_n$ satisfy: (i) $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \varpi(f, h, P) \leq \varphi_n(h)$ for some $\varphi_n : \mathbf{R}_+ \to \mathbf{R}_+$ satisfying $\varphi_n(Ch) \leq C^\kappa\varphi_n(h)$ for all $n$, $C > 0$, and some $\kappa > 0$; and (ii) $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \|f\|_{P,\infty} \leq K_n$ for some $K_n > 0$.
In Assumption S.8.1 we impose that V ∼ P be continuously distributed for all
P ∈ P, with uniformly (in P ) bounded supports and densities bounded from above
and away from zero. Assumption S.8.2 requires that the support of V under each P
be “smooth” in the sense that it may be seen as a differentiable transformation of the
unit square. Together, Assumptions S.8.1 and S.8.2 enable us to construct partitions
of Ω(P ) such that the diameter of each set in the partition is controlled uniformly in
P ; see Lemma S.8.1. As a result, the approximation error by the Haar bases implied
by each partition can be controlled uniformly by the integral modulus of continuity; see
Lemma S.8.2. Together with Assumption S.8.3, which imposes conditions on the integral
modulus of continuity of Fn uniformly in P , we can obtain a uniform coupling result
through the analysis in Koltchinskii (1994). We note that the homogeneity condition
on ϕn in Assumption S.8.3(i) is not necessary, but imposed to simplify the bound.
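A minimal Python sketch of the dyadic splitting scheme underlying Lemma S.8.1, specialized to $Q_P$ uniform on $[0,1]^2$ so that each level-$i$ cell has mass exactly $2^{-i}$ (an illustrative special case, not the general construction): the split coordinate alternates as $m(i)$ does, and cell diameters shrink at the rate $2^{-i/d_v}$ with $d_v = 2$.

```python
import numpy as np

def cells(i):
    # level-i dyadic partition of [0,1]^2, halving coordinate (level mod 2) at each step
    boxes = [np.array([[0.0, 1.0], [0.0, 1.0]])]
    for level in range(i):
        axis, new = level % 2, []
        for b in boxes:
            lo, hi = b[axis]
            mid = (lo + hi) / 2
            left, right = b.copy(), b.copy()
            left[axis] = [lo, mid]
            right[axis] = [mid, hi]
            new += [left, right]
        boxes = new
    return boxes

def max_diam(i):
    # largest cell diameter at level i; scales like 2^{-i/2} here (d_v = 2)
    return max(np.hypot(b[0, 1] - b[0, 0], b[1, 1] - b[1, 0]) for b in cells(i))
```

At level $i$ there are $2^i$ boxes of Lebesgue measure $2^{-i}$, and moving from level $4$ to level $6$ halves the maximal diameter, exactly the $2^{-i/d_v}$ rate the uniform coupling argument relies on.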
The next theorem provides us with an important tool for verifying Assumption 3.7. In its statement we employ the same notation as in the main text, where recall $N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ denotes the smallest number of brackets of size $\varepsilon$ (under $\|\cdot\|_{P,2}$) needed to cover $\mathcal{F}_n$, while $J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ is defined as the integral
$$J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv \int_0^\varepsilon \sqrt{1 + \log N_{[\,]}(u, \mathcal{F}_n, \|\cdot\|_{P,2})}du.$$
Theorem S.8.1. Let Assumptions S.8.1-S.8.3 hold, $\{V_i\}_{i=1}^n$ be i.i.d. with $V_i \sim P \in \mathbf{P}$, and for any $\delta_n \downarrow 0$ let $N_n \equiv \sup_{P\in\mathbf{P}} N_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})$, $J_n \equiv \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})$, and
$$S_n \equiv \Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^i\varphi^2_n(2^{-\frac{i}{d_v}})\Big)^{\frac{1}{2}}. \quad (S.310)$$
If $N_n \uparrow \infty$, then there are Gaussian processes $\{\mathbb{W}_{n,P}\}_{n=1}^\infty$ such that uniformly in $P \in \mathbf{P}$
$$\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_n} = O_P\Big(\frac{K_n\log(nN_n)}{\sqrt{n}} + \frac{K_n\sqrt{\log(nN_n)\log(n)}S_n}{\sqrt{n}} + J_n\Big(1 + \frac{J_nK_n}{\delta^2_n\sqrt{n}}\Big)\Big). \quad (S.311)$$
Theorem S.8.1 is a mild modification of the results in Koltchinskii (1994). The proof
of Theorem S.8.1 relies on a coupling of the empirical process on a sequence of grids of
cardinality Nn, and employs the equicontinuity of Gn,P and Wn,P to obtain a coupling
on the entire class Fn. The conclusion of Theorem S.8.1 applies to any choice of grid
accuracy δn. In order to obtain the best rate, δn must be chosen to balance the terms
in (S.311) and thus depends on the metric entropy of Fn through the terms Nn and Jn.
Below, we include the proof of Theorem S.8.1 and auxiliary results.
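To make the trade-off concrete, the sketch below evaluates the three terms of (S.311) for a stylized bracketing entropy $\log N_{[\,]}(u) = j\log(C/u)$; all constants are arbitrary illustrative choices rather than quantities derived in the paper. The first two terms blow up as $\delta_n \downarrow 0$ through $\log N_n$ while the third grows with $\delta_n$ through $J_n$, so an interior $\delta_n$ minimizes the bound.

```python
import numpy as np

def coupling_bound(delta, n, j=10, C=2.0, Kn=1.0, Sn=5.0):
    # the three terms of (S.311) under log N(u) = j * log(C/u) for u <= C
    u = np.linspace(1e-6, delta, 2_000)
    s = np.sqrt(1.0 + j * np.log(np.maximum(C / u, 1.0)))
    Jn = float(np.sum((s[:-1] + s[1:]) / 2 * np.diff(u)))   # J(delta) by trapezoid rule
    logN = j * np.log(max(C / delta, 1.0))
    t1 = Kn * (np.log(n) + logN) / np.sqrt(n)
    t2 = Kn * np.sqrt((np.log(n) + logN) * np.log(n)) * Sn / np.sqrt(n)
    t3 = Jn * (1.0 + Jn * Kn / (delta**2 * np.sqrt(n)))
    return float(t1 + t2 + t3)

deltas = np.geomspace(1e-3, 1.0, 60)
vals = [coupling_bound(d, n=10_000) for d in deltas]
```

Scanning the grid shows the minimized bound is attained strictly inside the range, i.e. neither the coarsest nor the finest net is optimal.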
Proof of Theorem S.8.1: Let $\{\Delta_i(P)\}$ be the partitions of $\Omega(P)$ in Lemma S.8.1 and $\mathcal{B}_{P,i}$ the $\sigma$-algebra generated by $\Delta_i(P)$. By Lemma S.8.2 and Assumption S.8.3,
$$\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^iE_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2]\Big)^{\frac{1}{2}} \leq C_1\Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^i\varphi^2_n(2^{-\frac{i}{d_v}})\Big)^{\frac{1}{2}} \equiv C_1S_n \quad (S.312)$$
for some constant $C_1 > 0$ and for $S_n$ as defined in (S.310). Next, let $\mathcal{F}_{P,n,\delta_n} \subseteq \mathcal{F}_n$ denote a finite $\delta_n$-net of $\mathcal{F}_n$ with respect to $\|\cdot\|_{P,2}$. Since $N(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \leq N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$, it follows from the definition of $N_n$ that we may choose $\mathcal{F}_{P,n,\delta_n}$ so that
$$\sup_{P\in\mathbf{P}} \text{card}(\mathcal{F}_{P,n,\delta_n}) \leq \sup_{P\in\mathbf{P}} N_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv N_n. \quad (S.313)$$
By Theorem 3.5 in Koltchinskii (1994), (S.312), and (S.313), it follows that for each $n \geq 1$ there exists an isonormal process $\mathbb{W}_{n,P}$ such that for all $\eta_1 > 0$, $\eta_2 > 0$
$$\sup_{P\in\mathbf{P}} P\Big(\frac{\sqrt{n}}{K_n}\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_{P,n,\delta_n}} \geq \eta_1 + \sqrt{\eta_1}\sqrt{\eta_2}(C_1S_n + 1)\Big) \lesssim N_n\exp\{-C_2\eta_1\} + n\exp\{-C_2\eta_2\}, \quad (S.314)$$
for some $C_2 > 0$. Since $N_n \uparrow \infty$, (S.314) implies for any $\varepsilon > 0$ there are $C_3 > 0$, $C_4 > 0$ sufficiently large, such that setting $\eta_1 \equiv C_3\log(N_n)$ and $\eta_2 \equiv C_3\log(n)$ yields
$$\sup_{P\in\mathbf{P}} P\Big(\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_{P,n,\delta_n}} \geq C_4K_n \times \frac{\log(nN_n) + \sqrt{\log(N_n)\log(n)}S_n}{\sqrt{n}}\Big) < \varepsilon. \quad (S.315)$$
Next, note that by definition of $\mathcal{F}_{P,n,\delta_n}$, there exists a $\Gamma_{n,P} : \mathcal{F}_n \to \mathcal{F}_{P,n,\delta_n}$ such that $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \|f - \Gamma_{n,P}f\|_{P,2} \leq \delta_n$. Let $D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ denote the $\varepsilon$-packing number for $\mathcal{F}_n$ under $\|\cdot\|_{P,2}$, and note $D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \leq N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$. Therefore, by Corollary 2.2.8 in van der Vaart and Wellner (1996) we can conclude that
$$\sup_{P\in\mathbf{P}} E_P[\|\mathbb{W}_{n,P} - \mathbb{W}_{n,P}\Gamma_{n,P}\|_{\mathcal{F}_n}] \lesssim \sup_{P\in\mathbf{P}} \int_0^{\delta_n} \sqrt{\log D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})}d\varepsilon \leq \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv J_n. \quad (S.316)$$
Similarly, employing Lemma 3.4.2 in van der Vaart and Wellner (1996) yields that
$$\sup_{P\in\mathbf{P}} E_P[\|\mathbb{G}_{n,P} - \mathbb{G}_{n,P}\Gamma_{n,P}\|_{\mathcal{F}_n}] \lesssim \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})\Big(1 + \sup_{P\in\mathbf{P}} \frac{J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})K_n}{\delta^2_n\sqrt{n}}\Big) \equiv J_n\Big(1 + \frac{J_nK_n}{\delta^2_n\sqrt{n}}\Big). \quad (S.317)$$
Therefore, combining (S.315), (S.316), and (S.317) together with the decomposition
Therefore, since ∆i,k(P ) = ∆i+1,2k(P )∪∆i+1,2k+1(P ), it follows thatQP (∆i+1,2k+1(P )) =12QP (∆i,k(P )) for 0 ≤ k ≤ 2i − 1 as well. In particular, QP (∆0,0(P )) = 1 implies that
QP (∆i,k(P )) =1
2i(S.324)
for any integers i ≥ 1 and 0 ≤ k ≤ 2i − 1. Moreover, we note that result (S.319) and
Assumptions S.8.1(ii) and S.8.2(ii) together imply that the density gP of QP satisfies
0 < infP∈P
infa∈[0,1]dv
gP (a) < supP∈P
supa∈[0,1]dv
gP (a) <∞, (S.325)
and therefore $Q_P(A) \asymp \lambda(A)$ uniformly in $A \in \mathcal{A}$ and $P \in \mathbf{P}$. Hence, since by (S.322) $u_{i+1,2k,j}(P) = u_{i,k,j}(P)$ and $l_{i+1,2k,j}(P) = l_{i,k,j}(P)$ for all $j \neq m(i+1)$, we obtain
$$\frac{u_{i+1,2k,m(i+1)}(P) - l_{i+1,2k,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)} = \frac{\prod_{j=1}^{d_v}(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P))}{\prod_{j=1}^{d_v}(u_{i,k,j}(P) - l_{i,k,j}(P))} = \frac{\lambda(\Delta_{i+1,2k}(P))}{\lambda(\Delta_{i,k}(P))} \asymp \frac{Q_P(\Delta_{i+1,2k}(P))}{Q_P(\Delta_{i,k}(P))} = \frac{1}{2} \quad (S.326)$$
uniformly in $P \in \mathbf{P}$, $i \geq 0$, and $0 \leq k \leq 2^i - 1$ by results (S.324) and (S.325). Moreover, by identical arguments but using (S.323) instead of (S.322) we conclude
$$\frac{u_{i+1,2k+1,m(i+1)}(P) - l_{i+1,2k+1,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)} \asymp \frac{1}{2} \quad (S.327)$$
also uniformly in $P \in \mathbf{P}$, $i \geq 0$, and $0 \leq k \leq 2^i - 1$. Thus, since $(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P)) = (u_{i+1,2k+1,j}(P) - l_{i+1,2k+1,j}(P)) = (u_{i,k,j}(P) - l_{i,k,j}(P))$ for all $j \neq m(i+1)$, and $u_{0,0,j}(P) - l_{0,0,j}(P) = 1$ for all $1 \leq j \leq d_v$, we obtain from $m(i) = i - \lfloor\frac{i-1}{d_v}\rfloor \times d_v$, results (S.326) and (S.327), and proceeding inductively that
$$(u_{i,k,j}(P) - l_{i,k,j}(P)) \asymp 2^{-\frac{i}{d_v}}, \quad (S.328)$$
uniformly in $P \in \mathbf{P}$, $i \geq 0$, $0 \leq k \leq 2^i - 1$, and $1 \leq j \leq d_v$. Thus, result (S.328) yields
$$\sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|a - a'\| \leq \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \max_{1\leq j\leq d_v} \sqrt{d_v} \times (u_{i,k,j}(P) - l_{i,k,j}(P)) = O(2^{-\frac{i}{d_v}}). \quad (S.329)$$
We next obtain the desired sequence of partitions $\Delta_i(P)$ of $(\Omega(P), \mathcal{B}_P, P)$ by constructing them from the partitions $\{\Delta_{i,k}(P)\}_{k=0}^{2^i-1}$ of $[0,1]^{d_v}$. To this end, set
$$\Delta_i(P) \equiv \{T_P(\Delta_{i,k}(P)) : 0 \leq k \leq 2^i - 1\}$$
for all $i \geq 0$. Note that $\Delta_i(P)$ satisfies conditions (i) and (ii) due to $T_P^{-1}$ being a measurable map, $T_P$ being bijective, and result (S.321). In addition, $\Delta_i(P)$ satisfies condition (iii) since by definition (S.318) and result (S.324) we have
$$P(T_P(\Delta_{i,k}(P))) = Q_P(\Delta_{i,k}(P)) = 2^{-i}$$
for all $0 \leq k \leq 2^i - 1$. Moreover, by Assumption S.8.2(ii), $\sup_{P\in\mathbf{P}} \sup_{a\in[0,1]^{d_v}} \|J_{T_P}(a)\|_o < \infty$, and hence by the mean value theorem we can conclude that
$$\sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{v,v'\in T_P(\Delta_{i,k}(P))} \|v - v'\| = \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|T_P(a) - T_P(a')\| \lesssim \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|a - a'\| = O(2^{-\frac{i}{d_v}})$$
by result (S.329), which verifies that $\Delta_i(P)$ satisfies condition (iv). Also note that to verify $\Delta_i(P)$ satisfies condition (v) it suffices to show that $\bigcup_{i\geq0}\Delta_i(P)$ generates the Borel $\sigma$-algebra on $\Omega(P)$. To this end, we first aim to show that
$$\mathcal{A} = \sigma\Big(\bigcup_{i\geq0}\{\Delta_{i,k}(P) : 0 \leq k \leq 2^i - 1\}\Big), \quad (S.330)$$
where for a collection of sets $\mathcal{C}$, $\sigma(\mathcal{C})$ denotes the $\sigma$-algebra generated by $\mathcal{C}$. For any closed set $A \in \mathcal{A}$, define $D_i(P)$ to be given by
$$D_i(P) \equiv \bigcup_{k : \Delta_{i,k}(P)\cap A\neq\emptyset} \Delta_{i,k}(P).$$
Notice that since $\{\Delta_{i,k}(P)\}_{k=0}^{2^i-1}$ is a partition of $[0,1]^{d_v}$, $A \subseteq D_i(P)$ for all $i \geq 0$ and hence $A \subseteq \bigcap_{i\geq0} D_i(P)$. Moreover, if $a_0 \in A^c$, then $A^c$ being open and (S.329) imply $a_0 \notin D_i(P)$ for $i$ sufficiently large. Hence, $A^c \cap (\bigcap_{i\geq0} D_i(P)) = \emptyset$ and therefore $A = \bigcap_{i\geq0} D_i(P)$. It follows that if $A$ is closed, then $A$ belongs to the right hand side of (S.330), which establishes the inclusion of $\mathcal{A}$ in the right hand side of (S.330). On the other hand, since $\Delta_{i,k}(P)$ is Borel for all $i \geq 0$ and $0 \leq k \leq 2^i - 1$, the reverse inclusion is immediate, and hence (S.330) follows. To conclude, we then note that
$$\sigma\Big(\bigcup_{i\geq0}\Delta_i(P)\Big) = \sigma\Big(\bigcup_{i\geq0}\{T_P(\Delta_{i,k}(P)) : 0 \leq k \leq 2^i - 1\}\Big) = T_P\Big(\sigma\Big(\bigcup_{i\geq0}\{\Delta_{i,k}(P) : 0 \leq k \leq 2^i - 1\}\Big)\Big) = T_P(\mathcal{A}), \quad (S.331)$$
by Corollary 1.2.9 in Bogachev (2007). However, $T_P$ and $T_P^{-1}$ being continuous implies $T_P(\mathcal{A})$ equals the Borel $\sigma$-algebra on $\Omega(P)$, and therefore (S.331) implies $\Delta_i(P)$ satisfies condition (v), establishing the lemma.
Lemma S.8.2. Let $\Delta_i(P)$ be as in Lemma S.8.1, and $\mathcal{B}_{P,i}$ denote the $\sigma$-algebra generated by $\Delta_i(P)$. If Assumptions S.8.1(i)(ii) and S.8.2(i)(ii) hold, then there are $K_0 > 0$, $K_1 > 0$ such that for all $P \in \mathbf{P}$ and any $f$ satisfying $f \in L^2_P$ for all $P \in \mathbf{P}$:
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq K_0 \times \varpi^2(f, K_1 \times 2^{-\frac{i}{d_v}}, P).$$
Proof: Since ∆i(P ) is a partition of Ω(P ) and P (∆i,k(P )) = 2−i for all i ≥ 0 and
0 ≤ k ≤ 2i − 1, we may express EP [f(V )|BP,i] as an element of L2P by
EP [f(V )|BP,i] = 2i2i−1∑k=0
1v ∈ ∆i,k(P )∫
∆i,k(P )f(v)dP (v).
Hence, employing that P (∆i,k(P )) = 2−i for all i ≥ 0 and 0 ≤ k ≤ 2i − 1 together
with ∆i(P ) being a partition of Ω(P ), and applying Holder’s inequality to the term
(f(v)− f(v))1v ∈ Ω(P )1v ∈ ∆i,k(P ) we can conclude that
EP [(f(V )− EP [f(V )|BP,i])2]
=2i−1∑k=0
∫∆i,k(P )
(f(v)− 2i∫
∆i,k(P )f(v)dP (v))2dP (v)
=2i−1∑k=0
22i
∫∆i,k(P )
(
∫∆i,k(P )
(f(v)− f(v))1v ∈ Ω(P )dP (v))2dP (v)
≤2i−1∑k=0
22iP (∆i,k(P ))
∫∆i,k(P )
∫∆i,k(P )
(f(v)− f(v))21v ∈ Ω(P )dP (v)dP (v)
=2i−1∑k=0
2i∫
∆i,k(P )
∫∆i,k(P )
(f(v)− f(v))21v ∈ Ω(P )dP (v)dP (v).
Let $D_i \equiv \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \text{diam}\{\Delta_{i,k}(P)\}$, where $\text{diam}\{\Delta_{i,k}(P)\}$ is the diameter of $\Delta_{i,k}(P)$. Further note that by Lemma S.8.1(iv), $D_i = O(2^{-i/d_v})$ and hence we have $\lambda(\{s \in \mathbf{R}^{d_v} : \|s\| \leq D_i\}) \leq M_12^{-i}$ for some $M_1 > 0$ and $\lambda$ the Lebesgue measure. Noting that $\sup_{P\in\mathbf{P}} \sup_{v\in\Omega(P)} \frac{dP}{d\lambda}(v) < \infty$ by Assumption S.8.1(ii), and doing the change of variables $s = \tilde{v} - v$ we then obtain for some constant $M_0 > 0$ that
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq M_0\sum_{k=0}^{2^i-1} 2^i\int_{\Delta_{i,k}(P)}\int_{\Delta_{i,k}(P)}(f(v) - f(\tilde{v}))^21\{\tilde{v} \in \Omega(P)\}d\lambda(\tilde{v})d\lambda(v) \leq M_0M_1\sup_{\|s\|\leq D_i}\sum_{k=0}^{2^i-1}\int_{\Delta_{i,k}(P)}(f(v+s) - f(v))^21\{v+s \in \Omega(P)\}d\lambda(v). \quad (S.332)$$
Hence, since $\{\Delta_{i,k}(P) : k = 0, \ldots, 2^i - 1\}$ is a partition of $\Omega(P)$, $\varpi(f, h, P)$ is nondecreasing in $h$, and $D_i \leq K_12^{-\frac{i}{d_v}}$ for some $K_1 > 0$ by Lemma S.8.1(iv), we obtain
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq M_0M_1 \times \varpi^2(f, K_1 \times 2^{-\frac{i}{d_v}}, P) \quad (S.333)$$
by (S.332). Setting $K_0 \equiv M_0 \times M_1$ in (S.333) then establishes the lemma.
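A Monte Carlo sanity check of the lemma's conclusion for $d_v = 1$ and $P$ uniform on $[0,1]$, so the level-$i$ dyadic cells have mass $2^{-i}$; the test function and the constant $10$ standing in for $K_0$ are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.uniform(size=300_000)            # P = Uniform[0,1], d_v = 1
f = lambda v: np.abs(v - 0.4)            # Lipschitz (kinked) test function

def cond_exp_err2(i):
    # E[(f(V) - E[f(V)|B_{P,i}])^2] for the level-i dyadic partition
    cells = np.floor(V * 2**i).astype(int)
    m = np.bincount(cells, weights=f(V)) / np.bincount(cells)
    return float(np.mean((f(V) - m[cells]) ** 2))

def modulus2(h):
    # squared integral modulus of continuity, maximizing over the two shift directions
    out = []
    for s in (h, -h):
        keep = (V + s >= 0) & (V + s <= 1)
        out.append(float(np.mean((f(V + s) - f(V)) ** 2 * keep)))
    return max(out)
```

Across levels, the conditional-expectation error is dominated by a fixed multiple of the squared modulus evaluated at the cell width, and decays as the partition refines.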
S.9 Uniform Bootstrap Coupling
We next provide uniform coupling results for the multiplier bootstrap that allow us to
verify Assumption 3.14 in a variety of problems. The results in this appendix may be of
independent interest, as they extend the validity of the multiplier bootstrap to suitable
non-Donsker classes Fn. For this reason, as in Section S.8, we state the results in a
notation that abstracts from the rest of the paper. Hence, here V ∈ Rdv should be
interpreted as a generic random variable whose distribution is given by P ∈ P.
Our coupling results rely on a series approximation to the elements of $\mathcal{F}_n$. To this end, we will assume that for each $P \in \mathbf{P}$ there is a basis $\{f_{d,n,P}\}_{d=1}^{d_n}$, with $d_n$ possibly diverging to infinity, that provides a suitable approximation to every $f \in \mathcal{F}_n$. Formally, for $f^{d_n}_{n,P}(v) \equiv (f_{1,n,P}(v), \ldots, f_{d_n,n,P}(v))'$, we impose the following:
Assumption S.9.1. For each $P \in \mathbf{P}$ there is an array of functions $\{f_{d,n,P}\}_{d=1}^{d_n} \subset L^2_P$ such that: (i) the eigenvalues of $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ are bounded by $1 \leq C_n$ uniformly in $P \in \mathbf{P}$; (ii) $\sup_{P\in\mathbf{P}} \max_{1\leq d\leq d_n} \|f_{d,n,P}\|_{P,\infty} \leq K_n$ with $1 \leq K_n$ finite.

Assumption S.9.2. For every $f \in \mathcal{F}_n$ and $P \in \mathbf{P}$ there is a $\beta_{n,P}(f) \in \mathbf{R}^{d_n}$ such that: (i) the class $\mathcal{G}_{n,P} \equiv \{f - f^{d_n\prime}_{n,P}\beta_{n,P}(f) : f \in \mathcal{F}_n\}$ has envelope $G_{n,P}$ which satisfies $\|g\|_{P,2} \leq \delta_n\|G_{n,P}\|_{P,2}$ for all $P \in \mathbf{P}$, $g \in \mathcal{G}_{n,P}$, and some $\delta_n > 0$ with
$$J_{1n} \equiv \sup_{P\in\mathbf{P}}\Big\{J_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) + \sqrt{n}E_P\Big[G_{n,P}(V)\exp\Big\{-\frac{n\delta^2_n\|G_{n,P}\|^2_{P,2}}{G^2_{n,P}(V)\eta_{n,P}}\Big\}\Big]\Big\}$$
finite and $\eta_{n,P} \equiv 1 + \log N_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2})$; (ii) the set $B_n \equiv \{\beta_{n,P}(f) : f \in \mathcal{F}_n, P \in \mathbf{P}\} \cup \{0\}$ satisfies $J_{2n} \equiv \int_0^\infty \sqrt{\log(N(\varepsilon, B_n, \|\cdot\|_2))}d\varepsilon < \infty$.
Assumption S.9.1 imposes our regularity conditions on the approximating functions $\{f_{d,n,P}\}_{d=1}^{d_n}$. We emphasize that the functions $\{f_{d,n,P}\}_{d=1}^{d_n}$ need not be known, as they are merely employed in the theoretical construction of the bootstrap coupling and not in the computation of the multiplier bootstrap process $\mathbb{W}_n$. In certain applications, such as when $\mathcal{F}_n$ itself is finite dimensional, a basis $\{f_{d,n,P}\}_{d=1}^{d_n}$ may be naturally available. The approximating requirements on $\{f_{d,n,P}\}_{d=1}^{d_n}$ are formally imposed in Assumption S.9.2. In particular, Assumption S.9.2(i) requires that the remainder of the approximation of $\mathcal{F}_n$ by $\{f_{d,n,P}\}_{d=1}^{d_n}$ not be "too large." Intuitively, Assumption S.9.2(i) may be understood as controlling the "bias" in a series approximation of $\mathcal{F}_n$ by linear combinations of $\{f_{d,n,P}\}_{d=1}^{d_n}$. Assumption S.9.2(ii) in turn controls the "variance" of the series approximation by demanding that the class of approximating functions have finite entropy. As in Section S.8, Assumption S.9.2 does not require the functions $f \in \mathcal{F}_n$ to be "smooth" in the sense of being differentiable.
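For concreteness, a common form of the multiplier bootstrap process referenced here is $\mathbb{W}_nf = \frac{1}{\sqrt{n}}\sum_{i=1}^n\omega_i(f(V_i) - \bar{f}_n)$ with $\omega_i \sim N(0,1)$ independent of the data; the centering convention in the sketch below is an assumption of this illustration rather than a quotation of the paper's definition. Conditional on $\{V_i\}$, the process at a fixed $f$ is exactly Gaussian with variance equal to the sample variance of $f(V)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
V = rng.normal(size=n)                   # the data {V_i} (illustrative design)
f = lambda v: np.cos(v)

def Wn(f, omega):
    # multiplier bootstrap process at f: (1/sqrt(n)) sum_i omega_i (f(V_i) - mean)
    fv = f(V)
    return float((omega * (fv - fv.mean())).sum() / np.sqrt(n))

draws = np.array([Wn(f, rng.normal(size=n)) for _ in range(4_000)])
```

The empirical variance of the bootstrap draws matches the sample variance of $f(V)$, which is exactly the conditional covariance the coupling results approximate uniformly over $\mathcal{F}_n$.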
We next show Assumptions S.9.1 and S.9.2 suffice for coupling $\mathbb{W}_n$ to $\mathbb{W}^\star_{n,P}$.

Theorem S.9.1. Let Assumptions S.9.1 and S.9.2 hold, $\{(\omega_i, V_i)\}_{i=1}^n$ be i.i.d. with $V_i \sim P \in \mathbf{P}$, $\omega_i \sim N(0,1)$, $\omega_i$ and $V_i$ independent, and suppose that $d_n\log(1+d_n)K^2_n = o(n)$. (i) Then, there is a linear Gaussian process $\mathbb{W}^\star_{n,P}$ independent of $\{V_i\}_{i=1}^n$ with
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} = O_P\Big(J_{2n}\Big\{\frac{K^2_nC_nd_n\log(1+d_n)}{n}\Big\}^{1/4} + J_{1n}\Big)$$
uniformly in $P \in \mathbf{P}$. (ii) If in addition the eigenvalues of $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ are bounded away from zero uniformly in $n$ and $P \in \mathbf{P}$, then uniformly in $P \in \mathbf{P}$
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} = O_P\Big(\frac{J_{2n}K_n\sqrt{C_nd_n\log(1+d_n)}}{\sqrt{n}} + J_{1n}\Big).$$
Theorem S.9.1(i) derives a rate of convergence for the coupled process, while Theorem S.9.1(ii) improves on the rate under the additional requirement that $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ be bounded away from singularity. The rates of both Theorems S.9.1(i) and S.9.1(ii) depend on the selected sequence $d_n$, which should be chosen optimally to deliver the best possible implied rate. Heuristically, the proof of Theorem S.9.1 proceeds in two steps. First, we construct a multivariate normal random variable $\mathbb{W}^\star_{n,P}(f^{d_n}_{n,P}) \in \mathbf{R}^{d_n}$ that is coupled with $\mathbb{W}_n(f^{d_n}_{n,P}) \in \mathbf{R}^{d_n}$, and then employ the linearity of $\mathbb{W}_n$ to obtain a suitable coupling on the subspace $S_{n,P} \equiv \text{span}\{f^{d_n}_{n,P}\}$. Second, we employ Assumption S.9.2(i) to show that a successful coupling on $S_{n,P}$ leads to the desired construction since $\mathcal{F}_n$ is well approximated by $\{f_{d,n,P}\}_{d=1}^{d_n}$. We note that while we do not pursue it here for conciseness, the outlined heuristics can also be employed to verify Assumption 3.7 by coupling $\mathbb{G}_{n,P}(f^{d_n}_{n,P})$ to $\mathbb{W}_{n,P}(f^{d_n}_{n,P})$ through standard results (e.g. Yurinskii (1977)).
Below, we include the proof of Theorem S.9.1 and auxiliary results.
Proof of Theorem S.9.1: We proceed by employing Lemma S.9.1 to couple $\mathbb{W}_n$ on a finite dimensional subspace, and then showing that such a result suffices for coupling $\mathbb{W}_n$ and $\mathbb{W}^\star_{n,P}$ on $\mathcal{F}_n$. To this end, let $S_{n,P} \equiv \text{span}\{f^{d_n}_{n,P}\}$ and note that Assumption S.9.2(ii) and Lemma S.9.1 imply that there exists a linear Gaussian process $\mathbb{W}^{(1)}_{n,P}$ on $S_{n,P}$ and a sequence $R_n = o(1)$ such that uniformly in $P \in \mathbf{P}$ we have
$$\sup_{\beta\in B_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta) - \mathbb{W}^{(1)}_{n,P}(f^{d_n\prime}_{n,P}\beta)| = O_P(J_{2n}R_n). \quad (S.334)$$
To establish part (i) of the theorem we will set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/4}$ and employ Lemma S.9.1(i), while to establish part (ii) we will set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/2}$ and employ Lemma S.9.1(ii) instead.

For any closed linear subspace $A_{n,P}$ of $L^2_P$, let $\text{Proj}\{f|A_{n,P}\}$ denote the $\|\cdot\|_{P,2}$ projection of $f$ onto $A_{n,P}$ and set $A^\perp_{n,P} \equiv \{f \in L^2_P : f = g - \text{Proj}\{g|A_{n,P}\} \text{ for some } g \in L^2_P\}$ (i.e. $A^\perp_{n,P}$ is the orthocomplement of $A_{n,P}$ in $L^2_P$). Assuming the underlying probability space is suitably enlarged to carry a linear isonormal process $\mathbb{W}^{(2)}_{n,P}$ on $S^\perp_{n,P}$ independent of $\mathbb{W}^{(1)}_{n,P}$ and $\{V_i\}_{i=1}^n$, we then define $\mathbb{W}^\star_{n,P}$ on $L^2_P$ pointwise by
$$\mathbb{W}^\star_{n,P}f \equiv \mathbb{W}^{(1)}_{n,P}(\text{Proj}\{f|S_{n,P}\}) + \mathbb{W}^{(2)}_{n,P}(\text{Proj}\{f|S^\perp_{n,P}\}),$$
which is linear in $f$ by linearity of $f \mapsto \text{Proj}\{f|S_{n,P}\}$, $\mathbb{W}^{(1)}_{n,P}$, and $\mathbb{W}^{(2)}_{n,P}$. Moreover, since $\mathbb{W}^\star_{n,P}$ is sub-Gaussian with respect to $\|\cdot\|_{P,2}$, it follows from Corollary 2.2.8 in van der Vaart and Wellner (1996), $N(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) = 1$ due to $\|g\|_{P,2} \leq \delta_n\|G_{n,P}\|_{P,2}$ for all $g \in \mathcal{G}_{n,P}$ and $P \in \mathbf{P}$, bracketing numbers being larger than covering numbers, Jensen's inequality, and the definition of $J_{1n}$ in Assumption S.9.2(i) that
form a bracket for $\bar{\mathcal{G}}_{n,P}$. Moreover, since $E[\omega^2] = 1$ and $\omega$ and $V$ are independent, it follows from direct calculation that $\|\bar{g}_{i,u,P} - \bar{g}_{i,l,P}\|_{P,2} = \|g_{i,u,P} - g_{i,l,P}\|_{P,2}$. Setting $\bar{G}_{n,P}(\omega, v) \equiv |\omega|G_{n,P}(v)$, then note that $\bar{G}_{n,P}$ is an envelope for $\bar{\mathcal{G}}_{n,P}$ which satisfies $\|\bar{G}_{n,P}\|_{P,2} = \|G_{n,P}\|_{P,2}$. Recalling $\eta_{n,P} \equiv 1 + \log N_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2})$ we then obtain by Theorem 2.14.2 in van der Vaart and Wellner (1996) that
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] \lesssim J_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) + \sqrt{n}E_P\Big[|\omega|G_{n,P}(V)1\Big\{|\omega|\frac{G_{n,P}(V)}{\|G_{n,P}\|_{P,2}} > \frac{\sqrt{n}\delta_n}{\sqrt{\eta_{n,P}}}\Big\}\Big]. \quad (S.337)$$
Moreover, since $\omega$ follows a standard normal distribution, we have $E[|\omega|1\{|\omega| > a\}] \lesssim \exp\{-a^2/2\}$ for any $a \geq 0$. Therefore, the independence of $\omega$ and $V$ implies
$$E_P\Big[|\omega|G_{n,P}(V)1\Big\{|\omega|\frac{G_{n,P}(V)}{\|G_{n,P}\|_{P,2}} > \frac{\sqrt{n}\delta_n}{\sqrt{\eta_{n,P}}}\Big\}\Big] \lesssim E_P\Big[G_{n,P}(V)\exp\Big\{-\frac{n\delta^2_n\|G_{n,P}\|^2_{P,2}}{2G^2_{n,P}(V)\eta_{n,P}}\Big\}\Big],$$
which together with result (S.337) and the definition of $J_{1n}$ in Assumption S.9.2(i) yields
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] \lesssim J_{1n}. \quad (S.338)$$
Moreover, by Lemma 2.3.1 in van der Vaart and Wellner (1996) we further obtain that
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \{g(V_i) - E_P[g(V)]\}\Big|\Big] + \delta_n\|G_{n,P}\|_{P,2} \leq E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{2}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] + \delta_n\|G_{n,P}\|_{P,2} \lesssim J_{1n}, \quad (S.339)$$
where the final inequality follows from (S.338) and the definition of $J_{1n}$. Thus, (S.336), (S.338), and (S.339) together with Markov's inequality imply that uniformly in $P \in \mathbf{P}$
$$\|\mathbb{W}_n\|_{\mathcal{G}_{n,P}} = O_P(J_{1n}). \quad (S.340)$$
Next, we use the linearity of the processes $f \mapsto \mathbb{W}_n(f)$ and $f \mapsto \mathbb{W}^\star_{n,P}(f)$ to obtain that
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} \leq \sup_{f\in\mathcal{F}_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta_{n,P}(f)) - \mathbb{W}^\star_{n,P}(f^{d_n\prime}_{n,P}\beta_{n,P}(f))| + \|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{G}_{n,P}} \leq \sup_{\beta\in B_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta) - \mathbb{W}^\star_{n,P}(f^{d_n\prime}_{n,P}\beta)| + O_P(J_{1n}) = O_P(J_{2n}R_n + J_{1n}),$$
where the second inequality holds uniformly in $P \in \mathbf{P}$ by (S.335) and Markov's inequality, result (S.340), and set inclusion, while the equality holds uniformly in $P \in \mathbf{P}$ by result (S.334). The first claim of the theorem then follows by using Lemma S.9.1(i) to set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/4}$ in (S.334), and the second part of the theorem follows from using Lemma S.9.1(ii) to set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/2}$ instead.
Lemma S.9.1. Let {(ωi, Vi)}ⁿᵢ₌₁ be i.i.d. with Vi ∼ P ∈ P, ωi ∼ N(0, 1), and ωi and Vi independent, and suppose that J2n ≡ ∫₀^∞ √(log N(ε, Bn, ‖·‖₂)) dε < ∞. (i) Then, there is a linear Gaussian process W⋆n,P on Sn,P ≡ span{f^{dn}_{n,P}} independent of {Vi}ⁿᵢ₌₁ with
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big| = O_P\Big(J_{2n}\Big(\frac{d_n\log(1+d_n)C_nK_n^2}{n}\Big)^{1/4}\Big)
\]
uniformly in P ∈ P. (ii) If in addition the eigenvalues of E_P[f^{dn}_{n,P}(V)f^{dn}_{n,P}(V)′] are bounded away from zero uniformly in n and P ∈ P, then uniformly in P ∈ P
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big| = O_P\Big(J_{2n}\,\frac{\sqrt{d_n\log(1+d_n)C_n}\,K_n}{\sqrt n}\Big).
\]
Proof: First note that Wn(f − c) = Wn(f) for any c ∈ R and f ∈ L²_P. We may therefore
assume without loss of generality that E_P[f^{dn}_{n,P}(V)] = 0, and for every P ∈ P we let
Σn(P) ≡ Var_P{f^{dn}_{n,P}(V)} = E_P[f^{dn}_{n,P}(V)f^{dn}_{n,P}(V)′] and define
\[
\hat\Sigma_n(P) \equiv \frac{1}{n}\sum_{i=1}^n \Big(f_{n,P}^{d_n}(V_i)-\frac{1}{n}\sum_{j=1}^n f_{n,P}^{d_n}(V_j)\Big)\Big(f_{n,P}^{d_n}(V_i)-\frac{1}{n}\sum_{j=1}^n f_{n,P}^{d_n}(V_j)\Big)'.
\]
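In matrix form, Σ̂n(P) is simply the demeaned sample covariance of the vectors f^{dn}_{n,P}(Vi), normalized by n. A quick sketch, assuming NumPy is available and using simulated stand-in data (the array f and its dimensions are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((100, 3))  # row i stands in for f^{dn}_{n,P}(V_i)', n = 100, dn = 3

# The display above: average of outer products of the demeaned observations.
fbar = f.mean(axis=0)
Sigma_hat = (f - fbar).T @ (f - fbar) / f.shape[0]

# Agrees with NumPy's covariance normalized by n (bias=True).
assert np.allclose(Sigma_hat, np.cov(f, rowvar=False, bias=True))
```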
For a sequence Rn with Rn = o(1), and any constant M > 0 and P ∈ P, define the event
\[
A_{n,P}(M) \equiv \big\{\|\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\|_{o,2} \le MR_n\big\}. \tag{S.341}
\]
Further note that by Lemma S.9.2 it follows that we may select Rn = o(1) such that
\[
\liminf_{M\uparrow\infty}\,\liminf_{n\to\infty}\,\inf_{P\in\mathbf{P}} P\big(\{V_i\}_{i=1}^n \in A_{n,P}(M)\big) = 1. \tag{S.342}
\]
In particular, to establish part (i) of the lemma we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} and employ Lemma S.9.2(i), while to establish part (ii) we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/2} and employ Lemma S.9.2(ii) instead.
Next, let N_{dn} ∈ R^{dn} follow a standard normal distribution and be independent
of {(ωi, Vi)}ⁿᵢ₌₁ (defined on the same suitably enlarged probability space). Further let
{νd}^{dn}_{d=1} denote the eigenvectors of Σ̂n(P), let {λd}^{dn}_{d=1} denote the corresponding (possibly
zero) eigenvalues, and define the random variable Zn,P ∈ R^{dn} to be given by
\[
Z_{n,P} \equiv \sum_{d:\lambda_d\neq 0} \frac{\nu_d(\nu_d'\mathbb{W}_n(f_{n,P}^{d_n}))}{\lambda_d^{1/2}} + \sum_{d:\lambda_d=0}\nu_d(\nu_d' N_{d_n}). \tag{S.343}
\]
Then note that since Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}ⁿᵢ₌₁, and N_{dn} is independent of {(ωi, Vi)}ⁿᵢ₌₁, Zn,P is Gaussian conditional on {Vi}ⁿᵢ₌₁. Furthermore,
\[
E[Z_{n,P}Z_{n,P}'\,|\,\{V_i\}_{i=1}^n] = \sum_{d=1}^{d_n}\nu_d\nu_d' = I_{d_n}
\]
by direct calculation, for I_{dn} the dn × dn identity matrix. Hence, Zn,P ∼ N(0, I_{dn}) conditional on {Vi}ⁿᵢ₌₁ almost surely in {Vi}ⁿᵢ₌₁ and is thus independent of {Vi}ⁿᵢ₌₁.
Moreover, we also note that by Theorem 3.6.1 in Bogachev (1998) and Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}ⁿᵢ₌₁, it follows that Wn(f^{dn}_{n,P}) belongs to the range of Σ̂n(P) : R^{dn} → R^{dn} almost surely in {(ωi, Vi)}ⁿᵢ₌₁. Therefore, since {νd : λd ≠ 0} spans the range of Σ̂n(P), we conclude from (S.343) that for any β ∈ R^{dn}
\[
\beta'\hat\Sigma_n^{1/2}(P)Z_{n,P} = \beta'\sum_{d:\lambda_d\neq 0}\nu_d(\nu_d'\mathbb{W}_n(f_{n,P}^{d_n})) = \mathbb{W}_n(\beta' f_{n,P}^{d_n}).
\]
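The linear algebra behind (S.343) can be illustrated numerically: for a deliberately rank-deficient covariance Σ̂, rescaling along the eigenvectors with nonzero eigenvalue and filling the null space with an independent standard normal yields a vector with identity covariance, and multiplying back by Σ̂^{1/2} projects onto the range. A hypothetical NumPy sketch (the matrix B and all variable names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
dn = 4
B = rng.standard_normal((dn, 2))
Sigma_hat = B @ B.T                  # rank-2, hence singular, covariance

lam, nu = np.linalg.eigh(Sigma_hat)  # columns of nu are the eigenvectors nu_d
nz = lam > 1e-10                     # indices with lambda_d != 0

# Linear map sending W to the first sum in (S.343): sum_d nu_d nu_d' W / lam_d^{1/2}.
A = (nu[:, nz] / np.sqrt(lam[nz])) @ nu[:, nz].T
P_null = nu[:, ~nz] @ nu[:, ~nz].T   # projection onto the null space

# Cov(Z) = A Sigma_hat A' + P_null since W and N_dn are independent;
# this equals the identity, matching E[Z Z' | {V_i}] = I_dn.
cov_Z = A @ Sigma_hat @ A.T + P_null
assert np.allclose(cov_Z, np.eye(dn))

# Applying Sigma_hat^{1/2} to Z recovers W on the range of Sigma_hat,
# which is the content of beta' Sigma_hat^{1/2} Z = W_n(beta' f).
sqrt_Sigma = (nu[:, nz] * np.sqrt(lam[nz])) @ nu[:, nz].T
P_range = nu[:, nz] @ nu[:, nz].T
assert np.allclose(sqrt_Sigma @ A, P_range)
```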
Analogously, we define for any β ∈ R^{dn} the linear Gaussian process W⋆n,P on Sn,P by
\[
\mathbb{W}^\star_{n,P}(\beta' f_{n,P}^{d_n}) \equiv \beta'\Sigma_n^{1/2}(P)Z_{n,P},
\]
which is independent of {Vi}ⁿᵢ₌₁ due to the independence of Zn,P and {Vi}ⁿᵢ₌₁. Setting
\[
\bar{\mathbb{W}}_{n,P}(\beta) \equiv \big|\beta'\big(\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\big)Z_{n,P}\big|1_{A_{n,P}(M)}, \tag{S.344}
\]
then note that for 1_{A_{n,P}(M)} the indicator for the event {Vi}ⁿᵢ₌₁ ∈ A_{n,P}(M), we have
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big|1_{A_{n,P}(M)}
= \sup_{\beta\in B_n}\big|\beta'\big(\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\big)Z_{n,P}\big|1_{A_{n,P}(M)}
= \sup_{\beta\in B_n}\big|\bar{\mathbb{W}}_{n,P}(\beta)\big|. \tag{S.345}
\]
Since Bn is bounded due to J2n being finite, definition (S.344) implies W̄n,P ∈ ℓ∞(Bn). Moreover, we note that conditional on {Vi}ⁿᵢ₌₁, W̄n,P is sub-Gaussian under the semimetric ρn(β, β̃) ≡ ‖(Σ̂n^{1/2}(P) − Σn^{1/2}(P))(β − β̃)‖₂. Since ‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖_{o,2} ≤ MRn whenever 1_{A_{n,P}(M)} = 1, we additionally obtain that
\[
\int_0^\infty \sqrt{\log N(\varepsilon, B_n, \rho_n)}\,d\varepsilon
\le \int_0^\infty \sqrt{\log N(\varepsilon/MR_n, B_n, \|\cdot\|_2)}\,d\varepsilon
= MR_n\int_0^\infty \sqrt{\log N(u, B_n, \|\cdot\|_2)}\,du, \tag{S.346}
\]
where the equality follows from the change of variables ε = MRn·u. Therefore, since 0 ∈ Bn, Corollary 2.2.8 in van der Vaart and Wellner (1996) and (S.346) imply
\[
E\Big[\sup_{\beta\in B_n}\big|\bar{\mathbb{W}}_{n,P}(\beta)\big|\,\Big|\,\{V_i\}_{i=1}^n\Big] \lesssim MR_n\int_0^\infty\sqrt{\log N(u, B_n, \|\cdot\|_2)}\,du \equiv MR_nJ_{2n}. \tag{S.347}
\]
Next, note (S.344), (S.345), and (S.347) together with Markov’s inequality imply that
where the final inequality holds for any constants {cj}, jn ≤ j ≤ j⋆n(ε), satisfying cj > 0 and Σ^{j⋆n(ε)}_{j=jn} cj = 1, and the product should be understood to equal one if j⋆n(ε) = jn − 1. In particular, setting cj = j^{−γ}/(Σ^{j⋆n(ε)}_{j=jn} j^{−γ}) in (S.359) we obtain the upper bound
\[
N(\varepsilon, \mathcal{A}_n, \|\cdot\|_{\ell^1})
\le \Big\lceil \frac{2M_n\sum_{j=j_n}^\infty j^{-\gamma}}{\varepsilon}\Big\rceil^{(j_n^\star(\varepsilon)-j_n)\vee 0}
\le \Big\lceil \frac{2M_n(j_n-1)^{-(\gamma-1)}}{\varepsilon(\gamma-1)}\Big\rceil^{(j_n^\star(\varepsilon)-j_n)\vee 0},
\]
where the final inequality follows from an integral bound as in (S.357). Thus, the claim of the lemma follows from the bound ⌈a⌉ ≤ a + 1 and results (S.358) and (S.359).
Lemma S.9.4. For any positive random variable U with E[U²] < ∞ and finite constant A > 0 it follows that
\[
E\big[U\exp\{-A/U^2\}\big] \le E[U]\exp\{-A/E[U^2]\} + E[U^2]/\sqrt{2A}.
\]
Proof: First note that the function u ↦ u·exp{−A/u²} is convex on (0, √(2A)]. Therefore, by applying Jensen's inequality, the fact that the function u ↦ u·exp{−A/u²} is increasing on (0, ∞), E[1{0 < U ≤ √(2A)}U] ≤ E[U] due to U being positive almost surely, and exp{−A/U²} ≤ 1 due to A > 0, we can conclude that
\[
E\big[U\exp\{-A/U^2\}\big] = E\big[1\{0 < U \le \sqrt{2A}\}U\exp\{-A/U^2\}\big] + E\big[1\{U > \sqrt{2A}\}U\exp\{-A/U^2\}\big]
\le E[U]\exp\{-A/E[U^2]\} + E\big[1\{U > \sqrt{2A}\}U\big].
\]
The claim of the lemma therefore follows from E[1{U > √(2A)}U] ≤ E[U²]/√(2A), which holds by the Cauchy–Schwarz and Markov inequalities.
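The inequality of Lemma S.9.4 is easy to spot-check numerically for discrete distributions; a minimal sketch (the distributions below are arbitrary test cases, not taken from the paper):

```python
import math

def lemma_bound_holds(values, probs, A):
    """Check E[U exp(-A/U^2)] <= E[U] exp(-A/E[U^2]) + E[U^2]/sqrt(2A)
    for a discrete positive random variable U with the given support and
    probabilities."""
    EU = sum(p * u for u, p in zip(values, probs))
    EU2 = sum(p * u * u for u, p in zip(values, probs))
    lhs = sum(p * u * math.exp(-A / u**2) for u, p in zip(values, probs))
    rhs = EU * math.exp(-A / EU2) + EU2 / math.sqrt(2 * A)
    return lhs <= rhs

# A few hypothetical test distributions and constants A.
assert lemma_bound_holds([0.5, 1.0, 2.0], [0.2, 0.5, 0.3], 1.0)
assert lemma_bound_holds([0.1, 5.0], [0.9, 0.1], 2.0)
assert lemma_bound_holds([3.0], [1.0], 0.5)  # degenerate U gives near-equality in the first term
```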
References
Aliprantis, C. D. and Border, K. C. (2006). Infinite Dimensional Analysis – A Hitchhiker's Guide. Springer-Verlag, Berlin.
Belloni, A., Chernozhukov, V., Chetverikov, D. and Kato, K. (2015). Some new asymptotic theory for least squares series: Pointwise and uniform results. Journal of Econometrics, 186 345–366.
Bhatia, R. (1997). Matrix Analysis. Springer, New York.
Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75 1613–1669.
Blundell, R., Horowitz, J. and Parey, M. (2017). Nonparametric estimation of a nonseparable demand function under the Slutsky inequality restriction. Review of Economics and Statistics, 99 291–304.
Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.
Bogachev, V. I. (1998). Gaussian Measures. American Mathematical Society, Providence.
Bogachev, V. I. (2007). Measure Theory: Volume I. Springer-Verlag, Berlin.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.
Chen, X. and Christensen, T. M. (2018). Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9 39–84.
Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.
DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation, vol. 303. Springer Science & Business Media.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 1029–1054.
Koltchinskii, V. I. (1994). Komlós–Major–Tusnády approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.
Kress, R. (1999). Linear Integral Equations. Springer, New York.
Luenberger, D. G. (1969). Optimization by Vector Space Methods. Wiley, New York.
Massart, P. (1989). Strong approximation for multivariate empirical and related processes, via KMT constructions. The Annals of Probability, 17 266–291.
Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.
Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators. Journal of Econometrics, 79 147–168.
Pollard, D. (2002). A User's Guide to Measure Theoretic Probability, vol. 8. Cambridge University Press.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12 389–434.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: with Applications to Statistics. Springer, New York.
Walkup, D. W. and Wets, R. J.-B. (1969). A Lipschitzian characterization of convex polyhedra. Proceedings of the American Mathematical Society, 167–173.
Yurinskii, V. V. (1977). On the error of the Gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.
Zeidler, E. (1985). Nonlinear Functional Analysis and its Applications I. Springer-Verlag, Berlin.
Zhai, A. (2018). A high-dimensional CLT in W₂ distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.