Constrained Conditional Moment Restriction Models*

Victor Chernozhukov, M.I.T., [email protected]
Whitney K. Newey†, M.I.T., [email protected]
Andres Santos‡, U.C.L.A., [email protected]

First Draft: September, 2015. This Draft: July, 2020.

Abstract

Shape restrictions have played a central role in economics as both testable implications of classical theory and sufficient conditions for obtaining informative counterfactual predictions. In this paper we provide a general procedure for inference under shape restrictions in identified and partially identified models defined by conditional moment restrictions. Our test statistics and proposed inference methods are based on the minimum of the generalized method of moments (GMM) objective function with and without shape restrictions. The critical values are obtained by building a strong approximation to the statistic and then bootstrapping a conservatively relaxed form of the statistic. Sufficient conditions are provided, including strong approximation using Koltchinskii's coupling. Examples given in the paper include inference for linear instrumental variable (IV) models with linear inequality constraints, inference for the variability of quantile IV treatment effects, and inference for bounds on average equivalent variation in a demand model with general heterogeneity. We find in Monte Carlo examples that the critical values are conservatively accurate and that tests about objects of interest have good power relative to unrestricted GMM. We also give an empirical application to estimating returns to schooling when marginal returns are positive and declining in schooling.

Keywords: Shape restrictions, inference on functionals, conditional moment (in)equality restrictions, instrumental variables, nonparametric and semiparametric models, Banach space, Banach lattice, Koltchinskii coupling.

*We thank Riccardo D'amato for excellent research assistance. We are also indebted to the editor, three anonymous referees, and numerous seminar participants for their valuable comments. †Research supported by NSF Grant 1757140. ‡Research supported by NSF Grant SES-1426882.
1 Introduction
Shape restrictions have played a central role in economics as both testable implications
of classical theory and sufficient conditions for obtaining informative counterfactual pre-
dictions (Topkis, 1998). A long tradition in applied and theoretical econometrics has as
a result studied shape restrictions, their ability to aid in identification, estimation, and
inference, and the possibility of testing for their validity (Matzkin, 1994; Chetverikov
et al., 2018). The canonical example of this interplay between theory and practice is
undoubtedly consumer demand analysis, where theoretical predictions such as Slutsky
conditions have been extensively tested for and employed in estimation (Hausman and
Newey, 1995, 2016; Blundell et al., 2012; Dette et al., 2016). The empirical analysis
of shape restrictions, however, goes well beyond this important application with recent
examples including studies into the monotonicity of the state price density (Jackwerth,
2000; Aït-Sahalia and Duarte, 2003), the presence of ramp-up and start-up costs (Wolak,
2007; Reguant, 2014), and the existence of complementarities in demand (Gentzkow,
2007) and organizational design (Athey and Stern, 1998; Kretschmer et al., 2012).
In this paper, we provide a general procedure for inference under shape restrictions
in conditional moment restriction models in which the conditioning variables (i.e. in-
struments) may vary with equations. As shown by Ai and Chen (2007, 2012), the
models we study encompass parametric (Hansen, 1982), semiparametric (Ai and Chen,
2003), and nonparametric (Newey and Powell, 2003) specifications, as well as panel data
applications (Chamberlain, 1992) and the study of plug-in functionals. By incorporat-
ing nuisance parameters into the definition of the parameter space, these models also
encompass semi(non)-parametric conditional moment (in)equality models.
Shape restrictions are often equivalent to inequalities on parameters of interest and
on certain unknown functions. For example, Slutsky negative semi-definiteness and mono-
tonicity require that certain functions satisfy inequality restrictions. Inference with
inequality restrictions is difficult. Such restrictions lead to discontinuities in limiting
distributions when the inequality restrictions bind, which makes inference challenging
due to non-pivotal and potentially unreliable pointwise asymptotic approximations (An-
drews, 2000, 2001). We address these challenges by carefully building some constraint
slackness into a potentially regularized specification of the local parameter space that
accounts for the curvature present in nonlinear constraints.
Our test statistics and proposed inference methods are based on the minima
of the generalized method of moments (GMM) objective function with and without
inequality restrictions. We obtain a strong approximation to our test statistics and
propose bootstrap based critical values. The resulting tests remain valid in partially
identified settings and are shown to asymptotically control size uniformly over a class of
underlying distributions of the data. Inference is potentially conservative but powerful
in exploiting the large amount of information that inequality restrictions can provide
in many cases relevant for applications. While aspects of our analysis are specific to
the conditional moment restriction model, the role of the local parameter space is solely
dictated by the shape restrictions. As such, we expect the insights of the set up here to
be applicable to the study of shape restrictions in alternative models as well.
The inequalities associated with nonparametric shape restrictions necessitate con-
sideration of parameter spaces that are sufficiently general yet endowed with enough
structure to ensure a fruitful asymptotic analysis. An important insight of this paper is
that this simultaneous flexibility and structure is possessed by sets defined by inequality
restrictions on Abstract M (AM) spaces, an AM space being a Banach lattice whose
norm obeys a condition discussed in Section 3. We illustrate the general applicability
of our framework by applying our main results to: (i) Conduct inference on parameters
of interest estimated by GMM in the presence of parametric inequality restrictions; (ii)
Test shape restrictions on structural functions satisfying quantile conditional moment
restrictions; (iii) Conduct inference about partially identified sets of average equivalent
variation and other objects of interest in demand estimation with general heterogeneity
and smooth demand functions; and (iv) Impose the Slutsky restrictions to conduct in-
ference in a linear conditional moment restriction model. Additionally, while we do not
pursue further examples in detail for conciseness, we note our results may be applied to
conduct tests of homogeneity, supermodularity, and economies of scale or scope, as well
as inference on functionals of the identified set in certain partially identified models.
The literature on nonparametric shape restrictions in econometrics has classically
focused on testing whether conditional mean regressions satisfy the restrictions implied
by consumer demand theory; see, e.g., Lewbel (1995) and Haag et al. (2009), as well
as Lee et al. (2018) for a generalization to estimators allowing for Bahadur representa-
tions. The related problem of studying monotone conditional mean regressions has also
garnered widespread attention – recent advances on this problem include Chetverikov
(2019) and Chatterjee et al. (2013). Additional work concerning monotonicity con-
straints includes Beare and Schmidt (2014) who test the monotonicity of the pricing
kernel, Chetverikov and Wilhelm (2017) who study estimation of a nonparametric in-
strumental variable (IV) regression under monotonicity constraints, Armstrong (2015)
who develops minimax rate optimal one sided tests in a Gaussian regression discon-
tinuity (RD) design, Babii and Kumar (2019) who apply isotonic regression to RD,
and Freyberger and Horowitz (2015) who study a partially identified nonparametric IV
model with discrete regressors. Our results do not lend themselves computationally to
the construction of uniform confidence bands for shape restricted functions – a problem
that has been addressed in different contexts by Chernozhukov et al. (2009), Horowitz
and Lee (2017), and Freyberger and Reeves (2018). Following the original version of
this paper, Zhu (2019) and Fang and Seo (2019) have proposed inference methods for
convex restrictions which, while applicable to an important class of problems, rule out
inference on nonlinear functionals or tests of certain shape restrictions. The analysis in
our paper is also related to the moment inequalities literature (Canay and Shaikh, 2017;
Ho and Rosen, 2017). When specialized to these models, our results enable subvector
inference under conditions related to those in Bugni et al. (2017); see Torgovitsky
(2019) for an application of our procedure in this context. Our paper also contributes
to a literature studying semiparametric and nonparametric models under partial iden-
tification (Manski, 2003). Examples of such work include Chen et al. (2011a), Hong
(2017), Santos (2012), Tao (2014), and Chen et al. (2011b) who focus on inference on
functionals of the structural function but do not allow for shape constraints as we do.
This paper innovates relative to the previous literature in providing novel bootstrap
based, uniformly valid inference methods for general conditional moment restriction
models, including parametric and nonparametric IV models as special cases. Most of the
previous results are either for conditional means or the inference results are not uniformly
valid and are thus unsuitable for confidence interval construction. We also innovate in
providing inference results for functionals of the structural function while allowing for
shape constraints. The results of Chernozhukov et al. (2009) are complementary to our
analysis since our results do not provide feasible uniform confidence intervals but do lead
to, e.g., tests for monotonicity and for inference at a point whereas their methods would
be conservative. The results here are highly complementary to Chetverikov and Wilhelm
(2017) in providing inference for nonparametric IV under shape restrictions while they
showed that imposing monotonicity can greatly improve the convergence rate of the
estimator – an observation that additionally motivates our use of test statistics based
on shape constrained (instead of unconstrained) estimators.
The remainder of the paper is organized as follows. In Section 2 we preview our test
in the context of two specific examples. Section 3 contains our main theoretical results,
while Section 4 applies them to conduct inference in the heterogeneous demand model
of Hausman and Newey (2016). Finally, Section 5 contains a brief simulation study,
while Section 6 revisits the Angrist and Krueger (1991) study of the returns to education
by imposing shape restrictions. All mathematical derivations are included in a series of
appendices; see in particular Appendix S.6 for applications of our general results and
Appendix S.8 for coupling results based on Koltchinskii (1994).
2 Guide to Implementation
To fix ideas, we first describe our test in the context of two specific examples. We reserve
until later the full mathematical framework and focus here on implementation.
2.1 Linear Instrumental Variables
As perhaps the simplest possible example, we first consider a linear instrumental variable
model in which θ0(P ) ∈ Θ is identified through the moment conditions
EP [(Y −W ′θ0(P ))Z] = 0,
where Y ∈ R, W ∈ R^{dθ}, Z ∈ R^{dz}, and P denotes the distribution of V ≡ (Y, W, Z).
We are interested in testing whether θ0(P) belongs to a set R characterized by

R = {θ ∈ R^{dθ} : Fθ = b, Cθ ≤ c},    (1)

for F a d_b × dθ matrix, C a d_c × dθ matrix, b ∈ R^{d_b}, and c ∈ R^{d_c}.
We consider tests based on minimizing the norm of the weighted sample moments
as in Sargan (1958) and Hansen (1982). To this end, we define the criterion
Qn(θ) ≡ {(1/n ∑_{i=1}^n (Yi − Wi′θ)Zi)′ Σn′Σn (1/n ∑_{i=1}^n (Yi − Wi′θ)Zi)}^{1/2},

where Σn is consistent for (E[ZZ′U²])^{−1/2} for U ≡ Y − W′θ0(P) – e.g. Σn′Σn may be the
optimal weighting matrix of Hansen (1982). Our test statistic is then based on

In(R) = min_{θ∈R} √n Qn(θ)        In(Θ) = min_{θ∈R^{dθ}} √n Qn(θ).
Specifically, we consider tests that reject for large values of In(R)−In(Θ). In what follows
it will also be helpful to let θn and θ^u_n denote the minimizers of Qn over θ ∈ R and θ ∈ R^{dθ}
respectively – i.e. θn and θ^u_n are the constrained and unconstrained estimators.
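To fix notation in code, the statistics above can be computed directly on simulated data. The sketch below is our own minimal illustration (the data-generating process, the identity weighting matrix, and all names are invented for the example, not the authors' implementation); we minimize the smooth squared criterion, which has the same minimizers as Qn:

```python
import numpy as np
from scipy.optimize import minimize

def In_stat(Y, W, Z, Sigma, F=None, b=None, C=None, c=None):
    """sqrt(n) * min Qn over R (when constraints are given) or over R^{d_theta}."""
    n, d = W.shape

    def Qn_sq(theta):                      # squared criterion: smooth, same minimizer
        g = Z.T @ (Y - W @ theta) / n      # sample moments (1/n) sum (Y - W'theta) Z
        v = Sigma @ g
        return v @ v

    cons = []
    if F is not None:
        cons.append({"type": "eq", "fun": lambda t: F @ t - b})    # F theta = b
    if C is not None:
        cons.append({"type": "ineq", "fun": lambda t: c - C @ t})  # C theta <= c
    res = minimize(Qn_sq, np.zeros(d), constraints=cons, method="SLSQP")
    return np.sqrt(n * res.fun), res.x

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
W = Z + 0.5 * rng.normal(size=(n, 2))      # instruments correlated with regressors
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
Sigma = np.eye(2)                          # identity weighting for simplicity

In_Theta, th_u = In_stat(Y, W, Z, Sigma)                                   # In(Theta)
In_R, th_r = In_stat(Y, W, Z, Sigma, C=np.array([[0.0, 1.0]]), c=np.array([0.0]))
print(In_R - In_Theta)   # recentered statistic; small when the restriction holds
```

With a slack restriction such as the one imposed here, In(R) − In(Θ) is close to zero; under a violated restriction it grows with n.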
We construct a critical value for this choice of test statistic by using the multiplier
bootstrap. Specifically, let b ∈ {1, . . . , B} index a bootstrap draw, let {ω^b_i}_{i=1}^n be i.i.d.
independent of the data with ω^b_i ∼ N(0, 1), and for any θ ∈ R^{dθ} define

W^b_n(θ) ≡ (1/√n) ∑_{i=1}^n ω^b_i {(Yi − Wi′θ)Zi − (1/n) ∑_{j=1}^n (Yj − Wj′θ)Zj},
which is a simulated draw of the true (centered) moment functions. We also require an
estimator of the derivative of the moment conditions, and to this end we set

Dn[h] ≡ −(1/n) ∑_{i=1}^n Zi Wi′ h
for any h ∈ R^{dθ}. Finally, we additionally need to account for the variation in θn (recall
In(R) = √n Qn(θn)). The principal challenge in this regard is accounting for the possible
values that θn may take, and for this purpose we introduce the set

Vn(θn) ≡ {h ∈ R^{dθ} : Fh = 0 and Cjh ≤ √n max{0, −(rn + Cjθn − cj)} for all j},

where Cj is the jth row of C, cj the jth coordinate of c, and rn is a slackness parameter
whose choice we discuss shortly. The set Vn(θn) can be thought of as a local version of
the restricted parameter space, approximating the set of values h/√n that could equal
θn − θ0(P). Our bootstrap approximations to In(R) and In(Θ) are then given by
U^b_n(R) ≡ min_{h∈Vn(θn)} {(W^b_n(θn) + Dnh)′ Σn′Σn (W^b_n(θn) + Dnh)}^{1/2}    (2)

U^b_n(Θ) ≡ min_{h∈R^{dθ}} {(W^b_n(θ^u_n) + Dnh)′ Σn′Σn (W^b_n(θ^u_n) + Dnh)}^{1/2}    (3)
Thus, for a level α test, we employ the 1 − α quantile of U^b_n(R) − U^b_n(Θ) across the B
bootstrap draws as our critical value. The asymptotic validity of this test is formally
established in Appendix S.6.1, where we specialize our results to parametric GMM.
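The full procedure, including the bootstrap critical value from (2) and (3), can be sketched in a few lines under simplifying assumptions: identity weighting, rn = +∞ (so the local parameter space imposes Ch ≤ 0, the always-valid conservative choice discussed below), and an invented data-generating process. This illustrates the mechanics only and is not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, B, alpha = 400, 200, 0.05
Z = rng.normal(size=(n, 3))                        # three instruments
W = Z[:, :2] + 0.5 * rng.normal(size=(n, 2))       # two regressors
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
C, c = np.array([[0.0, 1.0]]), np.array([0.0])     # restriction: theta_2 <= 0

def argmin_norm(vec, dim, cons):                   # min_x ||vec(x)||_2
    res = minimize(lambda x: np.sum(vec(x) ** 2), np.zeros(dim),
                   constraints=cons, method="SLSQP")
    return res.x, np.sqrt(np.sum(vec(res.x) ** 2))

g = lambda t: Z.T @ (Y - W @ t) / n                # sample moment function
ineq = [{"type": "ineq", "fun": lambda t: c - C @ t}]
th_r, Q_r = argmin_norm(g, 2, ineq)                # constrained estimator
th_u, Q_u = argmin_norm(g, 2, [])                  # unconstrained estimator
stat = np.sqrt(n) * (Q_r - Q_u)                    # In(R) - In(Theta)

D = -Z.T @ W / n                                   # derivative of the moments
cons_h = [{"type": "ineq", "fun": lambda h: -C @ h}]   # r_n = +inf: C h <= 0
diffs = np.empty(B)
for b in range(B):
    w = rng.normal(size=n)                         # N(0,1) multipliers
    def Wb(theta):                                 # multiplier bootstrap moments
        m = (Y - W @ theta)[:, None] * Z
        return w @ (m - m.mean(axis=0)) / np.sqrt(n)
    _, Ub_R = argmin_norm(lambda h: Wb(th_r) + D @ h, 2, cons_h)
    _, Ub_T = argmin_norm(lambda h: Wb(th_u) + D @ h, 2, [])
    diffs[b] = Ub_R - Ub_T

crit = np.quantile(diffs, 1 - alpha)               # bootstrap critical value
print(stat <= crit)    # the restriction holds in this design: expect no rejection
```

Each bootstrap draw reuses the same multipliers ω^b_i for both the restricted and unrestricted problems, mirroring the pairing of U^b_n(R) and U^b_n(Θ) in the quantile of their difference.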
The critical value depends on the choice of rn. Our asymptotic theory will require
rn√n → ∞, with the choice rn = +∞ always being valid. Intuitively, rn is meant to
reflect the sampling uncertainty in C(θn − θ0(P)). Since the distribution of θn cannot
be uniformly consistently estimated, we suggest linking rn to the degree of sampling
uncertainty in C(θ^u_n − θ0(P)) instead. Specifically, we recommend setting rn to satisfy

P( max_{1≤j≤J} Cj(θ^u_n − θ^{u⋆}_n) ≤ rn | Data) = 1 − γn    (4)

where γn → 0 and θ^{u⋆}_n is a "bootstrap" analogue of θ^u_n. This approach changes the
problem of selecting rn into the problem of selecting γn. However, γn is more interpretable:
If we employed Vn(θ^u_n) in place of Vn(θn) in (2), then a Bonferroni bound implies the
asymptotic size is bounded by α + γn for fixed γn.¹ In simulations we find this bound
to be overly pessimistic and the test to have size at most α for a wide range of γn.
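Equation (4) translates directly into a bootstrap quantile computation. The sketch below is ours, with an invented design, illustrative rows of C, and a just-identified IV estimator so that θ^u_n and its nonparametric-bootstrap analogue θ^{u⋆}_n have closed forms; it picks the smallest rn compatible with a given γn:

```python
import numpy as np

rng = np.random.default_rng(2)
n, B, gamma_n = 400, 500, 0.05
Z = rng.normal(size=(n, 2))
W = Z + 0.5 * rng.normal(size=(n, 2))
Y = W @ np.array([1.0, -0.5]) + rng.normal(size=n)
C = np.array([[0.0, 1.0], [1.0, -1.0]])       # illustrative inequality rows

iv = lambda y, w, z: np.linalg.solve(z.T @ w, z.T @ y)   # just-identified IV
th_u = iv(Y, W, Z)

draws = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)          # nonparametric bootstrap resample
    th_star = iv(Y[idx], W[idx], Z[idx])      # bootstrap analogue of th_u
    draws[b] = np.max(C @ (th_u - th_star))   # max_j C_j (th_u - th_star)

r_n = np.quantile(draws, 1 - gamma_n)         # smallest r_n satisfying (4)
print(r_n)
```

Larger γn gives a smaller rn and hence a less conservative local parameter space; the Bonferroni bound α + γn caps the resulting size distortion.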
Remark 2.1. Our test may be employed to obtain confidence regions for a coordinate
of θ0(P ) while imposing restrictions of the form Cθ0(P ) ≤ c (e.g. sign restrictions or
monotonicity of x ↦ x′θ0(P)). Specifically, for θ_k the kth coordinate of θ ∈ R^{dθ} let

R_λ ≡ {θ ∈ R^{dθ} : θ_k = λ, Cθ ≤ c},
which is a special case of (1). The desired confidence region can then be obtained by
conducting test inversion in λ employing the described test.
¹While we may replace Vn(θn) with Vn(θ^u_n) in identified models, in partially identified models we employ Vn(θn) due to the identified set potentially not being a subset of R under the null hypothesis.
2.2 Quantile Treatment Effects
As a second introductory example we consider a nonparametric conditional moment
restriction model. Specifically, for an outcome Y ∈ R, treatment D ∈ [0, 1], instrument
Z ∈ R, and quantile τ ∈ (0, 1), we are interested in the function θ0(P ) solving
P (Y ≤ θ0(P )(D)|Z) = τ, (5)
where θ0(P )(D) denotes the value of θ0(P ) at the point D and P the distribution of
V ≡ (Y,D,Z). If D is randomly assigned, then we may set D = Z and interpret the
derivative ∇θ0(P) as the τth quantile treatment effect (QTE). Alternatively, if D ≠ Z,
then we obtain the IV model of QTE of Chernozhukov and Hansen (2005). In what
follows we assume θ0(P ) belongs to a parameter space Θ, such as the set of functions
with bounded first and second derivatives; see Appendix S.6.3 for a formal analysis.
We aim to obtain a confidence interval on the variation (in D) of the QTE while
imposing that the QTE be increasing in treatment intensity.² To this end, we set
R ≡ {θ ∈ Θ : ∫_0^1 (∇θ(u))² du − (∫_0^1 ∇θ(u) du)² = λ and ∇²θ(u) ≥ 0 for all u ∈ [0, 1]},
where λ is the hypothesized value for the variation of the QTE, and the positivity
constraint on ∇2θ ensures the QTE is increasing in treatment intensity. By conducting
test inversion (in λ) of the null hypothesis that θ0(P ) ∈ R we may then obtain the desired
confidence region. To construct a test statistic we employ a sequence of transformations
{q_k}_{k=1}^{kn} of Z, let q^{kn}_n(z) ≡ (q_1(z), . . . , q_{kn}(z))′, and define the criterion function

Qn(θ) ≡ {(1/n ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi))′ (1/n ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi))}^{1/2},
where we allow kn to increase with n to reflect the full content of the conditional moment
restriction in (5). Because Θ is infinite dimensional, we do not minimize Qn over Θ,
but instead employ a sieve Θn. Specifically, for {p_j}_{j=1}^{jn} a sequence of approximating
functions, p^{jn}(d) ≡ (p_1(d), . . . , p_{jn}(d))′, and Θn ≡ {p^{jn′}β : p^{jn′}β ∈ Θ} we define

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ)        In(Θ) ≡ inf_{θ∈Θn} √n Qn(θ),

and let θn and θ^u_n denote the corresponding constrained and unconstrained estimators.
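For concreteness, the criterion is cheap to evaluate at any sieve coefficient vector. The sketch below is our own toy version with a cubic monomial sieve for θ, polynomial transformations of Z, and D randomly assigned (D = Z); all of these choices are illustrative assumptions rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 500, 0.5
D = rng.uniform(size=n)                     # treatment in [0, 1]
Z = D                                       # randomly assigned: D = Z
Y = D ** 2 + 0.1 * rng.standard_normal(n)   # true median function is d^2

p = lambda d: np.column_stack([np.ones_like(d), d, d**2, d**3])  # sieve basis
q = lambda z: np.column_stack([np.ones_like(z), z, z**2])        # instrument terms

def Qn(beta):
    theta_d = p(D) @ beta                   # theta(D_i) on the sieve
    m = ((Y <= theta_d).astype(float) - tau)[:, None] * q(Z)
    g = m.mean(axis=0)                      # sample moments
    return np.sqrt(g @ g)

beta_true = np.array([0.0, 0.0, 1.0, 0.0])  # theta(d) = d^2
print(np.sqrt(n) * Qn(beta_true))           # small: moments nearly centered
print(np.sqrt(n) * Qn(beta_true + np.array([0.5, 0, 0, 0])))  # large: shifted up
```

In(R) and In(Θ) then follow by minimizing √n Qn over β, subject to the constraints defining R for the former.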
The statistics In(R) and In(R) − In(Θ) can both be employed as the basis for a test.
As we discuss in Section 3, however, our asymptotic approximation to In(R) − In(Θ)
imposes more stringent assumptions on the rate of convergence of θ^u_n than those needed to
²Inference on other functionals, such as the average (in D) QTE, is also feasible but for illustrative purposes we focus here on a nonlinear functional such as the variance.
approximate In(R) – a distinction that may be relevant in this nonparametric application
(particularly if D ≠ Z), but was unnecessary in the parametric model of Section 2.1.
To obtain a critical value we proceed in a conceptually similar manner to Section 2.1.
First, for b ∈ {1, . . . , B} indexing bootstrap draws of {ω^b_i}_{i=1}^n i.i.d. with ω^b_i ∼ N(0, 1), let

W^b_n(θ) ≡ (1/√n) ∑_{i=1}^n ω^b_i {(1{Yi ≤ θ(Di)} − τ) q^{kn}_n(Zi) − (1/n) ∑_{j=1}^n (1{Yj ≤ θ(Dj)} − τ) q^{kn}_n(Zj)},
be our multiplier bootstrap analogue to the centered moments. Second, since Qn is not
everywhere differentiable, we approximate the derivative of the moments by
Dn(θ)[p^{jn′}β] ≡ (1/√n) ∑_{i=1}^n q^{kn}_n(Zi) (1{Yi ≤ θ(Di) + p^{jn}(Di)′β/√n} − 1{Yi ≤ θ(Di)}),
which is a numerical derivative with step size 1/√n. Third, to account for the possible
values that the restricted estimator θn may take we introduce the set
Vn(θn, ℓn) = {β ∈ R^{jn} : ∇²p^{jn}(d)′β ≥ √n min{0, rn − ∇²θn(d)} for all d ∈ [0, 1],

∫_0^1 (∇θn(u) + ∇p^{jn}(u)′β/√n)² du − (∫_0^1 ∇(θn(u) + p^{jn}(u)′β/√n) du)² = λ, ‖p^{jn′}β‖_{2,∞} ≤ √n ℓn},

where the choices of rn and ℓn are discussed below and the norm ‖·‖_{2,∞} is defined as

‖p^{jn′}β‖_{2,∞} = max_{0≤α≤2} sup_{d∈[0,1]} |∇^α p^{jn}(d)′β|.
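The norm ‖·‖_{2,∞} can be approximated on a grid of treatment values. A small sketch of our own, for an illustrative cubic monomial sieve (an assumption, not the paper's choice of basis), using `numpy.polynomial`:

```python
import numpy as np

def norm_2_inf(beta, grid=None):
    """Grid approximation of ||p^{jn}'beta||_{2,inf} for the monomial sieve (1, d, d^2, d^3)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    poly = np.polynomial.Polynomial(beta)          # coefficients by increasing degree
    derivs = [poly, poly.deriv(1), poly.deriv(2)]  # derivative orders alpha = 0, 1, 2
    return max(np.max(np.abs(p(grid))) for p in derivs)

beta = np.array([0.0, 0.0, 1.0, 0.0])              # theta(d) = d^2
print(norm_2_inf(beta))                            # sup of |d^2|, |2d|, |2| on [0,1] -> 2.0
```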
In analogy to Section 2.1, Vn(θn, ℓn) represents possible local deviations from the
approximation to θ0(P) in Θn. Our bootstrap approximations to In(R) and In(Θ) are

U^b_n(R|ℓn) ≡ inf_{β∈Vn(θn,ℓn)} {(W^b_n(θn) + Dn(θn)[p^{jn′}β])′ (W^b_n(θn) + Dn(θn)[p^{jn′}β])}^{1/2}

U^b_n(Θ|ℓn) ≡ inf_{β:‖p^{jn′}β‖_{2,∞}≤√nℓn} {(W^b_n(θ^u_n) + Dn(θ^u_n)[p^{jn′}β])′ (W^b_n(θ^u_n) + Dn(θ^u_n)[p^{jn′}β])}^{1/2}.

Hence, a test based on In(R) rejects whenever In(R) exceeds the 1 − α quantile of
U^b_n(R|ℓn) across bootstrap replications, while a test based on In(R) − In(Θ) rejects
whenever In(R) − In(Θ) exceeds the 1 − α quantile of U^b_n(R|ℓn) − U^b_n(Θ|ℓn).
Our critical values depend on the choices of rn and ℓn. The slackness parameter rn
again measures sampling uncertainty in whether constraints "bind" and we thus set

P( max_{d∈[0,1]} {∇²θ^u_n(d) − ∇²θ^{u⋆}_n(d)} ≤ rn | Data) = 1 − γn

for θ^{u⋆}_n a "bootstrap" analogue to θ^u_n and γn → 0 as in Section 2.1. The bandwidth
ℓn regularizes the local parameter space, and its main role is to ensure that
ℓn × (rate of convergence) = o(n^{−1/2}) in settings in which the rate of convergence is
slower than the usually required o(n^{−1/4}). We thus suggest setting ℓn to satisfy

P( max_{d∈[0,1]} |∇²θ^u_n(d) − ∇²θ^{u⋆}_n(d)| ≤ 1/(√n ℓn) | Data) = 1 − γn.

In settings in which the rate of convergence is expected to be sufficiently fast, no
regularization is necessary and one may set ℓn = +∞; see Lemma 3.1 below.
3 General Theory
We next turn to developing a general inferential framework that encompasses the tests
discussed in Section 2 as special cases. The class of models we consider are those in
which the parameter of interest θ0 ∈ Θ satisfies J conditional moment restrictions

EP[ρℓ(X, θ0)|Zℓ] = 0 for 1 ≤ ℓ ≤ J

with ρℓ : R^{dx} × Θ → R, X ∈ R^{dx}, Zℓ ∈ R^{d_{zℓ}}, V ≡ (X, {Zℓ}_{ℓ=1}^J), and V ∼ P ∈ P.
In some of the applications that motivate us, such as Hausman and Newey (2016), the
parameter θ0 is not identified. It is therefore convenient to define the identified set

Θ0(P) ≡ {θ ∈ Θ : EP[ρℓ(X, θ)|Zℓ] = 0 for 1 ≤ ℓ ≤ J}
and employ it as the basis of our statistical analysis. Formally, for a set R of parameters
satisfying a conjectured restriction, we develop a test for the hypothesis
H0 : Θ0(P) ∩ R ≠ ∅        H1 : Θ0(P) ∩ R = ∅;    (6)
i.e. we devise a test of whether at least one element of the identified set satisfies the
posited constraints. In an identified model, a test of (6) is thus equivalent to a test of
whether θ0 satisfies the hypothesized constraint. We denote the set of distributions P
satisfying the null hypothesis by P0 ≡ {P ∈ P : Θ0(P) ∩ R ≠ ∅}.
The defining elements determining the generality of the hypothesis allowed for by
(6) are the choices of Θ and R. In imposing restrictions on both Θ and R we therefore
aim to allow for a general framework while simultaneously ensuring enough structure
for a fruitful asymptotic analysis. To this end, we require Θ to be a subset of a complete
normed vector space B (i.e. a Banach space) and consider sets R with the structure
R ≡ {θ ∈ B : ΥF(θ) = 0 and ΥG(θ) ≤ 0},    (7)

where ΥF : B → F and ΥG : B → G are known maps. Our first assumption formalizes
the restrictions that we impose on the parameter space Θ and the restriction set R.
Assumption 3.1. (i) Θ ⊆ B, where B is a Banach space with norm ‖·‖B; (ii)
ΥF : B → F and ΥG : B → G, where F is a Banach space with norm ‖·‖F and G is
an AM space with order unit 1G and norm ‖·‖G.
Assumption 3.1(i) allows us to address parametric, semiparametric, and nonpara-
metric models. Assumption 3.1(ii) similarly imposes that ΥF take values in a Banach
space F, while ΥG is required to take values in an AM space G – we provide an overview
of AM spaces in the supplemental appendix. Heuristically, the essential properties of G
are: (i) G is a vector space equipped with a partial order “≤”; (ii) The partial order and
the vector space operations interact in the same manner they do on R (e.g. if θ1 ≤ θ2,
then θ1 + θ3 ≤ θ2 + θ3); and (iii) The order unit 1G ∈ G is an element such that for any
θ ∈ G there exists a scalar λ > 0 satisfying |θ| ≤ λ1G (e.g. when G = R^d we may set
1G ≡ (1, . . . , 1)′ ∈ R^d). These properties of an AM space will prove instrumental in our
analysis. In particular, the order unit 1G will provide a crucial link between the partial
order "≤" and the norm ‖·‖G and, through smoothness of ΥG, will allow us to leverage a
rate of convergence in B to build a suitable sample analogue to the local parameter space.
3.1 The Test Statistic
We test the null hypothesis in (6) by employing sieve-GMM criterion based tests that
may be viewed as generalizations of the J-test of Sargan (1958) and Hansen (1982) or
the incremental J-test of Eichenbaum et al. (1988).
For each instrument Zℓ, we consider transformations {q_{k,n,ℓ}}_{k=1}^{k_{n,ℓ}} and let
q^{k_{n,ℓ}}_{n,ℓ}(zℓ) ≡ (q_{1,n,ℓ}(zℓ), . . . , q_{k_{n,ℓ},n,ℓ}(zℓ))′. Further setting Z ≡ (Z′_1, . . . , Z′_J)′, kn ≡ ∑_{ℓ=1}^J k_{n,ℓ},
q^{kn}_n(z) ≡ (q^{k_{n,1}}_{n,1}(z1)′, . . . , q^{k_{n,J}}_{n,J}(zJ)′)′, and ρ(x, θ) ≡ (ρ1(x, θ), . . . , ρJ(x, θ))′ we then let

(1/√n) ∑_{i=1}^n ρ(Xi, θ) ∗ q^{kn}_n(Zi) ≡ (1/√n) ∑_{i=1}^n (ρ1(Xi, θ) q^{k_{n,1}}_{n,1}(Zi,1)′, . . . , ρJ(Xi, θ) q^{k_{n,J}}_{n,J}(Zi,J)′)′;
i.e. for each θ we obtain the scaled sample averages of the product of each “residual”
ρℓ(X, θ) with the transformations of its respective instrument Zℓ. For a possibly data-
dependent kn × kn matrix Σn, we then construct a criterion function Qn : Θ → R+ by
Qn(θ) ≡ ‖(1/n) ∑_{i=1}^n ρ(Xi, θ) ∗ q^{kn}_n(Zi)‖_{Σn,p},

where for any a ∈ R^{kn} we let ‖a‖_{Σn,p} ≡ ‖Σna‖_p and ‖·‖_p be the p-norm on R^{kn} for any
p ≥ 2 – i.e. ‖a‖_p^p ≡ ∑_{i=1}^d |a^{(i)}|^p for any a ≡ (a^{(1)}, . . . , a^{(d)})′ ∈ R^d. We note that setting
p = 2 leads to the computationally most attractive test. However, we allow for p ≠ 2 as
higher values of p enable us to establish coupling results under weaker conditions.
Heuristically, √nQn should diverge to infinity when evaluated at any θ ∉ Θ0(P)
and remain "stable" when evaluated at a θ ∈ Θ0(P). Thus, examining the minimum
of √nQn over R should reveal whether there is a θ that simultaneously makes √nQn
"stable" (θ ∈ Θ0(P)) and satisfies the conjectured restriction (θ ∈ R). This intuition
leads to a generalization of the J-test that is based on the test statistic

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ),
where Θn ∩R is a finite dimensional subset of Θ ∩R that grows dense in Θ ∩R (Chen,
2007). Alternatively, we may recenter our test statistic by considering
In(R)− In(Θ).
Intuitively, relative to In(R), tests based on In(R)−In(Θ) can have higher power against
alternatives that satisfy Θ0(P) ≠ ∅ but lower power against alternatives under which
Θ0(P ) = ∅ (Chen and Santos, 2018). The theoretical study of both statistics, however,
is similar. Specifically, we will obtain coupling results that apply to statistics In(R) and
suitable bootstrap counterparts for general sets R. As a result, they will also apply to
R = Θ, which immediately lets us analyze the recentered statistic as well.
Before proceeding, we introduce notation and assumptions we employ throughout
our analysis. For any function f of V we let Gn,P f ≡ (1/√n) ∑_{i=1}^n {f(Vi) − EP[f(V)]}
denote the empirical process evaluated at f. We will often evaluate Gn,P on the class

Fn ≡ {ρℓ(·, θ) : θ ∈ Θn ∩ R and 1 ≤ ℓ ≤ J}.    (8)

The "size" of Fn plays a crucial role, and we control it through the bracketing integral

J_{[ ]}(δ, Fn, ‖·‖_{P,2}) ≡ ∫_0^δ √(1 + log N_{[ ]}(ε, Fn, ‖·‖_{P,2})) dε,    (9)

where N_{[ ]}(ε, Fn, ‖·‖_{P,2}) is the smallest number of ε-brackets (under ‖·‖_{P,2}) required to
cover Fn. It will also prove useful to denote the vector subspace generated by Θn ∩ R
by Bn. If there are multiple norms ‖·‖_{A1} and ‖·‖_{A2} on Bn, we further set

Sn(A1, A2) ≡ sup_{b∈Bn} ‖b‖_{A1}/‖b‖_{A2},    (10)
which we note depends on the sample size n only through the choice of sieve Θn ∩R.
The following assumptions introduce the basic structure we employ.
Assumption 3.2. (i) {Vi}_{i=1}^∞ is an i.i.d. sequence with Vi ∼ P ∈ P.

Assumption 3.3. (i) sup_{1≤ℓ≤J} sup_{1≤k≤k_{n,ℓ}} ‖q_{k,n,ℓ}‖∞ ≤ Bn with Bn ≥ 1; (ii) The eigenvalues
of EP[q^{k_{n,ℓ}}_{n,ℓ}(Zℓ) q^{k_{n,ℓ}}_{n,ℓ}(Zℓ)′] are bounded in ℓ, n, and P ∈ P; (iii) Fn has envelope Fn,
sup_{P∈P} ‖Fn‖_{P,2} < ∞, and sup_{P∈P} J_{[ ]}(‖Fn‖_{P,2}, Fn, ‖·‖_{P,2}) ≤ Jn for some Jn < ∞.

Assumption 3.4. (i) For each P ∈ P there is a Σn(P) > 0 with ‖Σn − Σn(P)‖_{o,p} =
oP(1) uniformly in P ∈ P; (ii) The matrices Σn(P) are invertible for all n and P ∈ P;
(iii) ‖Σn(P)‖_{o,p} and ‖Σn(P)^{−1}‖_{o,p} are uniformly bounded in n and P ∈ P.
Assumption 3.2 imposes that the sample {Vi}_{i=1}^n be i.i.d. with P belonging to a set of
distributions P over which our results will hold uniformly. By allowing Bn to depend on
n, Assumption 3.3(i) accommodates both transformations that are uniformly bounded
in n, such as trigonometric series, and those with diverging bound, such as normalized
B-splines. The bound on eigenvalues imposed in Assumption 3.3(ii) guarantees that
{q_{k,n,ℓ}}_{k=1}^{k_{n,ℓ}} are Bessel sequences uniformly in n, while Assumption 3.3(iii) controls the
“size” of the class Fn, which is crucial in studying the induced empirical process. We
note that Jn is allowed to diverge with the sample size and thus Assumption 3.3(iii)
accommodates non-compact parameter spaces Θ as in Chen and Pouzo (2012, 2015).
Alternatively, if the class F ≡ ∪_{n=1}^∞ Fn is restricted to be Donsker, then Assumptions
3.3(ii)-(iii) can hold with uniformly bounded Jn and ‖Fn‖P,2. Finally, Assumption 3.4
requires the weighting matrix Σn to converge to an invertible matrix Σn(P) – here,
‖·‖_{o,p} denotes the operator norm when R^{kn} is endowed with ‖·‖_p.
3.2 Strong Approximation
We begin our analysis by obtaining a strong approximation to statistics of the form
In(R). The results immediately apply to In(R)− In(Θ) as well by setting R = Θ.
3.2.1 Rate of Convergence
As a preliminary step, we first aim to characterize the rate of convergence of the (ap-
proximate) minimizers of Qn on Θn ∩R. Formally, for any sequence τn ↓ 0, we let
Θ^r_n ≡ {θ ∈ Θn ∩ R : Qn(θ) ≤ inf_{θ∈Θn∩R} Qn(θ) + τn},    (11)
which constitutes the set of exact (τn = 0) or near (τn > 0) minimizers of Qn. We
consider the general case with τn ↓ 0, as we employ both the set of exact and near
minimizers in characterizing and estimating the distribution of our test statistics.
Following the literature on set estimation (Chernozhukov et al., 2007; Beresteanu
and Molinari, 2008; Santos, 2011; Kaido and Santos, 2014), for any sets A and B we let
→d_H(A, B, ‖·‖E) ≡ sup_{a∈A} inf_{b∈B} ‖a − b‖E

d_H(A, B, ‖·‖E) ≡ max{→d_H(A, B, ‖·‖E), →d_H(B, A, ‖·‖E)},
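For finite collections of points these distances reduce to simple max–min computations; the toy sketch below (our illustration, with ‖·‖E taken to be the Euclidean norm) makes the asymmetry of the directed distance explicit:

```python
import numpy as np

def d_H_directed(A, B):
    """sup_{a in A} inf_{b in B} ||a - b||_2 for finite sets of points (rows)."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.min(axis=1).max()

def d_H(A, B):
    """Symmetric Hausdorff distance: max of the two directed distances."""
    return max(d_H_directed(A, B), d_H_directed(B, A))

A = np.array([[0.0], [1.0]])
B = np.array([[0.0], [1.0], [3.0]])
print(d_H_directed(A, B))   # -> 0.0: every point of A lies in B
print(d_H(A, B))            # -> 2.0: the point 3.0 in B is at distance 2 from A
```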
which constitute the directed Hausdorff distance and the Hausdorff distance under a
norm ‖ · ‖E that need not equal ‖ · ‖B. Heuristically, in order to obtain a strong approx-
imation we require a convergence rate under a metric for which the empirical process is
equicontinuous – a purpose for which ‖ · ‖B is often too strong with its use leading to
overly stringent assumptions. By introducing a potentially weaker norm ‖ · ‖E we are
thus able to obtain better rate conditions in some infinite dimensional problems.
For each θ ∈ Θ ∩ R, we further let Π^r_n θ denote its approximation on Θn ∩ R and set

Θ^r_{0n}(P) ≡ {Π^r_n θ : θ ∈ Θ0(P) ∩ R}.    (12)

It will also prove useful to define the population analogue to Qn(θ), which is given by

Qn,P(θ) ≡ ‖EP[ρ(X, θ) ∗ q^{kn}_n(Z)]‖_{Σn(P),p}.

Our next set of assumptions suffices for deriving a rate of convergence for Θ^r_n.
Assumption 3.5. For each P ∈ P0 there is a map Π^r_n : Θ ∩ R → Θn ∩ R such that
sup_{P∈P0} sup_{θ∈Θ0(P)∩R} {Qn,P(Π^r_n θ) − inf_{θ∈Θn∩R} Qn,P(θ)} = O(JnBn k_n^{1/p} √(log(1 + kn)/n)).

Assumption 3.6. (i) There is a norm ‖·‖E on Bn, sets Vn(P) ⊆ Θn ∩ R, and {νn}_{n=1}^∞
with ν_n^{−1} = O(1), such that Θ^r_n ⊆ Vn(P) with probability tending to one uniformly in
P ∈ P0 and for any θ ∈ Vn(P) and ηn ≡ JnBn k_n^{1/p} √(log(1 + kn)/n) it follows

ν_n^{−1} →d_H(θ, Θ^r_{0n}(P), ‖·‖E) ≤ Qn,P(θ) − sup_{θ0∈Θ^r_{0n}(P)} Qn,P(θ0) + O(ηn);

(ii) Σn satisfies sup_{θ∈Θ^r_{0n}(P)} Qn,P(θ) × ‖Σn − Σn(P)‖_{o,p} = OP(ηn) uniformly in P0.
Assumption 3.5 formalizes the sense in which Θ0(P) ∩ R must be approximated by
Θ^r_{0n}(P). Since Θ0(P) ∩ R minimizes Qn,P over the entire parameter space, Assumption
3.5 requires that Θ^r_{0n}(P) be "close" to minimizing Qn,P over the sieve. Assumption
3.6(i) introduces a local identification condition by requiring that on a set Vn(P), the
criterion Qn,P grow at a rate ν_n^{−1} as θ moves away from Θ^r_{0n}(P). The parameter ν_n^{−1},
which implicitly depends on kn and the choice of sieve Θn ∩ R, is conceptually related
to the sieve measure of ill-posedness (Blundell et al., 2007). The set Vn(P) may be taken
to equal the entire sieve in convex models, or it may be taken to equal a local neighborhood
of Θ^r_{0n}(P) after establishing the consistency of Θ^r_n; see Lemma S.3.1 in the supplemental
appendix. Finally, Assumption 3.6(ii) imposes a rate of convergence requirement on Σn.
The next theorem employs Assumptions 3.5 and 3.6 to obtain a rate of convergence.
Theorem 3.1. Let Assumptions 3.2, 3.3(i), 3.3(iii), 3.4, 3.5, and 3.6 hold, and let

R_n ≡ ν_n k_n^{1/p} √(log(1 + k_n)) J_n B_n / √n. (13)

Then uniformly in P ∈ P_0: (i) →d_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(R_n + ν_n τ_n); and (ii) d_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(ν_n τ_n) provided J_n B_n k_n^{1/p} √(log(1 + k_n)/n) = o(τ_n).
Theorem 3.1(i) implies that, with arbitrarily high probability, Θ^r_n is contained in a neighborhood of Θ^r_{0n}(P) that shrinks at an R_n + ν_n τ_n rate. We further note that in identified models Θ^r_{0n}(P) is a singleton and Theorem 3.1(i) reduces to consistency in the Hausdorff metric. For partially identified models, however, Theorem 3.1(ii) further requires that τ_n not tend to zero too fast in order to obtain Hausdorff consistency. It is also worth noting that, under assumptions on the Hausdorff distance between Θ^r_{0n}(P) and Θ_0(P) ∩ R, Theorem 3.1 and the triangle inequality can yield a rate of convergence of Θ^r_n to Θ_0(P) ∩ R. Heuristically, we focus on convergence to Θ^r_{0n}(P) (instead of Θ_0(P) ∩ R) because our strong approximation will rely on undersmoothing.
While we employ Theorem 3.1 in our forthcoming analysis, we emphasize that in
specific applications alternative results that are better suited for the particular structure
of the model may be available. In this regard, we note Assumptions 3.5 and 3.6 are not
needed in our analysis beyond their role in delivering a rate of convergence. In particular,
if an alternative rate is derived under different assumptions, then such a result can still be
combined with our analysis to establish the validity of the proposed inferential methods.
3.2.2 Local Approximation
We follow the literature by observing that it is possible to accommodate non-differentiable moment functions by requiring their conditional moments to be differentiable (Chen and Pouzo, 2009, 2012). To this end, for any 1 ≤ ℓ ≤ J we define m_{P,ℓ} : B → L²_P by

m_{P,ℓ}(θ)(Z) ≡ E_P[ρ_ℓ(X, θ)|Z].
Our strong approximation then relies on the following assumptions.
Assumption 3.11 imposes that Υ_G be Fréchet differentiable. The constant K_g, employed in the construction of V_n(θ, ℓ), may be interpreted as a bound on the second derivative of Υ_G and equals zero when Υ_G is linear.

Figure 1: Impact of inequality constraints for the restriction θ_0 ≤ 0. Left panel: true local parameter space (shaded region). Right panel: approximation G_n(θ_n) (shaded region).

Assumptions 3.12 and 3.13 mark
an important difference between hypotheses in which ΥF is linear and those in which
ΥF is nonlinear – note linear ΥF automatically satisfy Assumptions 3.12 and 3.13. This
distinction reflects that when ΥF is linear its impact on the local parameter space is
known and need not be estimated.3 Thus, while Assumptions 3.12(i)-(iii) impose conditions analogous to those required of Υ_G, Assumption 3.12(iv) additionally demands that ∇Υ_F(θ) possess a norm-bounded right inverse on (Θ^r_{0n}(P))^ε – the existence of a right inverse is equivalent to a classical rank condition.4 Finally, for nonlinear Υ_F, Assumption
3.13(ii) requires the existence of a local perturbation to any θ0 ∈ Θr0n(P ) that relaxes
“active” inequality constraints without a first order effect on the equality restrictions.
Figure 1 illustrates how we account for inequality constraints in the case in which
B is the set of continuous functions on R and we aim to test whether θ0(x) ≤ 0 for all
x ∈ R. Absent equality constraints, the local parameter space for θ0 consists of the set
of perturbations h/√n such that θ_0 + h/√n remains negative. For an estimator θ_n of θ_0, G_n(θ_n) then consists of the h/√n such that at any x either (i) θ_n(x) + h(x)/√n is not “too close” to zero (to account for sampling uncertainty) or (ii) h(x)/√n is negative (since all negative h/√n belong to the local parameter space for θ_0). As θ_n converges
to θ0, Gn(θn) is asymptotically contained in the local parameter space of θ0. Unlike
Figure 1, whenever ΥG is nonlinear we must further account for the curvature of ΥG
which motivates the presence of the term Kgrn‖h/√n‖B1G in (21).
3For linear Υ_F, the requirement Υ_F(θ + h/√n) = 0 is equivalent to Υ_F(h) = 0 for any θ ∈ R.
4Recall that for a linear map Γ : B_n → F_n, a right inverse is a map Γ⁻ : F_n → B_n such that ΓΓ⁻(h) = h for any h ∈ F_n. The right inverse Γ⁻ need not be unique if Γ is not bijective, in which case Assumption 3.12(iv) is satisfied as long as it holds for some right inverse of ∇Υ_F(θ) : B_n → F_n.

Figure 2: Impact of equality constraints. Left panel: set of θ satisfying a nonlinear equality restriction. Right panel: corresponding local parameter spaces (absent inequality constraints).

Figure 2 illustrates how regularizing with a ‖·‖_B-norm constraint allows us to account
for the impact of equality constraints on the local parameter space when B = R2 and
F = R. In this setting, the constraint ΥF (θ) = 0 and the local parameter space
Vn(θ,+∞) generate curves in R2. Since all curves Vn(θ,+∞) pass through zero, they
are all “close” in a neighborhood of the origin. However, for nonlinear ΥF the size of
the neighborhood of the origin in which Vn(θn,+∞) is “close” to Vn(θ0,+∞) crucially
depends on both the distance of θn to θ0 and the curvature of ΥF . The set Vn(θn, `n)
therefore accounts for the impact of equality constraints by restricting attention to the
expanding neighborhood of the origin in which Vn(θn,+∞) resembles Vn(θ0,+∞).
Remark 3.1. Whenever Υ_F and Υ_G are linear, controlling the norm ‖·‖_B is no longer necessary as the “curvature” of Υ_F and Υ_G is known. As a result, we may instead set

V_n(θ, ℓ_n) ≡ {h/√n ∈ B_n : h/√n ∈ G_n(θ), Υ_F(θ + h/√n) = 0 and ‖h/√n‖_E ≤ ℓ_n}. (23)

Controlling the norm ‖·‖_E, however, may still be necessary to ensure that D_n(θ)[h] is a consistent estimator for D_{n,P}(θ)[h] uniformly in h/√n ∈ V_n(θ, ℓ_n).
3.3.3 Bootstrap Coupling
We impose a final set of assumptions in order to couple our bootstrap statistic.
where the concentration parameter ϱ_n is smaller than the coupling rate (i.e. ϱ_n ≤ a_n^{−1}). It is well known that uniformly consistent estimation of an approximating distribution is not sufficient for establishing asymptotic size control (Romano and Shaikh, 2012). Intuitively, in order to achieve size control the approximating distribution must be suitably continuous at the quantile of interest uniformly in P ∈ P_0. Assumption 3.17 imposes precisely this requirement, allowing the modulus of continuity, captured here by the concentration parameter ϱ_n, to deteriorate with the sample size provided that ϱ_n ≤ a_n^{−1} – i.e. the loss of continuity must occur at a slower rate than the coupling rate of Theorems
3.2 and 3.3. We refer the reader to Chernozhukov et al. (2013, 2014) for further discussion
and motivation of conditions of this type, called anti-concentration conditions.5
The next result establishes the asymptotic validity of a test based on In(R).
Corollary 3.1. Let Assumption 3.17 hold and the conditions of Theorem 3.2(i) and Theorem 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n)), then it follows that:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) > c_n) ≤ α.
We note that if there are no inequality constraints, then it is possible to show that the test in Corollary 3.1 is similar and its asymptotic size equals the nominal level α whenever the conditions of Theorem 3.2(ii) are satisfied. The consistency of the test against any P ∈ P \ P_0 for which max_{1≤ℓ≤J} ‖E_P[ρ_ℓ(X, θ)|Z]‖_{P,2} is bounded away from zero uniformly in θ ∈ Θ ∩ R is also straightforward to establish under suitable conditions.
3.4.2 Critical Value: In(R)− In(Θ)
We next establish the asymptotic validity of a test based on In(R) − In(Θ). To this
end, we will apply Theorems 3.2 and 3.3 to both R as in (7) and R = Θ, which in turn
will require us to impose assumptions for both cases. Throughout, we therefore signify
parameters associated with setting R = Θ by a “u” superscript – e.g. F^u_n, Θ^u_{0n}(P), and V^u_n(θ, ℓ) are understood to be as in (8), (12), and (14) but with R = Θ. Employing this notation, we then note that the analogue to U_{n,P}(R|ℓ_n) (as in (15)) is given by

U_{n,P}(Θ|ℓ_n) ≡ inf_{θ_0∈Θ^u_{0n}(P)} inf_{h/√n∈V^u_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q^{k_n}_n + D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}.
Our next result establishes a strong approximation to the recentered statistic.
Corollary 3.2. Let the conditions of Theorem 3.2(i) be satisfied with R as in (7) and the conditions of Theorem 3.2(ii) be satisfied with R = Θ. Then, for any ℓ_n, ℓ^u_n ↓ 0 satisfying

k_n^{1/p} √(log(1 + k_n)) B_n × {sup_{P∈P} J_{[]}(ℓ_n^{κ_ρ}, F_n, ‖·‖_{P,2}) + sup_{P∈P} J_{[]}((ℓ^u_n)^{κ_ρ}, F^u_n, ‖·‖_{P,2})} = o(a_n),

K_m(ℓ^u_n)² × S^u_n(L, E) = o(a_n), and R^u_n = o(ℓ^u_n), it follows uniformly in P ∈ P_0 that

I_n(R) − I_n(Θ) ≤ U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
Corollary 3.2 follows immediately by applying Theorem 3.2(i) to couple In(R) and
Theorem 3.2(ii) to couple In(Θ). In particular, it is worth emphasizing that in coupling
In(Θ) we must rely on Theorem 3.2(ii) instead of Theorem 3.2(i) in order to ensure that
5Alternatively, Assumption 3.17 can be dispensed with by adding a fixed constant η > 0 to the critical value – i.e. using q_{1−α}(U_n(R|ℓ_n)) + η as the critical value (Andrews and Shi, 2013).
In(R) − In(Θ) is weakly smaller than its strong approximation. As a result, whenever
moments are nonlinear, Corollary 3.2 requires the rate of convergence of the uncon-
strained estimator to be sufficiently fast for Theorem 3.2(ii) to apply.
In order to obtain a bootstrap approximation, we define the following estimators:

Θ^u_n ≡ {θ ∈ Θ_n : Q_n(θ) ≤ inf_{θ̃∈Θ_n} Q_n(θ̃) + τ^u_n}    V^u_n(ℓ) ≡ {h/√n ∈ B^u_n : ‖h/√n‖_E ≤ ℓ};

i.e. Θ^u_n is simply the set estimator of Section 3.2.1 applied with R = Θ, while V^u_n(ℓ) is the local parameter space sample analogue of Section 3.3.2 applied with no equality or
inequality constraints. As a bootstrap approximation to Un,P (Θ|`un) we then employ
U_n(Θ|ℓ^u_n) ≡ inf_{θ∈Θ^u_n} inf_{h/√n∈V^u_n(ℓ^u_n)} ‖W_n ρ(·, θ) ∗ q^{k_n}_n + D_n(θ)[h]‖_{Σ_n,p}.
The next corollary obtains a coupling for our bootstrap approximation.
Corollary 3.3. Let the conditions of Theorem 3.3 be satisfied with R as in (7) and R = Θ, and suppose J^u_n B_n k_n^{1/p} √(log(1 + k_n)/n) = o(τ^u_n). Then, it follows that there are ℓ̃_n ≲ ℓ_n and ℓ̃^u_n ≲ ℓ^u_n such that uniformly in P ∈ P_0 we have

U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n) ≥ U⋆_{n,P}(R|ℓ̃_n) − U⋆_{n,P}(Θ|ℓ̃^u_n) + o_P(a_n).
Corollary 3.3 simply follows by applying Theorem 3.3 to U_n(R|ℓ_n) and coupling U_n(Θ|ℓ^u_n) to U⋆_{n,P}(Θ|ℓ̃^u_n). For the latter coupling, it is important that Θ^u_n be consistent for Θ^u_{0n}(P) in the Hausdorff metric instead of the directed Hausdorff metric. While both metrics are equivalent for identified models, and hence we may set τ^u_n = 0, in partially identified models we require that τ^u_n tend to zero sufficiently slowly, as in Theorem 3.1(ii). We further note that in identified models it is possible to employ either W_n ρ(·, θ_n) or W_n ρ(·, θ^u_n) in constructing both U_n(R|ℓ_n) and U_n(Θ|ℓ^u_n) – a change that results in an asymptotically equivalent coupling but ensures that the bootstrap statistic is positive.
The asymptotic validity of a test that rejects whenever In(R) − In(Θ) exceeds the
appropriate quantile of our bootstrap approximation immediately follows. In addition,
we note that it is always possible to set `un = +∞ – a choice that does not lead to a loss
of power under conditions related to those stated in Lemma 3.1.
Corollary 3.4. Let Assumption 3.17 hold with I_n(R) − I_n(Θ) instead of I_n(R), and the conditions of Corollaries 3.2 and 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n)), then:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) − I_n(Θ) > c_n) ≤ α.

Moreover, the same conclusion holds if we instead set c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|+∞)).
4 Heterogeneity and Demand Analysis
For our final example, we illustrate how to conduct inference in the heterogeneous de-
mand model of Hausman and Newey (2016). Specifically, for Y ∈ [0, 1] equal to the
expenditure share on a commodity, W ∈ W ⊆ Rdw a vector of prices, income, and
covariates, and η representing unobserved individual heterogeneity we suppose
Y = g(W, η) (26)
where g is a known function of (W, η). The requirement that g be known is not necessarily restrictive as η can in principle be infinite dimensional. For instance, Hausman and Newey (2016) show that if the expenditure share is actually given by Y = g̃(W, ε) for some unknown function g̃ and unobserved ε, then under appropriate conditions

Y = g̃(W, ε) = Σ_{j=1}^∞ ψ_j(W) β_j(ε), (27)

where {ψ_j}_{j=1}^∞ is a known basis with Σ_{j=1}^∞ ψ_j²(W) finite and {β_j(ε)}_{j=1}^∞ are unknown coefficients. Setting η = {β_j(ε)}_{j=1}^∞ and viewing η as a random variable in the sequence space ℓ² ≡ {{a_j}_{j=1}^∞ : Σ_j a_j² < ∞}, we then note that (27) reduces to (26) with g known.
If the covariates W are independent of η, then for any c ∈ R it follows that

P(Y ≤ c|W) = P(g(W, η) ≤ c|W) = ∫ 1{g(W, η) ≤ c} ν_P(dη), (28)
where νP denotes the unknown distribution of η. Result (28) restricts the possible values
of νP and hence the identified set for average exact consumer surplus, average share, and
other functionals of interest. Specifically, for Ψ(g, η) an object of interest for preferences denoted by η, such as equivalent variation, Hausman and Newey (2016) study

∫ Ψ(g, η) ν_P(dη), (29)

which is the average across individuals. By evaluating the set of values of (29) that can be generated by a distribution ν_P satisfying (28) at a grid {c_ℓ}_{ℓ=1}^J, Hausman and Newey (2016) provide estimates of the identified set for (29). We further note that bounds on the distribution of Ψ(g, η) under ν_P can be obtained by replacing Ψ(g, η) in (29) with an indicator that Ψ(g, η) be less than or equal to some real number.
In what follows, we apply our results to conduct inference on functionals as in (29). To this end, we let F_P(c|W) ≡ P(Y ≤ c|W) for a given grid {c_ℓ}_{ℓ=1}^J and set θ_0(P) = ({F_P(c_ℓ|W)}_{ℓ=1}^J, ν_P). To define B, we suppose η ∈ Ω for some known Hausdorff space Ω, set B to be the Borel σ-algebra on Ω, let M(B) be the space of regular signed Borel measures on Ω, and let ‖·‖_TV denote the total variation norm.
Assuming F_P(c_ℓ|·) ∈ C_B(W) for C_B(W) the space of continuous and bounded functions on W, we define B ≡ (⊗_{ℓ=1}^J C_B(W)) × M(B) and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖_B = Σ_{ℓ=1}^J ‖F(c_ℓ|·)‖_∞ + ‖ν‖_TV. As the parameter space Θ we then set

Θ ≡ {θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B : max_{1≤ℓ≤J} ‖F(c_ℓ|·)‖_∞ ≤ 2}, (30)

where the “2” norm bound is simply selected to ensure θ_0(P) is in the interior of Θ.
Letting X ≡ (Y, W) and Z_ℓ = W for every 1 ≤ ℓ ≤ J, we then define

ρ_ℓ(X, θ) = 1{Y ≤ c_ℓ} − F(c_ℓ|W), (31)

which yields conditional moment restrictions that identify F_P(c_ℓ|W) – ν_P, however, is potentially partially identified. Also let G = ℓ^∞(B), which is an AM space under ‖·‖_∞ with order unit 1_G satisfying 1_G(B) = 1 for any B ∈ B. Defining Υ_G : B → ℓ^∞(B) by
Υ_G(θ)(B) = −ν(B) (32)

for any set B ∈ B and θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B, then enables us to impose the restriction that ν be a positive measure through the inequality Υ_G(θ) ≤ 0. Finally, for a grid {w_l}_{l=1}^L we let Υ_F : B → R^{JL+2} be given by Υ_F(θ) = (Υ^{(e)}_F(θ), Υ^{(ν)}_F(θ), Υ^{(s)}_F(θ)), where

Υ^{(e)}_F(θ) = {F(c_ℓ|w_l) − ∫ 1{g(w_l, η) ≤ c_ℓ} ν(dη)}_{1≤ℓ≤J, 1≤l≤L} (33)

Υ^{(ν)}_F(θ) = ν(Ω) − 1 (34)

Υ^{(s)}_F(θ) = ∫ Ψ(g, η) ν(dη) − λ; (35)
i.e. (33) imposes that (28) hold, (34) demands that ν have total measure one, and (35) lets us impose that the average object of interest equal a hypothesized value λ. By conducting test inversion in λ we can obtain a confidence region for the functional in (29).
As in Hausman and Newey (2016), we can impose utility maximization by requiring
that Ω consist only of η such that g(·, η) satisfies the Slutsky conditions. One may
sample from Ω by drawing randomly from sets of η that satisfy Slutsky symmetry and
only keeping those where the compensated price effects matrix is negative semidefinite
on a grid. This is the procedure followed in Hausman and Newey (2016) for two goods.
Given a collection of orthogonal probability measures {μ_{s,n}}_{s=1}^{s_n} ⊂ M(B) we employ

M_n(B) ≡ {ν ∈ M(B) : ν = Σ_{s=1}^{s_n} π_s μ_{s,n} for some {π_s}_{s=1}^{s_n} ∈ R^{s_n}}
as a sieve for M(B). To guarantee that each consumer is rational we focus on discrete
distributions where each µs,n is a Dirac measure for a point in Ω satisfying the Slutsky
conditions. For Dirac μ_{s,n} it is also particularly (computationally) simple to impose the constraints Υ_G(θ) ≤ 0 (see (32)). As a sieve for {F_P(c_ℓ|·)}_{ℓ=1}^J, we employ approximating functions {p_{j,n}}_{j=1}^{j_n}. In particular, setting p^{j_n}_n(w) = (p_{1,n}(w), . . . , p_{j_n,n}(w))′, we let

Θ_n ≡ {({p^{j_n′}_n β_ℓ}_{ℓ=1}^J, ν) : ν ∈ M_n(B) and max_{1≤ℓ≤J} ‖p^{j_n′}_n β_ℓ‖_∞ ≤ 2}. (36)
Similarly, for a sequence {q_{k,n}}_{k=1}^{k_n} and k_n × k_n positive definite matrices {Σ_{ℓ,n}}_{ℓ=1}^J, we set q^{k_n}_n(w) ≡ (q_{1,n}(w), . . . , q_{k_n,n}(w))′ and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) define

Q_n(θ) ≡ { Σ_{ℓ=1}^J ‖ (1/n) Σ_{i=1}^n (1{Y_i ≤ c_ℓ} − F(c_ℓ|W_i)) q^{k_n}_n(W_i) ‖²_{Σ_{ℓ,n},2} }^{1/2}. (37)

The statistics I_n(R) and I_n(Θ) then equal the minima of √n Q_n over Θ_n ∩ R and Θ_n, respectively.
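For intuition, the criterion in (37) is straightforward to evaluate; below is a minimal sketch with identity weighting matrices and simulated placeholder data (the design is purely illustrative, not the paper's):

```python
import numpy as np

def Q_n(F_vals, Y, Q_mat, c_grid):
    """Criterion (37) with identity weighting: for each grid point c_l,
    average the instrumented residual (1{Y <= c_l} - F_l(W)) q(W) and
    aggregate the squared l2 norms across the grid."""
    n = len(Y)
    total = 0.0
    for l, c in enumerate(c_grid):
        resid = (Y <= c).astype(float) - F_vals[:, l]   # n-vector
        moment = resid @ Q_mat / n                      # k_n-vector
        total += moment @ moment
    return np.sqrt(total)

rng = np.random.default_rng(0)
n = 500
W = rng.uniform(size=n)
Y = rng.uniform(size=n)                      # hypothetical: Y uniform, independent of W
Q_mat = np.column_stack([np.ones(n), W])     # q^{k_n}(W) with k_n = 2
c_grid = [0.25, 0.5, 0.75]
# With Y uniform and independent of W, F_P(c|W) = c, so the criterion
# evaluated at the truth is small (of order n^{-1/2}).
F_true = np.tile(c_grid, (n, 1))
print(Q_n(F_true, Y, Q_mat, c_grid))
```

The statistics are then the minimized values of √n times this criterion over the sieve, with and without the restriction set R.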
Our next set of assumptions enable us to couple In(R) and In(R)− In(Θ).
Assumption 4.1. (i) {Y_i, W_i}_{i=1}^n is i.i.d. P with P ∈ P; (ii) sup_w ‖p^{j_n}_n(w)‖_2 ≲ √j_n; (iii) E_P[p^{j_n}_n(W) p^{j_n}_n(W)′] has eigenvalues bounded away from zero and infinity uniformly in P ∈ P and n; (iv) For each P ∈ P_0 and θ_0 ∈ Θ_0(P) ∩ R, there exists a Π^r_n θ_0 = ({F_n(c_ℓ|·)}_{ℓ=1}^J, ν_n) ∈ Θ_n ∩ R such that Σ_{ℓ=1}^J ‖E_P[(F_n(c_ℓ|W) − F_P(c_ℓ|W)) q^{k_n}_n(W)]‖_2 = O((n log(n))^{−1/2}) uniformly in P ∈ P_0 and θ_0 ∈ Θ_0(P) ∩ R.

Assumption 4.2. (i) max_{1≤k≤k_n} ‖q_{k,n}‖_∞ ≲ √k_n; (ii) E_P[q^{k_n}_n(W) q^{k_n}_n(W)′] has eigenvalues bounded uniformly in P ∈ P and n; (iii) E_P[q^{k_n}_n(W) p^{j_n}_n(W)′] has singular values bounded away from zero uniformly in P ∈ P and n; (iv) k_n² j_n^{3/2} log³(n) = o(n^{1/2}).

Assumption 4.3. For all 1 ≤ ℓ ≤ J: (i) ‖Σ_{ℓ,n} − Σ_{ℓ,n}(P)‖_{o,2} = o_P(1/(k_n √j_n log²(n))) uniformly in P ∈ P; (ii) Σ_{ℓ,n}(P) is invertible and ‖Σ_{ℓ,n}(P)‖_{o,2} and ‖Σ_{ℓ,n}(P)^{−1}‖_{o,2} are bounded uniformly in P ∈ P and n.
Assumptions 4.1(ii)-(iv) state the conditions on Θ_n, with Assumptions 4.1(ii)-(iii) being satisfied by standard choices such as B-splines or wavelets. Assumption 4.1(iv) is an asymptotic unbiasedness requirement – a condition that is eased by noting that no requirements are imposed on the approximating space for ν_P. The requirements on {q_{k,n}}_{k=1}^{k_n} are stated in Assumptions 4.2(i)-(iii) and are again satisfied by standard choices. Assumption 4.2(iv) states a rate condition that suffices for verifying the coupling requirements of Theorem 3.2. Assumption 4.3 imposes the requirements on the weighting matrices.
Our next result employs Theorem 3.2(ii) to obtain strong approximations.

Theorem 4.1. Let Assumptions 4.1, 4.2, and 4.3 hold, a_n = (log(n))^{−1/2}, and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖_E = max_{1≤ℓ≤J} ‖F(c_ℓ|·)‖_∞. If ℓ_n, ℓ^u_n ↓ 0 satisfy k_n √j_n log²(n) (ℓ_n ∨ ℓ^u_n) = o(1) and k_n j_n log(n)/√n = o(ℓ_n ∧ ℓ^u_n), then uniformly in P ∈ P_0:

I_n(R) = U_{n,P}(R|ℓ_n) + o_P(a_n)

I_n(R) − I_n(Θ) = U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
In order to conduct inference, we next aim to estimate the distributions of U_{n,P}(R|ℓ_n) and U_{n,P}(Θ|ℓ^u_n). To this end, we note that Θ^r_{0n}(P) is potentially non-singleton and we therefore employ a set estimator Θ^r_n (as in (11)) to estimate the distribution of U_{n,P}(R|ℓ_n). In contrast, since U_{n,P}(Θ|ℓ^u_n) only depends on θ_0(P) through its identified component {F_P(c_ℓ|·)}_{ℓ=1}^J, for the unconstrained problem we employ any minimizer θ^u_n of Q_n on Θ_n.
Qn on Θn. With regards to the local parameter space, we note that in this application
G_n(θ) = {({p^{j_n′}_n β_{ℓ,h}}_{ℓ=1}^J, ν_h) : ν_h(B) ≥ min{r_n − ν(B), 0} for all B ∈ B} (38)

for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν). Computationally, since any ν, ν_h ∈ M_n(B) has the structure ν = Σ_{s=1}^{s_n} π_s μ_{s,n} and ν_h = Σ_{s=1}^{s_n} π_{s,h} μ_{s,n}, it follows that the constraints in (38) reduce to π_{s,h} ≥ min{r_n − π_s, 0} for all 1 ≤ s ≤ s_n whenever {μ_{s,n}}_{s=1}^{s_n} are orthogonal. Furthermore, since the moments and restrictions are linear, we may let ℓ_n = +∞ and set

V_n(θ, +∞) = {({p^{j_n′}_n β_{ℓ,h}}_{ℓ=1}^J, ν_h) : h/√n ∈ G_n(θ), Υ_F(h) = 0}. (39)
Thus, the estimators of the strong approximations obtained in Theorem 4.1 equal

U_n(R|+∞) ≡ inf_{θ∈Θ^r_n} inf_{h/√n∈V_n(θ,+∞)} { Σ_{ℓ=1}^J ‖W_n ρ_ℓ(·, θ) q^{k_n}_n + D_{ℓ,n}[h]‖²_{Σ_{ℓ,n},2} }^{1/2}

U_n(Θ|+∞) ≡ inf_{h/√n} { Σ_{ℓ=1}^J ‖W_n ρ_ℓ(·, θ^u_n) q^{k_n}_n + D_{ℓ,n}[h]‖²_{Σ_{ℓ,n},2} }^{1/2}.
Before stating our final assumption, we need an auxiliary result. To this end, define
satisfies ‖F_n(c_ℓ|·) − F_P(c_ℓ|·)‖_∞ = o(1) uniformly in θ_0(P) ∈ Θ_0(P) ∩ R and P ∈ P_0; (v) k_n j_n log²(n) τ_n = o(1), and ζ_n(k_n j_n log(n)/√n + √j_n τ_n) = o(r_n).
The boundedness of Ψ(g, ·) on Ω ensures that Υ^{(s)}_F (as in (35)) is continuous, while Assumption 4.4(ii) allows us to apply Lemma 4.1. Assumption 4.4(iii) is a low-level sufficient condition for verifying the bootstrap coupling requirement of Assumption 3.14. These rate requirements could be improved under smoothness conditions on F_P(c_ℓ|·). Finally, Assumption 4.4(iv) imposes a mild requirement on the sieve, while Assumption 4.4(v) states conditions on τ_n and r_n – note that τ_n = 0 and r_n = +∞ are always valid.
Our final result obtains a coupling for our bootstrap approximations.
Theorem 4.2. Let the conditions of Theorem 4.1 hold and Assumption 4.4 be satisfied. Then, for any sequences ℓ_n, ℓ^u_n ↓ 0 satisfying k_n j_n log(n)/√n = o(ℓ_n ∧ ℓ^u_n) and k_n √j_n log²(n) (ℓ_n ∨ ℓ^u_n) = o(1), it follows uniformly in P ∈ P_0 that

U_n(R|+∞) ≥ U⋆_{n,P}(R|ℓ_n) + o_P(a_n)

U_n(R|+∞) − U_n(Θ|+∞) ≥ U⋆_{n,P}(R|ℓ_n) − U⋆_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
In particular, since the conditions on ℓ_n and ℓ^u_n imposed in Theorems 4.1 and 4.2 are the same, it follows that we may employ the quantiles of U_n(R|+∞) and U_n(R|+∞) − U_n(Θ|+∞) conditional on the data as critical values for I_n(R) and I_n(R) − I_n(Θ).
5 Simulation Evidence
We next study the finite sample performance of our inference procedure by revisiting
the simulation design in Chetverikov and Wilhelm (2017).
5.1 Identified Model
We first consider a nonparametric instrumental variable model in which, for some un-
known function θ0, the distribution of (Y,W,Z) ∈ R3 satisfies the restriction
Y = θ_0(W) + ε,    E[ε|Z] = 0; (41)
see Appendix S.6 for a formal study of this model. Following Chetverikov and Wilhelm (2017), we set θ_0(x) ≡ 0.2x + x², and for (ε̃, ζ, ν) independent standard normal random variables we let Z = Φ(ζ), W = Φ(0.3ζ + √(1 − 0.3²) ε̃), and ε = (0.3ε̃ + √(1 − 0.3²) ν)/2, for Φ the cumulative distribution function of a standard normal. All reported results are based on five thousand replications employing five hundred bootstrap draws each.
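The design can be sketched as follows; we write eps_t for the latent standard normal that enters both W and the regression error ε:

```python
import numpy as np
from scipy.stats import norm

def simulate_cw(n, rng):
    """Draw (Y, W, Z) from the Chetverikov-Wilhelm style design:
    theta_0(x) = 0.2 x + x^2 with endogeneity through a shared normal."""
    eps_t, zeta, nu = rng.normal(size=(3, n))   # latent standard normals
    Z = norm.cdf(zeta)
    W = norm.cdf(0.3 * zeta + np.sqrt(1 - 0.3**2) * eps_t)
    eps = (0.3 * eps_t + np.sqrt(1 - 0.3**2) * nu) / 2
    Y = 0.2 * W + W**2 + eps
    return Y, W, Z

rng = np.random.default_rng(0)
Y, W, Z = simulate_cw(5000, rng)
# W and Z lie in [0, 1]; the error is correlated with W (endogeneity)
# but mean independent of Z, so Z is a valid instrument.
```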
Our results enable us to conduct inference on functionals of θ0 while imposing shape
restrictions. In what follows we set ΥF to denote either the level or the derivative of θ0 at
the point w0 = 0.5 and employ ΥG to impose that θ be either monotonically increasing
or monotonically increasing and convex. We employ the test statistic In(R) − In(Θ)
with r = 2 and Σn an estimate of the optimal weighting matrix based on a first stage
unconstrained estimator. The implementation of the test is similar to that of the linear
model of Section 2.1, with the difference that we must select the sieve Θ_n = {p^{j_n′}_n β : β ∈ R^{j_n}} and q^{k_n}_n. We follow Chetverikov and Wilhelm (2017) in employing continuously differentiable piecewise quadratic splines with equally spaced knots for both p^{j_n}_n and q^{k_n}_n.
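A continuously differentiable piecewise quadratic spline basis of this kind can be built from truncated power functions; a minimal sketch (the knot locations below are hypothetical):

```python
import numpy as np

def quad_spline_basis(x, knots):
    """Truncated power basis for C^1 piecewise quadratic splines:
    1, x, x^2, and (x - t)_+^2 for each interior knot t."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2]
    cols += [np.clip(x - t, 0.0, None) ** 2 for t in knots]
    return np.column_stack(cols)

# Equally spaced interior knots on [0, 1].
knots = [0.25, 0.5, 0.75]
x = np.linspace(0.0, 1.0, 101)
P = quad_spline_basis(x, knots)   # j_n = 3 + #knots basis functions
print(P.shape)  # (101, 6)

# Each truncated term (x - t)_+^2 has value 0 and first derivative 0
# at its knot, so every basis function is continuously differentiable.
```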
In computing critical values we set `n = `un = +∞ since the model is linear and
τn = 0 since the model is identified – note these choices are also valid under partial
identification. We select r_n by proceeding as in Section 2.1. For p^{j_n′}_n β^u_n the minimizer of I_n(Θ) and p^{j_n′}_n β^{u⋆}_n its score bootstrap analogue (Kline and Santos, 2012), we set

P( max_{1≤j≤J} C′_j(β^{u⋆}_n − β^u_n) ≤ r_n | {V_i}_{i=1}^n ) = 1 − γ_n, (42)

where γ_n ∈ (0, 1) and the vectors C_j ∈ R^{j_n} depend on the shape restriction being
imposed. We emphasize that the sequence γn must tend to zero in order for rn to satisfy
our assumptions. Finally, we employ the minimizer of In(R) in obtaining bootstrap
draws for both Un(R|+∞) and Un(Θ|+∞); see discussion following Corollary 3.3.
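Schematically, r_n in (42) is a conditional quantile of a maximum over constraint directions, taken across bootstrap draws. The sketch below uses simulated normal draws as a stand-in for the score bootstrap, and the difference matrix C is a hypothetical choice rather than the paper's:

```python
import numpy as np

def select_rn(beta_hat, beta_boot, C, gamma_n):
    """r_n = (1 - gamma_n) quantile, across bootstrap draws, of
    max_j C_j'(beta* - beta_hat), as in (42)."""
    # beta_boot: (n_boot, j_n) array of bootstrap coefficient draws
    diffs = (beta_boot - beta_hat) @ C.T      # (n_boot, J)
    return np.quantile(diffs.max(axis=1), 1.0 - gamma_n)

rng = np.random.default_rng(0)
j_n, n_boot = 4, 500
beta_hat = np.zeros(j_n)
# Placeholder draws; in practice these come from the score bootstrap.
beta_boot = rng.normal(scale=0.1, size=(n_boot, j_n))
# Hypothetical constraint directions: consecutive coefficient differences.
C = np.array([[-1.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 0.0],
              [0.0, 0.0, -1.0, 1.0]])
r_n = select_rn(beta_hat, beta_boot, C, gamma_n=0.05)
```

Smaller γ_n (a higher quantile) yields a larger, more conservative r_n, consistent with the requirement that γ_n tend to zero.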
Table 1 reports empirical rejection probabilities under the null hypothesis for 5%-
level tests on the derivative and level of θ0 at w0 = 0.5 under different shape restrictions.
Simulations for additional values of (jn, kn) are included in the supplemental appendix.
With regards to rn, we examine the extreme possible values (0 and ∞) and choices
corresponding to (42) for different γn. Across all simulations, including those in the
supplemental appendix, we find that the rejection probability is below the nominal level
provided 1−γn ≥ 0.5 in (42). Overall, we find the general lack of sensitivity to different
choices of bandwidths to be reassuring for empirical practice.
In Figure 3 we report power curves for different 5%-level tests concerning the value
of θ0 and its derivative at w0 = 0.5. For conciseness, we focus on the sample sizes
n ∈ {1000, 5000} and r_n chosen as in (42) with 1 − γ_n = 0.95.

Table 1: Empirical rejection probabilities for 5%-level tests based on I_n(R) − I_n(Θ). Value of r_n set to a percentile corresponds to the choice of 1 − γ_n in (42).

The curves labeled “Mon” and “Mon+Conv” correspond to tests based on I_n(R) − I_n(Θ) with R imposing monotonicity, and monotonicity and convexity, respectively, while changing the conjectured value
of θ0 and its derivative at w0 = 0.5. The curve labeled “Unres.” corresponds to a
Wald test based on the unrestricted estimator. For all designs we find that imposing
shape restrictions can yield important power gains. The benefits of imposing shape
restrictions, however, depend on both the sampling uncertainty and how “close” the
shape restrictions are to binding (Chetverikov et al., 2018). Since our design is fixed with
n and θ0 is strictly increasing and convex, in our simulations we see the advantages of
imposing shape restrictions decrease with n as sample uncertainty decreases. Similarly,
since estimating the derivative is harder than estimating the level, we observe larger
power gains when imposing shape restrictions in the former problem.
5.2 Partially Identified Model
We next examine the performance of our test in a partially identified setting by dis-
cretizing the simulation design in Chetverikov and Wilhelm (2017). Concretely, we
generate (W,Z, ε) ∈ [0, 1]2 × R as in Section 5.1, divide [0, 1] into Sw and Sz equally
spaced segments, and generate dummy variables D_w and D_z for the segment to which W and Z belong – e.g. if (S_w, S_z) = (3, 2), then D_w(W) ≡ (1{W ∈ [0, 1/3]}, 1{W ∈ (1/3, 2/3]}, 1{W ∈ (2/3, 1]})′ and D_z(Z) ≡ (1{Z ∈ [0, 1/2]}, 1{Z ∈ (1/2, 1]})′. The outcome Y is generated according to (41) but employing D_w in place of W.
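The segment dummies can be generated by binning; a minimal sketch using right-closed segments matching the example above:

```python
import numpy as np

def segment_dummies(x, n_seg):
    """One-hot indicators for which of n_seg equal segments of [0, 1]
    contains each observation, with bins [0, 1/S], (1/S, 2/S], ..., 1]."""
    idx = np.ceil(np.asarray(x) * n_seg).astype(int) - 1
    idx = np.clip(idx, 0, n_seg - 1)        # x = 0 falls in the first bin
    return np.eye(n_seg)[idx]

W = np.array([0.1, 0.4, 0.9])
Dw = segment_dummies(W, 3)
print(Dw)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```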
The discretized design is characterized by S_z linear unconditional moment restrictions in S_w unknowns.

Figure 3: Rejection probabilities for 5%-level tests on the conjectured value of θ_0(0.5) (true value 0.35) and θ′_0(0.5) (true value 1.2). Panels: power curve for the level with (j_n, k_n) = (4, 6) and n = 5000, and power curves for the derivative with (j_n, k_n) ∈ {(4, 4), (4, 6)} and n ∈ {1000, 5000}; curves labeled “Mon”, “Mon+Conv”, and “Unres.”. Tests implemented with 1 − γ_n = 0.95 in (42).

For conciseness, we focus on imposing that θ_0 be monotonically increasing and convex while conducting inference on the value of θ_0 at the point
d_0 ≡ D_w(0.5) – e.g., if S_w = 3, then d_0 = (0, 1, 0)′. We note that θ_0(d_0) is generically not identified whenever S_w > S_z, but imposing a shape restriction on θ_0 partially identifies θ_0(d_0) in this design. Table 2 reports the relevant identified sets.
Table 3: Empirical rejection probabilities for 5%-level tests based on I_n(R) for different points in the null hypothesis. Lower and upper endpoints correspond to Table 2.
We test whether a value c belongs to the identified set for θ0(d0) by setting ΥF (θ) =
θ(d0)− c and employing ΥG to impose that θ be monotonically increasing and convex.
We base inference on In(R) with r = 2, Σn the sample analogue to E[DzD′z], all moment
restrictions (kn = Sz), and a saturated model for θ0 (jn = Sw). To compute critical
values we set `n = +∞ and τn = 0 – though note Θrn need not be a singleton when τn = 0
because j_n > k_n. We select r_n by modifying the approach employed in Section 5.1: for θ^L_n and θ^U_n the minimizer and maximizer of θ(d_0) over the set of θ that are monotonically increasing, convex, and minimize ‖Σ_{i=1}^n (Y_i − θ(D_{w,i})) D_{z,i}/n‖_∞, we set

P( max_{1≤j≤J} max{Υ_{G,j}(θ^{L⋆}_n − θ^L_n), Υ_{G,j}(θ^{U⋆}_n − θ^U_n)} ≤ r_n | {V_i}_{i=1}^n ) = 1 − γ_n, (43)
where ΥG,j(θ) denotes the jth coordinate of the vector ΥG(θ) and θL?n and θU?n are again
computed employing the score bootstrap. As in our previous analysis, γn must tend to
zero with n in order for rn to satisfy our assumptions.
Table 3 reports empirical rejection rates for testing whether c belongs to the identified
set, with the lower and upper endpoint columns corresponding to setting c to equal
the lower and upper endpoints in Table 2. All tests are conducted at a 5% nominal
level. Across designs, we find that setting rn = +∞ always delivers tests with rejection
probabilities below their nominal level. Setting rn according to (43) with 1− γn = 0.95
also delivers adequate size control, with the exception of n = 5000 and (Sw, Sz) = (4, 2)
where we see a modest over-rejection at the lower endpoint of the identified set. Overall,
the degree of sensitivity to the choice of rn is similar to that found in Section 5.1.
6 Returns to Education
As a final illustration of the empirical content of our results, we conclude by examining
the role that shape restrictions can play in the study of Angrist and Krueger (1991) on
returns to education. Specifically, for Y log-earnings and D years of education we posit
Y = g(D) +W ′γ + ε, (44)
where W is a vector of observable covariates consisting of a constant, age, age squared,
and dummy variables for year of birth, race, marital status, residence in a Standard
Metropolitan Statistical Area (SMSA), and census region of residence. The function g
is modeled as a piecewise quadratic spline with a knot placed at the median level of
education. Following Angrist and Krueger (1991), we address the possible endogeneity
of D by employing a vector of instruments Z consisting of W and a set of quarter of
birth dummy variables interacted with year of birth dummy variables.
We conduct our analysis employing the decennial census extracts studied by Angrist
and Krueger (1991). All statistical procedures are based on In(R) − In(Θ) and, as in
Angrist and Krueger (1991), applied separately to men born in the 1920-1929, 1930-1939,
and 1940-1949 cohorts. Since the model is linear and identification holds when the rank
condition is satisfied, we set `n = +∞, τn = 0, and implement our tests by following
the discussion in Section 2.1. All bootstrap critical values are based on five thousand
bootstrap draws, with both bootstrap statistics based on the constrained estimator.
Table: p-values of the test of the shape restriction, by imposed shape restriction and r_n, for the 1920-1929, 1930-1939, and 1940-1949 cohorts.
Table 5: 95% confidence regions for the marginal return to education at the median. Columns: (i) imposed shape restriction; (ii) choice of r_n, with r_n = 95% denoting 1 − γ_n = 0.95 in (4); (iii) shape-restricted 2SLS estimate; (iv) confidence region using shape restrictions; (v)-(vi) unrestricted 2SLS estimate and Wald confidence region.
In Table 5 we impose the conjectured shape restrictions while building 95%-level
confidence intervals for the marginal return to education at the median level of education.
By way of comparison, we also report the Wald confidence regions based on the
unrestricted two stage least squares (2SLS) estimator. Overall, we find little sensitivity
of our results to the choice of rn. In unreported results we found the confidence regions
corresponding to rn = +∞ to equal those obtained by setting 1 − γn = 0.95. We
further find that imposing shape restrictions considerably sharpens the confidence regions, with the impact on the analysis of the 1930-1939 cohort being particularly salient. With
the exception of the 1920-1929 cohort, we also find imposing both monotonicity and
concavity is more informative than only imposing monotonicity.
References
Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71 1795–1844.
Ai, C. and Chen, X. (2007). Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables. Journal of Econometrics, 141 5–43.
Ai, C. and Chen, X. (2012). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics, 170 442–457.
Aït-Sahalia, Y. and Duarte, J. (2003). Nonparametric option pricing under shape restrictions. Journal of Econometrics, 116 9–47.
Andrews, D. W. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, 68 399–405.
Andrews, D. W. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica 683–734.
Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities. Econometrica, 81 609–666.
Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica, 78 119–157.
Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106 979–1014.
Armstrong, T. (2015). Adaptive testing on a regression function at a point. The Annals of Statistics, 43 2086–2101.
Athey, S. and Stern, S. (1998). An empirical framework for testing theories about complementarity in organizational design. Tech. rep., National Bureau of Economic Research.
Babii, A. and Kumar, R. (2019). Isotonic regression discontinuity designs. Available at SSRN 3458127.
Beare, B. K. and Schmidt, L. D. (2014). An empirical test of pricing kernel monotonicity. Journal of Applied Econometrics (in press).
Beresteanu, A. and Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, 76 763–814.
Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75 1613–1669.
Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.
Bugni, F. A., Canay, I. A. and Shi, X. (2017). Inference for subvectors and other functions of partially identified parameters in moment inequality models. Quantitative Economics, 8 1–38.
Canay, I. A. and Shaikh, A. M. (2017). Practical and theoretical advances in inference for partially identified models. In Advances in Economics and Econometrics: Eleventh World Congress, vol. 2. Cambridge University Press, Cambridge, 271–306.
Chamberlain, G. (1992). Comment: Sequential moment restrictions in panel data. Journal of Business & Economic Statistics, 10 20–26.
Chatterjee, S., Guntuboyina, A. and Sen, B. (2013). Improved risk bounds in isotonic regression. arXiv preprint arXiv:1311.3765.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.
Chen, X. and Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152 46–60.
Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.
Chen, X. and Pouzo, D. (2015). Sieve Wald and QLR inferences on semi/nonparametric conditional moment models. Econometrica, 83 1013–1079.
Chen, X., Pouzo, D. and Tamer, E. (2011a). QLR inference on partially identified nonparametric conditional moment models. Working paper, Yale University.
Chen, X. and Santos, A. (2018). Overidentification in regular models. Econometrica, 86 1771–1817.
Chen, X., Tamer, E. and Torgovitsky, A. (2011b). Sensitivity analysis in semiparametric likelihood models. Working paper, Yale University.
Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162 47–70.
Chernozhukov, V., Fernandez-Val, I. and Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96 559–575.
Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73 245–261.
Chernozhukov, V., Hong, H. and Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 75 1243–1284.
Chernozhukov, V., Lee, S. and Rosen, A. M. (2013). Intersection bounds: Estimation and inference. Econometrica, 81 667–737.
Chetverikov, D. (2019). Testing regression monotonicity in econometric models. Econometric Theory, 35 729–776.
Chetverikov, D., Santos, A. and Shaikh, A. M. (2018). The econometrics of shape restrictions. Annual Review of Economics, 10 31–63.
Chetverikov, D. and Wilhelm, D. (2017). Nonparametric instrumental variable estimation under monotonicity. Econometrica, 85 1303–1320.
Dette, H., Hoderlein, S. and Neumeyer, N. (2016). Testing multivariate economic restrictions using quantiles: the example of Slutsky negative semidefiniteness. Journal of Econometrics, 191 129–144.
Eichenbaum, M. S., Hansen, L. P. and Singleton, K. J. (1988). A time series analysis of representative agent models of consumption and leisure choice under uncertainty. The Quarterly Journal of Economics, 103 51–78.
Fang, Z. and Seo, J. (2019). A general framework for inference on shape restrictions. arXiv preprint arXiv:1910.07689.
Freyberger, J. and Horowitz, J. L. (2015). Identification and shape restrictions in nonparametric instrumental variables estimation. Journal of Econometrics, 189 41–53.
Freyberger, J. and Reeves, B. (2018). Inference under shape restrictions. Available at SSRN 3011474.
Galichon, A. and Henry, M. (2009). A test of non-identifying restrictions and confidence regions for partially identified parameters. Journal of Econometrics, 152 186–196.
Gentzkow, M. (2007). Valuing new goods in a model with complementarity: Online newspapers. American Economic Review, 97 713–744.
Haag, B. R., Hoderlein, S. and Pendakur, K. (2009). Testing and imposing Slutsky symmetry in nonparametric demand systems. Journal of Econometrics, 153 33–50.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 891–916.
Hausman, J. A. and Newey, W. K. (1995). Nonparametric estimation of exact consumers surplus and deadweight loss. Econometrica 1445–1476.
Hausman, J. A. and Newey, W. K. (2016). Individual heterogeneity and average welfare. Econometrica, 84 1225–1248.
Ho, K. and Rosen, A. M. (2017). Partial identification in applied research: Benefits and challenges. In Advances in Economics and Econometrics: Eleventh World Congress, vol. 2. Cambridge University Press, 307.
Hong, S. (2017). Inference in semiparametric conditional moment models with partial identification. Journal of Econometrics, 196 156–179.
Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201 108–126.
Jackwerth, J. C. (2000). Recovering risk aversion from option prices and realized returns. Review of Financial Studies, 13 433–451.
Kaido, H. and Santos, A. (2014). Asymptotically efficient estimation of models defined by convex moment inequalities. Econometrica, 82 387–413.
Kline, P. and Santos, A. (2012). A score based approach to wild bootstrap inference. Journal of Econometric Methods, 1 23–41.
Koltchinskii, V. I. (1994). Komlos-Major-Tusnady approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.
Kretschmer, T., Miravete, E. J. and Pernías, J. C. (2012). Competitive pressure and the adoption of complementary innovations. The American Economic Review, 102 1540.
Ledoux, M. and Talagrand, M. (1988). Un critère sur les petites boules dans le théorème limite central. Probability Theory and Related Fields, 77 29–47.
Lee, S., Song, K. and Whang, Y.-J. (2018). Testing for a general class of functional inequalities. Econometric Theory, 34 1018–1064.
Lewbel, A. (1995). Consistent nonparametric hypothesis tests with an application to Slutsky symmetry. Journal of Econometrics, 67 379–401.
Linton, O., Song, K. and Whang, Y.-J. (2010). An improved bootstrap test of stochastic dominance. Journal of Econometrics, 154 186–202.
Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer-Verlag, New York.
Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.
Newey, W. K. and Powell, J. (2003). Instrumental variables estimation of nonparametric models. Econometrica, 71 1565–1578.
Reguant, M. (2014). Complementary bidding mechanisms and startup costs in electricity markets. The Review of Economic Studies, 81 1708–1742.
Romano, J. P. and Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling and the bootstrap. The Annals of Statistics, 40 2798–2822.
Santos, A. (2011). Instrumental variables methods for recovering continuous linear functionals. Journal of Econometrics, 161 129–146.
Santos, A. (2012). Inference in nonparametric instrumental variables with partial identification. Econometrica, 80 213–275.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26 393–415.
Tao, J. (2014). Inference for point and partially identified semi-nonparametric conditional moment models. Working paper, University of Washington.
Topkis, D. M. (1998). Supermodularity and Complementarity. Princeton University Press.
Torgovitsky, A. (2019). Nonparametric inference on state dependence in unemployment. Econometrica, 87 1475–1505.
Wolak, F. A. (2007). Quantifying the supply-side benefits from forward contracting in wholesale electricity markets. Journal of Applied Econometrics, 22 1179–1209.
Yurinskii, V. V. (1977). On the error of the Gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.
Zhai, A. (2018). A high-dimensional CLT in W2 distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.
Zhu, Y. (2019). Inference in non-parametric/semi-parametric moment equality models with shape restrictions. Working paper (October 23, 2019).
Table 6: Empirical rejection probabilities for 5%-level tests based on I_n(R) − I_n(Θ). Vectors p_n^{j_n}(x) and q_n^{k_n}(z) constructed from piecewise quadratic continuously differentiable splines. Value of r_n set to a percentile corresponds to the choice of 1 − γ_n in (42).
S.3 Rate of Convergence
Proof of Theorem 3.1: For conciseness, define η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n and let δ_n^{-1} ≡ ν_n(η_n + τ_n). In addition, we define the event A_n ≡ A_{n1} ∩ A_{n2} ∩ A_{n3}, where
Lemma S.3.3. Let {g_j}_{j=1}^J be a finite set of functions satisfying sup_{P∈P} ‖g_j‖_{P,∞} ≤ C for all 1 ≤ j ≤ J and some C < ∞. Defining the class of functions

G_n ≡ {f g_j : f ∈ F_n, 1 ≤ j ≤ J},

it then follows that N_{[]}(ε, G_n, ‖·‖_{P,2}) ≤ J × N_{[]}(ε/C, F_n, ‖·‖_{P,2}) for all P ∈ P.
Proof: First define g_j^+ ≡ g_j ∨ 0 and g_j^- ≡ g_j ∧ 0, where ∨ and ∧ denote the pointwise maximum and minimum. If {[f_{i,l,P}, f_{i,u,P}]}_i is a collection of brackets for F_n satisfying

∫ (f_{i,u,P} − f_{i,l,P})^2 dP ≤ ε^2    (S.28)

for all i, then it follows that the following collection of brackets covers the class G_n:

{[g_j^+ f_{i,l,P} + g_j^- f_{i,u,P}, g_j^- f_{i,l,P} + g_j^+ f_{i,u,P}]}_{i,j}.    (S.29)

Moreover, since |g_j| = g_j^+ − g_j^- by construction, we also obtain from (S.28) that

∫ (g_j^+ f_{i,u,P} + g_j^- f_{i,l,P} − g_j^+ f_{i,l,P} − g_j^- f_{i,u,P})^2 dP = ∫ (f_{i,u,P} − f_{i,l,P})^2 |g_j|^2 dP ≤ ε^2 C^2.    (S.30)

Since there are J × N_{[]}(ε, F_n, ‖·‖_{P,2}) brackets in (S.29), we conclude from (S.30) that

N_{[]}(ε, G_n, ‖·‖_{P,2}) ≤ J × N_{[]}(ε/C, F_n, ‖·‖_{P,2})

for all P ∈ P, which establishes the claim of the lemma.
Lemma S.3.4. If Assumption 3.4 holds, then there exists a constant B < ∞ such that

lim inf_{n→∞} inf_{P∈P} P(Σ_n^{-1} exists and max{‖Σ_n‖_{o,p}, ‖Σ_n^{-1}‖_{o,p}} < B) = 1.

Proof: First note that by Assumption 3.4(iii) there exists a B < ∞ such that

sup_{n≥1} sup_{P∈P} max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} < B/2.    (S.31)
Next, let I_n denote the k_n × k_n identity matrix and for each P ∈ P rewrite Σ_n as

Σ_n = Σ_n(P){I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n)}.    (S.32)

By Theorem 2.9 in Kress (1999), the matrix I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n) is invertible and the operator norm of its inverse is bounded by two whenever ‖Σ_n(P)^{-1}(Σ_n(P) − Σ_n)‖_{o,p} < 1/2. Since by Assumption 3.4(ii) and the equality in (S.32) it follows that Σ_n is invertible if and only if I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n) is invertible, we obtain that

P(Σ_n^{-1} exists and ‖{I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ_n)}^{-1}‖_{o,p} < 2)
   ≥ P(‖Σ_n(P)^{-1}(Σ_n − Σ_n(P))‖_{o,p} < 1/2) ≥ P(‖Σ_n − Σ_n(P)‖_{o,p} < 1/B),    (S.33)

where we employed ‖Σ_n(P)^{-1}(Σ_n − Σ_n(P))‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ_n − Σ_n(P)‖_{o,p} and (S.31). Hence, since ‖Σ_n(P)‖_{o,p} < B/2 for all P ∈ P and n, (S.32) and (S.33) yield

P(Σ_n^{-1} exists and ‖Σ_n^{-1}‖_{o,p} < B) ≥ P(‖Σ_n − Σ_n(P)‖_{o,p} < 1/B).    (S.34)

Finally, since ‖Σ_n‖_{o,p} ≤ B/2 + ‖Σ_n − Σ_n(P)‖_{o,p} by (S.31), result (S.34) implies that

lim inf_{n→∞} inf_{P∈P} P(Σ_n^{-1} exists and max{‖Σ_n‖_{o,p}, ‖Σ_n^{-1}‖_{o,p}} < B)
   ≥ lim inf_{n→∞} inf_{P∈P} P(‖Σ_n − Σ_n(P)‖_{o,p} < min{B/2, 1/B}) = 1,

where the equality, and hence the lemma, follows from Assumption 3.4(i).
Lemma S.3.5. If a ∈ R^d, then ‖a‖_p ≤ d^{(1/p − 1/p̃)_+} ‖a‖_{p̃} for any p, p̃ ∈ [2,∞].

Proof: The case p̃ ≤ p trivially follows from ‖a‖_p ≤ ‖a‖_{p̃} for all a ∈ R^d. For the case p̃ > p, let a = (a^{(1)}, …, a^{(d)})′ and note that by Hölder's inequality we obtain

‖a‖_p^p = Σ_{i=1}^d |a^{(i)}|^p × 1 ≤ {Σ_{i=1}^d (|a^{(i)}|^p)^{p̃/p}}^{p/p̃} {Σ_{i=1}^d 1^{p̃/(p̃−p)}}^{1−p/p̃} = {Σ_{i=1}^d |a^{(i)}|^{p̃}}^{p/p̃} d^{1−p/p̃}.    (S.35)

Thus, the claim of the lemma for p̃ > p follows from taking the 1/p power in (S.35).
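As a purely numerical sanity check (not part of the formal argument), the norm comparison in Lemma S.3.5 can be exercised at randomly drawn vectors and exponents:

```python
import numpy as np

# Check: for a in R^d and 2 <= p <= ptilde, the lemma's bound reads
# ||a||_p <= d^(1/p - 1/ptilde) * ||a||_ptilde  (exponent (1/p - 1/ptilde)_+ is positive here).
rng = np.random.default_rng(1)
violations = 0
for _ in range(1000):
    d = int(rng.integers(1, 20))
    a = rng.normal(size=d)
    p, ptilde = np.sort(rng.uniform(2.0, 50.0, size=2))  # ensures p <= ptilde
    lhs = np.linalg.norm(a, ord=p)
    rhs = d ** (1.0 / p - 1.0 / ptilde) * np.linalg.norm(a, ord=ptilde)
    violations += int(lhs > rhs * (1.0 + 1e-10))  # small multiplicative tolerance
```

The loop records zero violations, matching the Hölder argument in the proof.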
S.4 Strong Approximation
Proof of Theorem 3.2: Let Pf ≡ E_P[f(V)] and note by Assumption 3.4(iii) there is a C_0 < ∞ such that ‖Σ_n(P)‖_{o,p} ≤ C_0 for all P ∈ P_0. Therefore, Assumption 3.10(ii) and Lemma S.3.5 imply that for all P ∈ P_0, θ_0 ∈ Θ_{0n}^r(P), and h/
Lemma S.4.3. Let Assumptions 3.3(i), 3.3(iii), 3.4, and 3.10(ii)-(iii) hold. For any positive sequence δ_n it then follows that uniformly in P ∈ P_0 we have

inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n ∈ V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·,θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}
   = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n ∈ V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·,θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n,p} + o_P(a_n).
Proof: First note that by Assumption 3.4(iii) there exists a constant C_0 < ∞ such that max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} ≤ C_0 for all P ∈ P. Therefore, since ‖Σ_n a‖_p ≤ ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n}, and ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ_n − Σ_n(P)‖_{o,p} + 1 by the triangle inequality, it follows that
In particular, note Assumption S.5.1 holds with D_n(L,E) = S_n(L,E), D_n(B,E) = S_n(B,E), and θ̄_0 = θ_0. Hence, Assumption 3.16 implies Assumptions S.5.1 and S.5.2. In general, however, D_n(L,E) and D_n(B,E) can be smaller than S_n(L,E) and S_n(B,E), while the introduction of a θ̄_0 ≠ θ_0 eases requirements in partially identified models.
Our next theorem consists of two parts. The first part, which replaces Assumption
3.16 with S.5.1 and S.5.2, can by the preceding discussion be seen as a generalization of
Theorem 3.3. The second part shows that, under additional restrictions, it is possible
to replace the norm ‖·‖_B in the definition of V_n(θ, ℓ) (as in (22)) with the norm ‖·‖_E, an observation that is sometimes helpful in easing rate restrictions.
   ≲ ‖Σ_n(P)‖_{o,p} × K_m D_n(L,E) δ_n ℓ_n √n + o_P(a_n) = o_P(a_n),    (S.68)

where the second inequality follows from ‖h_n/√n‖_B ≤ ℓ_n due to h_n/√n ∈ V_n(θ_n, ℓ_n), Assumption 3.15(i), and result (S.66). In turn, the final result in (S.68) is implied by (S.63) and ‖Σ_n(P)‖_{o,p} being uniformly bounded due to Assumption 3.4(iii). Next, we note that since either Υ_G is affine (implying K_g = 0) or δ_n D_n(B,E) = o(1), and in addition we have δ_n D_n(B,E) = o(r_n) and lim sup_{n→∞} ℓ_n/r_n 1{K_g > 0} < 1/2 by Assumption S.5.2(iii), we can conclude that

r_n ≥ (M_g δ_n D_n(B,E) + K_g δ_n^2 D_n^2(B,E)) ∨ 2(ℓ_n + δ_n D_n(B,E)) 1{K_g > 0}

for n sufficiently large. Hence, Theorem S.7.1(i), Assumption 3.15(ii), and ‖h‖_E ≤ K_b ‖h‖_B for all h ∈ B_n and P ∈ P by Assumption 3.15(i), imply that there is an M < ∞ for which with probability tending to one uniformly in P ∈ P_0

inf_{h_0/√n ∈ V_n(θ_{0n}, 2K_b ℓ_n)} ‖h_n/√n − h_0/√n‖_B ≤ M ℓ_n (ℓ_n + δ_n D_n(B,E)) 1{K_f > 0}.

It follows from Assumption S.5.2(ii) and (S.62) that there is an h_{0n}/√n ∈ V_n(θ_{0n}, 2K_b ℓ_n) such that ‖h_{0n} − h_n‖_B = o_P(a_n) uniformly in P ∈ P_0, and hence Assumption 3.4(iii), Lemma S.5.3, and ‖h‖_E ≤ K_b ‖h‖_B by Assumption 3.15(i) yield
uniformly in P ∈ P_0. Next, note that by Assumption 3.4(iii) there exists a constant C_0 < ∞ such that ‖Σ_n(P)^{-1}‖_{o,p} ≤ C_0 for all n and P ∈ P. Thus, using that ‖Σ_n a‖_p ≤ ‖Σ_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n} and the triangle inequality we obtain
Hence, the corollary follows from V_n^u(θ, ℓ_n^u) ⊆ V_n^u(ℓ_n^u), (S.105), and (S.112).
S.6 Illustrative Examples
We next examine special cases of our general analysis and illustrate both how to implement our procedure and how to verify the assumptions in the main text.
S.6.1 Generalized Method of Moments
Our first example concerns the generalized method of moments (GMM) model of Hansen
(1982). For simplicity we assume the parameter of interest θ0(P ) ∈ Θ ⊆ Rdθ is identified
as the unique solution to the unconditional moment restriction
EP [ρ(X, θ0(P ))] = 0, (S.113)
where X ∈ R^{d_x} is distributed according to P ∈ P and ρ : R^{d_x} × Θ → R^J. This model maps into our general analysis by letting Z_ℓ = 1 for all 1 ≤ ℓ ≤ J. Moreover, since we have assumed θ_0(P) is identified, the hypothesis testing problem simplifies to

H_0 : θ_0(P) ∈ R        H_1 : θ_0(P) ∉ R.
The set R is, as in the main text, defined by equality and inequality restrictions. In
particular, for known functions Υ_F : R^{d_θ} → R^{d_F} and Υ_G : R^{d_θ} → R^{d_G} we set

R ≡ {θ ∈ R^{d_θ} : Υ_F(θ) = 0 and Υ_G(θ) ≤ 0}.    (S.114)
To verify Assumption 3.1, note R^d is a Banach space under any norm ‖·‖_p with 1 ≤ p ≤ ∞, so for concreteness we set B = R^{d_θ}, F = R^{d_F}, and ‖·‖_B = ‖·‖_F = ‖·‖_2. The space R^d is in addition a lattice under the standard pointwise partial order

a ≤ b if and only if a_i ≤ b_i for all 1 ≤ i ≤ d    (S.115)

for any a = (a_1, …, a_d)′ and b = (b_1, …, b_d)′ in R^d. The least upper bound a ∨ b and greatest lower bound a ∧ b are then given by the pointwise maximum and minimum; i.e.

a ∨ b = (max{a_1, b_1}, …, max{a_d, b_d})′        a ∧ b = (min{a_1, b_1}, …, min{a_d, b_d})′.

The vector (1, …, 1)′ is an order unit in R^d under the partial order in (S.115). As discussed in the appendix to the main paper, the order unit induces the norm

inf{λ > 0 : |a| ≤ λ(1, …, 1)′} = max_{1≤i≤d} |a_i|,

which corresponds to the usual ‖·‖_∞ norm on R^d. Hence, by setting G = R^{d_G}, ‖·‖_G = ‖·‖_∞, and 1_G = (1, …, 1)′ we verify the requirements of Assumption 3.1.
Since the parameter space Θ is finite dimensional and all moment restrictions are unconditional, we may set Θ_n = Θ and k_n = J for all n. We base our test statistic on quadratic forms in the moments (p = 2), which implies Q_n(θ) is given by

Q_n(θ) ≡ {(1/n Σ_{i=1}^n ρ(X_i, θ))′ Σ_n (1/n Σ_{i=1}^n ρ(X_i, θ))}^{1/2}.
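As a concrete illustration of this objective, the following sketch evaluates Q_n for a hypothetical linear IV moment ρ(X, θ) = Z(Y − Dθ), with the identity matrix standing in for the weighting matrix Σ_n; all names here are our own:

```python
import numpy as np

def gmm_objective(theta, moments, data, sigma):
    """Q_n(theta) = {rhobar(theta)' Sigma_n rhobar(theta)}^(1/2), where
    rhobar is the sample average of the moment function (the p = 2 case)."""
    rbar = moments(data, theta).mean(axis=0)  # (1/n) sum_i rho(X_i, theta)
    return float(np.sqrt(rbar @ sigma @ rbar))

# Hypothetical linear IV example: rho(X, theta) = Z * (Y - D * theta).
rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=(n, 2))
d = z @ np.array([1.0, 0.5]) + rng.normal(size=n)   # endogenous regressor
y = 1.5 * d + rng.normal(size=n)                    # true theta = 1.5

def iv_moments(data, theta):
    z, d, y = data
    return z * (y - d * theta)[:, None]

grid = np.linspace(0.0, 3.0, 61)
qvals = [gmm_objective(t, iv_moments, (z, d, y), np.eye(2)) for t in grid]
theta_hat = grid[int(np.argmin(qvals))]   # grid minimizer, close to 1.5
```

In practice Σ_n would be an estimate of an efficient weighting matrix rather than the identity; the structure of Q_n is unchanged.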
In what follows, we consider tests based on the un-centered statistic I_n(R) and the re-centered statistic I_n(R) − I_n(Θ). To this end, we impose the following assumptions.
Assumption S.6.1. (i) {X_i}_{i=1}^n is an i.i.d. sequence with X_i ∼ P ∈ P; (ii) For each P ∈ P_0 there exists a unique θ_0(P) ∈ Θ solving (S.113); (iii) Θ is convex and compact.
Assumption S.6.2. (i) ρ(x, ·) : Θ → R^J is twice differentiable for all x ∈ R^{d_x}; (ii) E_P[sup_{θ∈Θ} ‖ρ(X,θ)‖_2^3], E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}^2], and E_P[sup_{θ∈Θ} ‖∇_θ^2 ρ(X,θ)‖_{o,2}^{1+δ}] are finite and bounded uniformly in P ∈ P for some δ > 0.
Assumption S.6.3. (i) inf_{P∈P_0} inf_{θ∈Θ:‖θ−θ_0(P)‖_2≥ε} ‖E_P[ρ(X,θ)]‖_2 > 0 for all ε > 0; (ii) The singular values of E_P[∇_θρ(X, θ_0(P))] are bounded away from zero uniformly in P ∈ P_0.
Assumption S.6.4. (i) ‖Σn − Σ(P )‖o,2 = OP (n−1/2) uniformly in P ∈ P; (ii) Σ(P )
is invertible and ‖Σ(P )‖o,2 and ‖Σ(P )−1‖o,2 are bounded uniformly in P ∈ P.
28
In Assumption S.6.2 we focus on differentiable moments for simplicity. Assumption
S.6.3 essentially imposes strong identification of θ0(P ) and hence guarantees that θ0(P )
can be consistently estimated uniformly in P ∈ P0. Finally, Assumption S.6.4 states
the requirements on the weighting matrix Σn. The consistency rate in Assumption S.6.4
is imposed for simplicity since it is easily satisfied in this finite dimensional model.
In what follows, we set the local parameter spaces V_n(θ, ℓ) and V_n^u(θ, ℓ) to equal

V_n(θ, ℓ) = {h/√n ∈ R^{d_θ} : θ + h/√n ∈ Θ ∩ R and ‖h/√n‖_2 ≤ ℓ}
V_n^u(θ, ℓ) = {h/√n ∈ R^{d_θ} : θ + h/√n ∈ Θ and ‖h/√n‖_2 ≤ ℓ}.
We also denote the random variables to which I_n(R) and I_n(Θ) will be coupled by

U_{n,P}(R|ℓ_n) ≡ inf_{h/√n ∈ V_n(θ_0(P), ℓ_n)} ‖W_{n,P} ρ(·, θ_0(P)) + E_P[∇_θρ(X, θ_0(P))]h‖_{Σ(P),2}    (S.116)
U_{n,P}(Θ|ℓ_n) ≡ inf_{h/√n ∈ V_n^u(θ_0(P), ℓ_n)} ‖W_{n,P} ρ(·, θ_0(P)) + E_P[∇_θρ(X, θ_0(P))]h‖_{Σ(P),2}.    (S.117)

Our distributional approximations then follow immediately from Theorem 3.2(ii).
Theorem S.6.1. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, Υ_F and Υ_G be continuous, and set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}. Then: for any ℓ_n, ℓ_n^u ↓ 0 satisfying (ℓ_n ∨ ℓ_n^u) √(log(1/(ℓ_n ∨ ℓ_n^u))) = o(a_n) and n^{-1/2} = o(ℓ_n ∨ ℓ_n^u) we have uniformly in P ∈ P_0

I_n(R) = U_{n,P}(R|ℓ_n) + o_P(a_n)
I_n(R) − I_n(Θ) = U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ_n^u) + o_P(a_n).
The coupling rate of log^{1/2}(n)/n^{1/(10+5d_θ)} obtained in Theorem S.6.1 suffices for both the empirical process and bootstrap couplings; see Lemmas S.6.3 and S.6.4 below. While the rate is adequate for our purposes, it can potentially be improved, for example, under additional moment restrictions. Here, we rely on Yurinskii (1977) both to illustrate the diversity of coupling arguments that can be employed to verify Assumption 3.7, and to impose only the weak third-moment restriction of Assumption S.6.2(ii).
As in Theorem 3.2, the conclusion in Theorem S.6.1 in fact delivers a family (indexed
by `n) of approximations to the distribution of both In(R) and In(R)−In(Θ). Our next
goal is therefore to obtain bootstrap approximations to the distributional approxima-
tions obtained in Theorem S.6.1. To this end, we write ΥF (θ) = (ΥF,1(θ), . . . ,ΥF,dF (θ))′
and ΥG(θ) = (ΥG,1(θ), . . . ,ΥG,dG(θ))′, and introduce the following assumptions.
Assumption S.6.5. For some ε > 0 and B_ε ≡ ∪_{P∈P_0} {θ : ‖θ − θ_0(P)‖_2 ≤ ε}: (i) B_ε ⊆ Θ; (ii) Υ_F and Υ_G are twice differentiable on B_ε; (iii) ‖∇Υ_F(θ)‖_{o,2} and ‖∇Υ_G(θ)‖_{o,2} are bounded on B_ε; (iv) ‖∇^2Υ_{F,j}(θ)‖_{o,2} is bounded on B_ε for 1 ≤ j ≤ d_F; (v) ‖∇^2Υ_{G,j}(θ)‖_{o,2} is bounded on B_ε for 1 ≤ j ≤ d_G; (vi) ∇Υ_F(θ) has full row rank on B_ε.
Assumption S.6.6. Either (i) Υ_F : R^{d_θ} → R^{d_F} is linear, or (ii) there are an ε > 0 and K_d < ∞ such that the singular values of ∇Υ_F(θ_1)′ are bounded away from zero uniformly in θ_1 ∈ {θ : ‖θ − θ_0(P)‖_2 ≤ ε} and P ∈ P_0, and there is an h(P) ∈ N(∇Υ_F(θ_0(P))) with ‖h(P)‖_2 ≤ K_d satisfying Υ_{G,j}(θ_0(P)) + ∇Υ_{G,j}(θ_0(P))[h(P)] ≤ −ε for all 1 ≤ j ≤ d_G.
In order to describe the structure of the bootstrap procedure in this application, let θ_n ∈ Θ ∩ R and θ_n^u ∈ Θ be any estimators satisfying

Q_n(θ_n) ≤ inf_{θ∈Θ∩R} Q_n(θ) + o_P(n^{-1/2})        Q_n(θ_n^u) ≤ inf_{θ∈Θ} Q_n(θ) + o_P(n^{-1/2})

uniformly in P ∈ P; e.g. θ_n and θ_n^u may be set to equal any minimizers of Q_n over Θ ∩ R and Θ when such minimizers exist. Employing θ_n and θ_n^u we then obtain estimators for the distribution of the random variable W_{n,P} ρ(·, θ_0(P)) and for the derivative E_P[∇_θρ(X, θ_0(P))] by evaluating the following expressions at both θ = θ_n and θ = θ_n^u:

W_n ρ(·, θ) ≡ (1/√n) Σ_{i=1}^n ω_i {ρ(X_i, θ) − (1/n) Σ_{j=1}^n ρ(X_j, θ)}    (S.118)
D_n(θ) ≡ (1/n) Σ_{i=1}^n ∇_θρ(X_i, θ),    (S.119)

where {ω_i}_{i=1}^n is an i.i.d. sample of standard normal random variables independent of the data {X_i}_{i=1}^n. We note that since the moments are assumed differentiable, we employ an analytical derivative in (S.119) for the resulting computational simplicity.
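In code, one draw of the multiplier process in (S.118) and the derivative estimate in (S.119) look as follows (a sketch with hypothetical names: `rho_values` stacks the moment evaluations ρ(X_i, θ) row-wise, and `grad_values` would stack the derivatives ∇_θρ(X_i, θ)):

```python
import numpy as np

def multiplier_draw(rho_values, rng):
    """One realization of (S.118): i.i.d. N(0,1) multipliers applied to the
    demeaned moment evaluations, scaled by n^(-1/2)."""
    n = rho_values.shape[0]
    omega = rng.normal(size=n)                       # bootstrap weights
    centered = rho_values - rho_values.mean(axis=0)  # rho(X_i) minus sample mean
    return (omega[:, None] * centered).sum(axis=0) / np.sqrt(n)

def jacobian_estimate(grad_values):
    """(S.119): the sample average of the analytic derivatives."""
    return grad_values.mean(axis=0)

rng = np.random.default_rng(3)
rho_values = rng.normal(size=(500, 3)) @ np.diag([1.0, 2.0, 0.5])  # stand-in moments
draws = np.stack([multiplier_draw(rho_values, rng) for _ in range(2000)])
# Conditional on the data, the draws are mean zero with covariance equal to the
# sample covariance of rho_values, which is what makes the coupling work.
```

Repeating `multiplier_draw` across bootstrap replications while holding the data fixed delivers the conditional law used to form critical values.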
With regards to the local parameter space, we note that the construction of V_n(θ, ℓ) requires the bound K_g on the second derivative of Υ_G (as specified in Assumption 3.11). In particular, Assumption S.6.5(v) implies Assumption 3.11 is satisfied with

K_g ≡ max_{1≤j≤d_G} sup_{θ∈B_ε} ‖∇_θ^2 Υ_{G,j}(θ)‖_{o,2}

(see Lemma S.6.5). If an a priori bound on the second derivative is not available, then it is also possible to simply substitute K_g with the data-driven choice

K̂_g ≡ max_{1≤j≤d_G} sup_{θ∈Θ:‖θ−θ_n‖_2≤r_n} ‖∇_θ^2 Υ_{G,j}(θ)‖_{o,2}.

Given K_g (or its sample analogue K̂_g), we then note the set G_n(θ) in this application equals

G_n(θ) = {h/√n ∈ R^{d_θ} : Υ_{G,j}(θ + h/√n) ≤ max{Υ_{G,j}(θ) − K_g r_n ‖h/√n‖_2, −r_n} for 1 ≤ j ≤ d_G}.
Furthermore, in this parametric problem we may additionally specify the bandwidth ℓ_n to be infinite. Hence, the sample analogue to the local parameter space is given by

V_n(θ, +∞) = {h/√n ∈ R^{d_θ} : h/√n ∈ G_n(θ) and Υ_F(θ + h/√n) = 0}.
The approximations to the distributions of I_n(R) and I_n(Θ) are then given by the laws of U_n(R|+∞) and U_n(Θ|+∞) conditional on the data, where

U_n(R|+∞) ≡ inf_{h/√n ∈ V_n(θ_n, +∞)} ‖W_n ρ(·, θ_n) + D_n(θ_n)h‖_{Σ_n,2}
U_n(Θ|+∞) ≡ inf_{h ∈ R^{d_θ}} ‖W_n ρ(·, θ_n^u) + D_n(θ_n^u)h‖_{Σ_n,2}.
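In the special case where Υ_F is linear, the inequality constraints are absent, and the identity matrix stands in for Σ_n, both infima reduce to least-squares projections. The sketch below (with hypothetical inputs of our own) illustrates this; the restricted value is never smaller than the unrestricted one since the feasible set shrinks:

```python
import numpy as np

def u_unrestricted(w, d_mat, sigma):
    """U_n(Theta|+inf): inf over all h of the Sigma-weighted 2-norm of
    W + D h, computed as an ordinary least-squares projection."""
    sw, sd = sigma @ w, sigma @ d_mat
    h = np.linalg.lstsq(sd, -sw, rcond=None)[0]
    return float(np.linalg.norm(sw + sd @ h))

def u_linear_equality(w, d_mat, sigma, a_mat):
    """U_n(R|+inf) when R imposes only the linear equality A h = 0:
    parametrize h = N u with N a basis for the null space of A."""
    _, s, vt = np.linalg.svd(a_mat)
    rank = int((s > 1e-10).sum())
    null_basis = vt[rank:].T            # columns span the null space of A
    return u_unrestricted(w, d_mat @ null_basis, sigma)

rng = np.random.default_rng(4)
w = rng.normal(size=5)                  # stand-in for W_n rho(., theta_n)
d_mat = rng.normal(size=(5, 3))         # stand-in for D_n(theta_n)
a_mat = np.array([[1.0, -1.0, 0.0]])    # hypothetical equality restriction h_1 = h_2
u_free = u_unrestricted(w, d_mat, np.eye(5))
u_restr = u_linear_equality(w, d_mat, np.eye(5), a_mat)
```

With nonlinear Υ_F or active inequality constraints in G_n(θ), the inner minimization instead requires a constrained optimizer, but the logic of comparing restricted and unrestricted projections is unchanged.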
The validity of these approximations then follows from Theorem 3.3 and Corollary 3.3.
Theorem S.6.2. Let Assumptions S.6.1, S.6.2, S.6.3, S.6.4, S.6.5, and S.6.6 hold, set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}, and let n^{-1/2} = o(r_n). Then: for any sequences ℓ_n, ℓ_n^u ↓ 0 satisfying (ℓ_n ∨ ℓ_n^u)^2 √(log(1/(ℓ_n ∨ ℓ_n^u))) = o(a_n n^{-1/2}), ℓ_n = o(r_n), and n^{-1/2} = o(ℓ_n ∧ ℓ_n^u), it follows that uniformly in P ∈ P_0 we have

U_n(R|+∞) ≥ U_{n,P}^⋆(R|ℓ_n) + o_P(a_n)
U_n(R|+∞) − U_n(Θ|+∞) ≥ U_{n,P}^⋆(R|ℓ_n) − U_{n,P}^⋆(Θ|ℓ_n^u) + o_P(a_n).
Crucially, note that any sequences ℓ_n, ℓ_n^u satisfying the conditions of Theorem S.6.2 also satisfy the conditions of Theorem S.6.1. Therefore, Theorems S.6.2 and S.6.1 together establish the validity of employing the laws of U_n(R|+∞) and U_n(Θ|+∞) conditional on the data to approximate the laws of I_n(R) and I_n(Θ). In particular, for a level-α test we may compare the test statistic I_n(R) to the critical value
R as in (S.114). We thus avoid repeating the arguments, and verify only Assumptions
3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for R = Θ and R as in (S.114).
Next note that Lemma S.6.5 implies Assumptions 3.11, 3.12, and 3.13 are satisfied, while Lemma S.6.4 verifies Assumption 3.14 with a_n = log^{1/2}(n)/n^{1/(10+5d_θ)} for R = Θ, and hence also for R as in (S.114). Assumption 3.15(i) is immediate since ‖·‖_E = ‖·‖_B = ‖·‖_2 and hence K_b = 1, while Assumption 3.15(ii) is implied by Assumption S.6.5(i) and ‖θ_n − θ_0(P)‖_2 = o_P(1) uniformly in P ∈ P_0. Assumption 3.16(i) is immediate since S_n(B,E) = 1 and the choices of θ_n and θ_n^u correspond to setting τ_n = o(n^{-1/2}). Similarly, it follows from Lemma S.6.2, S_n(L,E) = 1, and n^{-1/2} = o(ℓ_n) that the condition ℓ_n^2 √(log(1/ℓ_n)) = o(a_n n^{-1/2}) verifies Assumption 3.16(ii). Moreover, since ℓ_n = o(r_n) and
By the mean value theorem, Θ being convex by Assumption S.6.1(iii), and ‖θ − Π_nθ‖_2 ≤ δ_n for every θ ∈ Θ due to δ_n-balls around {θ_k}_{k=1}^{N_n} covering Θ, it follows that

sup_{θ∈Θ} |(ρ(x,θ) − ρ(x,Π_nθ)) − E_P[(ρ(X,θ) − ρ(X,Π_nθ))]|
   ≤ {sup_{θ∈Θ} ‖∇_θρ(x,θ)‖_{o,2} + sup_{P∈P} E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}]} × δ_n.

Hence, setting G(x) ≡ 1 ∨ {sup_{θ∈Θ} ‖∇_θρ(x,θ)‖_{o,2} + sup_{P∈P} E_P[sup_{θ∈Θ} ‖∇_θρ(X,θ)‖_{o,2}]}, it follows that G(x)δ_n is an envelope for G_{n,P}, which by Assumption S.6.2(ii) satisfies sup_{P∈P} ‖Gδ_n‖_{P,2} ≲ δ_n. Further note that if [f_l, f_u] is a bracket containing a function f, then [f_l − E_P[f_u(X)], f_u − E_P[f_l(X)]] contains f − E_P[f(X)] and satisfies

‖f_u − f_l − E_P[f_l(X) − f_u(X)]‖_{P,2} ≤ 2‖f_u − f_l‖_{P,2}

by Jensen's inequality and the triangle inequality. Therefore, Lemma S.6.2 implies

sup_{P∈P} N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}) ≲ N_n × (1 ∨ ε^{-d_θ}),

and hence Theorem 2.14.2 in van der Vaart and Wellner (1996) together with G_{n,P} having envelope δ_nG with G ≥ 1 and sup_{P∈P} ‖G‖_{P,2} < ∞ and N_n ≲ δ_n^{-d_θ} yield
Therefore, the definitions of δ_n and ε_n, result (S.133), and Markov's inequality imply

lim sup_{n→∞} sup_{P∈P} P(sup_{g∈G_{n,P}} |G_{n,P} g| > ηε_n) ≲ lim sup_{n→∞} δ_n(1 + log(δ_n^{-d_θ}))^{1/2}/(ηε_n) = 0.    (S.134)
Similarly, since W_{n,P} is Gaussian and 0 ∈ G_{n,P}, Corollary 2.2.8 in van der Vaart and Wellner (1996) and packing numbers being bounded by bracketing numbers imply

sup_{P∈P} E_P[sup_{g∈G_{n,P}} |W_{n,P} g|] ≲ sup_{P∈P} ∫_0^∞ (log N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}))^{1/2} dε
   ≲ sup_{P∈P} ∫_0^{2δ_n‖G‖_{P,2}} (log N_{[]}(ε, G_{n,P}, ‖·‖_{P,2}))^{1/2} dε ≲ δ_n(1 + log(δ_n^{-d_θ}))^{1/2},    (S.135)

where in the second inequality we employed that the bracket [−δ_nG, δ_nG] covers G_{n,P} due to δ_nG being an envelope for G_{n,P}, and the final inequality follows from the change of variables u = ε/(2δ_n‖G‖_{P,2}) and the same manipulations as in (S.133). Hence,
lim sup_{n→∞} sup_{P∈P} P(sup_{g∈G_{n,P}} |W_{n,P} g| > ηε_n) ≲ lim sup_{n→∞} δ_n(1 + log(δ_n^{-d_θ}))^{1/2}/(ηε_n) = 0,    (S.136)

by result (S.135) and Markov's inequality. To conclude, note that the definitions of W_{n,P} in (S.130) and (S.131), and of G_{n,P} in (S.132), and the triangle inequality yield

sup_{θ∈Θ} ‖G_{n,P} ρ(·,θ) − W_{n,P} ρ(·,θ)‖_2 ≤ ‖G_{n,P} r_{n,P} − N_{n,P}‖_2 + sup_{g∈G_{n,P}} √J |G_{n,P} g| + sup_{g∈G_{n,P}} √J |W_{n,P} g|.

Thus the lemma follows from (S.129), (S.134), and (S.136).
Lemma S.6.4. If Assumptions S.6.1(i), S.6.1(iii), and S.6.2 hold, then it follows that Assumption 3.14 is satisfied with R = Θ and a_n = log^{3/4}(n)/n^{1/(12+2d_θ)}.

Proof: We aim to establish the lemma by relying on Theorem S.9.1(i) in Section S.9. To this end set ζ_n = n^{-1/(2(6+d_θ))}, M_n = n^{1/(6+d_θ)}, and N_n ≡ N(ζ_n, Θ, ‖·‖_2). By Assumption S.6.2(ii) the function F(x) ≡ sup_{θ∈Θ} ‖ρ(x,θ)‖_2 is integrable, and for any θ ∈ Θ we let
verifies Assumption 3.11(iii). By identical arguments, but recalling F = R^{d_F} and ‖·‖_F = ‖·‖_2, it follows Assumptions S.6.5(iii)-(iv) imply Assumptions 3.12(i)-(iii) hold with

K_f ≡ √d_F sup_{θ∈B_ε} max_{1≤j≤d_F} ‖∇^2Υ_{F,j}(θ)‖_{o,2}        M_f ≡ sup_{θ∈B_ε} ‖∇Υ_F(θ)‖_{o,2}.    (S.142)

To conclude, note that since Assumption S.6.5(vi) implies the range of ∇Υ_F(θ) equals R^{d_F} for all θ ∈ B_ε, it follows that ∇Υ_F(θ) admits a right inverse. Moreover, if Υ_F is linear, then K_f = 0 and hence Assumption 3.12(iv) holds. On the other hand, if Υ_F is nonlinear, then note ∇Υ_F(θ)^− = ∇Υ_F(θ)′(∇Υ_F(θ)∇Υ_F(θ)′)^{-1} and therefore ‖∇Υ_F(θ)^−‖_{o,2} is bounded for all θ ∈ B_ε due to ‖∇Υ_F(θ)‖_{o,2} being bounded on B_ε by Assumption S.6.5(iii), and the smallest singular value of ∇Υ_F(θ)′ being bounded away from zero on B_ε by Assumption S.6.6(ii). It thus follows Assumption 3.12(iv) holds as well (after possibly increasing M_f in (S.142)). Finally, we note Assumption S.6.6 directly implies Assumption 3.13 and hence the lemma follows.
Lemma S.6.6. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, and set a_n = log^{1/2}(n)/n^{1/(10+5d_θ)}. For any ℓ_n with n^{-1/2} = o(ℓ_n), it follows uniformly in P ∈ P_0 that

U_n(R|+∞) = inf_{h/√n ∈ V_n(θ_n, ℓ_n)} ‖W_n ρ(·, θ_n) + D_n(θ_n)h‖_{Σ_n,2} + o_P(a_n)
U_n(Θ|+∞) = inf_{‖h/√n‖_2 ≤ ℓ_n} ‖W_n ρ(·, θ_n^u) + D_n(θ_n^u)h‖_{Σ_n,2} + o_P(a_n).
Proof: We establish the lemma by relying on Lemma 3.1. To this end note that in the proof of Theorem S.6.1, Assumptions 3.2, 3.3(i)(iii), and 3.4 were verified and θ_n and θ_n^u were shown to be consistent for θ_0(P) with ‖·‖_B = ‖·‖_E = ‖·‖_2, R_n = n^{-1/2}, and ν_n ≍ 1 for both R as in (S.114) and for R = Θ. Next, note Lemma S.6.4 verifies Assumption 3.14 holds with a_n = log^{1/2}(n)/n^{1/(10+5d_θ)} for R = Θ and hence also for R as in (S.114). Also observe that the mean value theorem and Θ being convex imply

|∂ρ_ℓ(x, θ_1)/∂θ_k − ∂ρ_ℓ(x, θ_2)/∂θ_k| ≤ max_{1≤ℓ≤J} sup_{θ∈Θ} ‖∇_θ^2 ρ_ℓ(x,θ)‖_{o,2} ‖θ_1 − θ_2‖_2    (S.143)

for any θ_1, θ_2 ∈ Θ, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ d_θ. Thus, Assumption S.6.2(ii) implies there exists a C_0 < ∞ such that for all P ∈ P and θ_1, θ_2 ∈ Θ it follows that

‖E_P[∇_θρ(X, θ_1) − ∇_θρ(X, θ_2)]‖_{o,2} ≤ C_0 ‖θ_1 − θ_2‖_2.

In particular, the function θ ↦ E_P[∇_θρ(X, θ)] is uniformly continuous in θ and P ∈ P, which implies by Assumption S.6.3(ii) that there is an ε_0 > 0 such that the smallest singular value of E_P[∇_θρ(X, θ)] is bounded away from zero on {θ ∈ Θ : ‖θ − θ_0(P)‖_2 ≤ ε_0 for some P ∈ P_0}. Since ν_n ≍ 1, p = 2, and ‖·‖_E = ‖·‖_2, the Lemma 3.1 requirement ‖h‖_E ≤ ν_n ‖D_{n,P}(θ)[h]‖_p for all θ ∈ A_n(P), P ∈ P_0, and h ∈ √n{(B_n ∩ R) − θ} holds with A_n(P) = (Θ_{0n}^r(P))^{ε_0} and R = Θ (and hence also for R as in (S.114)). Moreover, by uniform consistency of θ_n and θ_n^u it follows that θ_n, θ_n^u ∈ A_n(P) with probability tending to one uniformly in P ∈ P_0.
To conclude, define $\mathcal{F} \equiv \{\frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta) : \text{for some } \theta \in \Theta, 1 \le \ell \le \mathcal{J}, 1 \le k \le d_\theta\}$ and let $F(x) \equiv \max_{1\le\ell\le\mathcal{J}} \sup_{\theta\in\Theta} \|\nabla^2_\theta \rho_\ell(x,\theta)\|_{o,2}$. Then note that if $\varepsilon$-balls around $\{\theta_i\}_{i=1}^{N_\varepsilon}$ cover $\Theta$, then result (S.143) implies that the brackets $[\frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta_i) - \varepsilon F, \frac{\partial}{\partial\theta_k}\rho_\ell(\cdot,\theta_i) + \varepsilon F]$ cover $\mathcal{F}$. Writing these brackets as $\{[f_{l,k}, f_{u,k}]\}_{k=1}^{K_\varepsilon}$ for conciseness, further note that $K_\varepsilon = \mathcal{J} d_\theta N_\varepsilon < \infty$ since $N_\varepsilon < \infty$ due to $\Theta$ being compact, and $C_1 \equiv \sup_{P\in\mathbf{P}} \|F\|_{P,1} < \infty$
by Assumption S.6.2(ii). Moreover, by definition of $[f_{l,k}, f_{u,k}]$ it further follows that
$$E_P[f_{u,k}(X) - f_{l,k}(X)] = \|f_{u,k} - f_{l,k}\|_{P,1} \le 2\varepsilon C_1 \tag{S.144}$$
for all $P \in \mathbf{P}$. Hence, employing the bound $f(x) - E_P[f(X)] \le f_{u,k}(x) - E_P[f_{l,k}(X)]$ for $[f_{l,k}, f_{u,k}]$ the bracket containing $f$, we obtain from result (S.144) that
$$\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n \{f(X_i) - E_P[f(X)]\} \le \max_{1\le k\le K_\varepsilon} \Big|\frac{1}{n}\sum_{i=1}^n \{f_{u,k}(X_i) - E_P[f_{u,k}(X)]\}\Big| + 2\varepsilon C_1 = 2\varepsilon C_1 + o_P(1), \tag{S.145}$$
where the equality holds uniformly in P ∈ P by Assumption S.6.2(ii), Kε < ∞, and
Theorem 2.8.1 in van der Vaart and Wellner (1996). By identical arguments, we have
$$\inf_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n \{f(X_i) - E_P[f(X)]\} \ge -\max_{1\le k\le K_\varepsilon} \Big|\frac{1}{n}\sum_{i=1}^n \{f_{l,k}(X_i) - E_P[f_{l,k}(X)]\}\Big| - 2\varepsilon C_1 = -2\varepsilon C_1 + o_P(1), \tag{S.146}$$
uniformly in P ∈ P. We thus conclude from results (S.145) and (S.146) that F is
Glivenko-Cantelli uniformly in $P \in \mathbf{P}$. Since by Assumption S.6.5(i) there exists an $\varepsilon > 0$ such that $\{\theta : \|\theta - \theta_0(P)\|_2 \le \varepsilon\} \subseteq \Theta$ for all $P \in \mathbf{P}_0$, we obtain
$$\sup_{\theta: \|\theta-\theta_0(P)\|_2 \le \varepsilon} \ \sup_{h \in \mathbf{R}^{d_\theta}: \|\frac{h}{\sqrt{n}}\|_2 \ge \ell_n} \frac{\|\hat D_n(\theta)[h] - D_{n,P}(\theta)[h]\|_2}{\|h\|_2} \le \sup_{\theta\in\Theta} \Big\|\frac{1}{n}\sum_{i=1}^n \nabla_\theta \rho(X_i,\theta) - E_P[\nabla_\theta \rho(X,\theta)]\Big\|_{o,2} = o_P(1) \tag{S.147}$$
uniformly in $P \in \mathbf{P}$, and where the equality follows from $\mathcal{F}$ being Glivenko-Cantelli uniformly in $P \in \mathbf{P}$. Since $\nu_n \asymp 1$, result (S.147) verifies condition (24) in Lemma 3.1.
This concludes verifying the requirements of Lemma 3.1, and hence the present Lemma follows for any $\ell_n$ satisfying $S_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$, which in this application is equivalent to $n^{-1/2} = o(\ell_n)$ due to $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}} = \|\cdot\|_2$ and $\mathcal{R}_n \asymp n^{-1/2}$.
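The bracketing argument in (S.144)-(S.146) admits a simple numerical illustration. The sketch below uses a made-up 1-Lipschitz class $f_\theta(x) = |x - \theta|$ on $\Theta = [0,1]$ with constant envelope $F \equiv 1$ (standing in for the Lipschitz envelope of (S.143)); the sample design and class are illustrative assumptions, not objects from the paper.

```python
import random

random.seed(0)

# Hypothetical 1-Lipschitz class f_theta(x) = |x - theta|, theta in [0, 1]; the
# constant envelope F(x) = 1 plays the role of F in (S.143)-(S.144), so C1 = 1.
def f(theta, x):
    return abs(x - theta)

def pop_mean(theta):
    # E[|X - theta|] for X ~ Uniform[0, 1], available in closed form here.
    return (theta ** 2 + (1 - theta) ** 2) / 2

n, eps = 5000, 0.05
xs = [random.random() for _ in range(n)]

# eps-net of centers; the bracket for f_theta is [f_c - eps*F, f_c + eps*F].
centers = [eps * k for k in range(int(1 / eps) + 1)]

# Bound from (S.145): sup_f (P_n - P)f <= max_c {P_n f_c - P f_c} + 2*eps*C1.
bound = max(sum(f(c, x) for x in xs) / n - pop_mean(c) for c in centers) + 2 * eps

# Compare with a fine-grid approximation of the actual supremum.
grid = [j / 500 for j in range(501)]
actual = max(sum(f(t, x) for x in xs) / n - pop_mean(t) for t in grid)
assert actual <= bound
```

The inequality holds by construction: every $\theta$ lies within $\varepsilon$ of some center, so the bracket bound dominates the supremum, exactly as in (S.145).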
S.6.2 Consumer Demand
We base our next example on a long-standing literature aiming to replace paramet-
ric assumptions with shape restrictions implied by economic theory (Matzkin, 1994).
Specifically, suppose that quantity demanded by individual i, denoted Qi, satisfies
Qi = gP (Si, Yi) +W ′iγP + Ui
where Si ∈ R+ denotes price, Yi ∈ R+ denotes income, and Wi ∈ Rdw is a set of
covariates. In addition, we assume there is an instrument Zi yielding the restriction
EP [Q− gP (S, Y )−W ′γP |Z] = 0. (S.148)
For instance, under exogeneity of prices we may let Z = (S, Y,W ′)′ as in Blundell et al.
(2012). Alternatively, if there is a concern that prices are endogenous, then we may set
Z = (I, Y,W ′)′ for I an instrument for S, as in Blundell et al. (2017).
Our goal is to conduct inference on the level of demand at a particular price-income pair $(s_0, y_0)$ while imposing that the function $g_P$ satisfies the Slutsky restriction
$$\frac{\partial}{\partial s} g_P(s,y) + g_P(s,y) \frac{\partial}{\partial y} g_P(s,y) \le 0. \tag{S.149}$$
To map this problem into our framework, we assume that for some set $\Omega$, $(S,Y) \in \Omega \subseteq \mathbf{R}^2_+$ with probability one for all $P \in \mathbf{P}$ and impose that $g_P \in C^1_B(\Omega)$, where
$$C^m_B(\Omega) \equiv \{g: \Omega \to \mathbf{R} \text{ s.t. } \|g\|_{m,\infty} < \infty\} \qquad \|g\|_{m,\infty} \equiv \sup_{0\le|\alpha|\le m} \sup_{(s,y)\in\Omega} |\nabla^\alpha g(s,y)|.$$
Since the parameters in this model are $(g_P, \gamma_P)$ with $\gamma_P \in \mathbf{R}^{d_w}$, we set $\mathbf{B} = C^1_B(\Omega) \times \mathbf{R}^{d_w}$ and equip it with the norm $\|\theta\|_{\mathbf{B}} = \max\{\|g\|_{1,\infty}, \|\gamma\|_2\}$ for any $(g,\gamma) = \theta \in \mathbf{B}$. Note that in this application, $X = (Q, S, Y, W) \in \mathbf{X} \equiv \mathbf{R}_+ \times \Omega \times \mathbf{R}^{d_w}$ and
$$\rho(X,\theta) = Q - g(S,Y) - W'\gamma. \tag{S.150}$$
We further assume that θ0(P ) = (gP , γP ) is identified as the unique solution to (S.148).
In order to impose the Slutsky restriction in (S.149) we let $\mathbf{G} = C^0_B(\Omega)$ and $\|\cdot\|_{\mathbf{G}} = \|\cdot\|_\infty$, where with some abuse of notation we write $\|\cdot\|_\infty$ in place of $\|\cdot\|_{0,\infty}$. The space $C^0_B(\Omega)$ is a Banach lattice under the standard pointwise ordering given by
$$a \le b \text{ if and only if } a(s,y) \le b(s,y) \text{ for all } (s,y) \in \Omega. \tag{S.151}$$
Moreover, the constant function $c \in C^0_B(\Omega)$ satisfying $c(s,y) = 1$ for all $(s,y) \in \Omega$ is an order unit under the partial ordering in (S.151). Its induced norm on $C^0_B(\Omega)$ equals
$$\inf\{\lambda > 0 : |a| \le \lambda c\} = \sup_{(s,y)\in\Omega} |a(s,y)|,$$
which coincides with the norm $\|\cdot\|_\infty$ on $C^0_B(\Omega)$, and we therefore set $1_{\mathbf{G}} = c$. To encode
the Slutsky restriction in (S.149) we then let the map $\Upsilon_G: \mathbf{B} \to \mathbf{G}$ equal
$$\Upsilon_G(\theta)(s,y) = \frac{\partial}{\partial s} g(s,y) + g(s,y) \frac{\partial}{\partial y} g(s,y) \tag{S.152}$$
for any $\theta = (g,\gamma) \in \mathbf{B}$. Finally, to test whether the level of demand at a prescribed price $s_0$ and income $y_0$ equals a hypothesized value $c_0$, we set $\mathbf{F} = \mathbf{R}$, $\|\cdot\|_{\mathbf{F}} = |\cdot|$, and
$$\Upsilon_F(\theta) = g(s_0, y_0) - c_0 \tag{S.153}$$
for any $\theta = (g,\gamma) \in \mathbf{B}$. By setting $R = \{\theta \in \mathbf{B} : \Upsilon_G(\theta) \le 0 \text{ and } \Upsilon_F(\theta) = 0\}$ and conducting test inversion (over different values of $c_0$) of the null hypothesis
$$H_0: \theta_0(P) \in R \qquad H_1: \theta_0(P) \notin R$$
we may then obtain a confidence region for the level of demand at price $s_0$ and income $y_0$ that imposes the Slutsky restriction in (S.149).
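In practice, membership of a candidate $\theta = (g,\gamma)$ in the restricted set $R$ can be checked by evaluating $\Upsilon_G(\theta)$ on a grid over $\Omega$. The sketch below does so for a made-up demand function standing in for a sieve element $p_n^{j_n}(s,y)'\beta$; the functional form, grid, and support are illustrative assumptions only.

```python
# Hypothetical demand function g(s, y) = 5 - 0.8*s + 0.1*y (downward sloping in
# price), used as a stand-in for a sieve element; names are illustrative only.
def g(s, y):
    return 5.0 - 0.8 * s + 0.1 * y

def dg_ds(s, y):
    return -0.8

def dg_dy(s, y):
    return 0.1

def slutsky(s, y):
    # Upsilon_G(theta)(s, y) = dg/ds + g * dg/dy, cf. (S.149) and (S.152).
    return dg_ds(s, y) + g(s, y) * dg_dy(s, y)

# Check the restriction on a grid over a hypothetical support Omega = [1,2] x [10,20].
grid = [(1 + 0.1 * i, 10 + 1.0 * j) for i in range(11) for j in range(11)]
satisfies = all(slutsky(s, y) <= 0 for s, y in grid)
```

Here `satisfies` is `True`, so this candidate would be retained when minimizing the criterion over $\Theta_n \cap R$; a grid check of this kind is only an approximation to the functional constraint $\Upsilon_G(\theta) \le 0$ on all of $\Omega$.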
We set the parameter space to be a ball in $\mathbf{B}$ under $\|\cdot\|_{\mathbf{B}}$ by setting $\Theta$ to be
$$\Theta \equiv \{\theta \in \mathbf{B} : \|\theta\|_{\mathbf{B}} \le C_0\} \tag{S.154}$$
for some constant $C_0 < \infty$.
Similarly, for a sequence $\{q_{k,n}\}_{k=1}^{k_n}$ of transformations of the conditioning variable $Z$, we let $q_n^{k_n}(z) \equiv (q_{1,n}(z), \ldots, q_{k_n,n}(z))'$. We base our test statistic on the quadratic forms
$$Q_n(\theta) \equiv \Big\{\Big(\frac{1}{n}\sum_{i=1}^n (Q_i - g(S_i,Y_i) - W_i'\gamma) q_n^{k_n}(Z_i)\Big)' \hat\Sigma_n \Big(\frac{1}{n}\sum_{i=1}^n (Q_i - g(S_i,Y_i) - W_i'\gamma) q_n^{k_n}(Z_i)\Big)\Big\}^{\frac{1}{2}}$$
for some weighting matrix $\hat\Sigma_n$ and every $(g,\gamma) = \theta \in \Theta$. The constrained (i.e. $I_n(R)$) and unconstrained (i.e. $I_n(\Theta)$) statistics are then simply given by the minimum of $\sqrt{n}Q_n$ over $\theta \in \Theta_n \cap R$ and $\Theta_n$ respectively.
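The quadratic form above can be sketched directly. The snippet below evaluates a toy version of $Q_n(\theta)$ with a two-dimensional instrument vector and identity weighting $\hat\Sigma_n = I$; the linear-in-parameters "sieve", the instruments, and the simulated data are all illustrative assumptions made to keep the sketch short.

```python
import math
import random

random.seed(1)

# Illustrative low-dimensional stand-in for the sieve: g(s, y) = b0 + b1*s + b2*y,
# instruments q_n^{kn}(z) = (1, z)', and identity weighting; all made up for the sketch.
n = 200
data = [(random.gauss(0, 1), random.random(), random.random(),
         random.random(), random.gauss(0, 1)) for _ in range(n)]  # (Q, S, Y, W, Z)

def Qn(theta):
    b0, b1, b2, gamma = theta
    m = [0.0, 0.0]                        # kn = 2 sample moments
    for (q, s, y, w, z) in data:
        resid = q - (b0 + b1 * s + b2 * y) - w * gamma
        m[0] += resid / n                 # instrument q_{1,n}(z) = 1
        m[1] += resid * z / n             # instrument q_{2,n}(z) = z
    return math.sqrt(m[0] ** 2 + m[1] ** 2)   # {m' Sigma m}^{1/2} with Sigma = I

# I_n(Theta) is sqrt(n) times the minimum of Qn over the sieve; here we simply
# evaluate the criterion at one candidate theta rather than minimize.
value = math.sqrt(n) * Qn((0.0, 0.0, 0.0, 0.0))
```

Minimizing this criterion over $\Theta_n \cap R$ versus $\Theta_n$ would then produce the constrained and unconstrained statistics.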
The next assumptions suffice for obtaining a strong approximation. In their statement, the notation $\mathrm{sing}\{A\}$ denotes the smallest singular value of a matrix $A$.
Assumption S.6.7. (i) $\{X_i, Z_i\}_{i=1}^n$ is i.i.d. with $(X,Z)$ distributed according to $P \in \mathbf{P}$; (ii) For $\Theta$ as in (S.154) and each $P \in \mathbf{P}_0$ there exists a unique $\theta_0(P) \in \Theta$ satisfying (S.148); (iii) The support of $(Q,W)$ is bounded uniformly in $P \in \mathbf{P}$.
Assumption S.6.8. (i) $\sup_{(s,y)} \|p_n^{j_n}(s,y)\|_2 \lesssim \sqrt{j_n}$; (ii) $\sup_{(s,y)} \|\partial_a p_n^{j_n}(s,y)\|_2 \lesssim j_n^{3/2}$ for $a \in \{s,y\}$; (iii) The eigenvalues of $E_P[p_n^{j_n}(S,Y) p_n^{j_n}(S,Y)']$ are bounded away from zero and infinity uniformly in $P \in \mathbf{P}$ and $n$; (iv) For each $P \in \mathbf{P}_0$ there is a $\Pi_n\theta_0(P) = (g_n, \gamma) \in \Theta_n \cap R$ with $\sup_{P\in\mathbf{P}_0} \|E_P[(g_P(S,Y) - g_n(S,Y)) q_n^{k_n}(Z)]\|_2 = O((n\log(n))^{-1/2})$.
Assumption S.6.9. (i) $\max_{1\le k\le k_n} \|q_{k,n}\|_\infty \lesssim \sqrt{k_n}$; (ii) The largest eigenvalues of $E_P[q_n^{k_n}(Z) q_n^{k_n}(Z)']$ are bounded uniformly in $P \in \mathbf{P}$ and $n$; (iii) For every $n$, $s_n \equiv \inf_{P\in\mathbf{P}} \mathrm{sing}\{E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)', W')]\}$ is positive and $s_n = O(1)$; (iv) $j_n^3 k_n^3 \log^3(n) = o(n)$ and $k_n^2 j_n \log^{\frac{3}{2}}(1+k_n)/(s_n\sqrt{n})(1 \vee \sqrt{\log(s_n\sqrt{n}/k_n)}) = o((\log(n))^{-1/2})$.
Assumption S.6.10. (i) $\|\hat\Sigma_n - \Sigma_n(P)\|_{o,2} = O_P((k_n\sqrt{j_n}\log(n))^{-1})$ uniformly in $P \in \mathbf{P}$; (ii) $\Sigma_n(P)$ is invertible and $\|\Sigma_n(P)\|_{o,2}$ and $\|\Sigma_n(P)^{-1}\|_{o,2}$ are bounded in $P \in \mathbf{P}$ and $n$.
Assumption S.6.7(iii) requires $(Q,W)$ to be bounded, which enables us to apply the recent coupling results by Zhai (2018). Alternatively, Assumption S.6.7(iii) can be relaxed under appropriate tail conditions; see, e.g., the proof of Lemma S.6.4 for related arguments. Assumptions S.6.8(i)-(iii) are standard requirements on $\Theta_n$ that are satisfied, e.g., by tensor product wavelets or B-splines under appropriate conditions on the distribution of $(S,Y)$ (Newey, 1997; Chen, 2007; Belloni et al., 2015; Chen and Christensen, 2018). Assumption S.6.8(iv) pertains to the approximation requirements on the sieve; see Remarks S.6.1 and S.6.2 below. Assumption S.6.8(iv) can be relaxed for studying the re-centered statistic, but we impose it here for simplicity to analyze both $I_n(R)$ and $I_n(R) - I_n(\Theta)$. In turn, Assumption S.6.9(i)(ii) imposes standard requirements on the transformations $\{q_{k,n}\}_{k=1}^{k_n}$. Assumption S.6.9(iii)(iv) contains the required rate conditions, which are governed by $s_n$, a parameter that is proportional to $\nu_n^{-1}$ (as in Assumption 3.6) and is closely linked to the degree of ill-posedness; see Remark S.6.2 below. Finally, Assumption S.6.10 states the conditions on the weighting matrix $\hat\Sigma_n$.
In this application, we may set $\|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2$ for any $(g,\gamma) \in \Theta$. Since in addition any $\theta = (g,\gamma) \in \Theta_n \cap R$ has the structure $g = p_n^{j_n\prime}\beta$, we have
$$V_n(\theta,\ell) = \Big\{(p_n^{j_n\prime}\beta_h, \gamma_h) : \Big\|g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big\|_{1,\infty} \le C_0 \text{ and } \Big\|\gamma + \frac{\gamma_h}{\sqrt{n}}\Big\|_2 \le C_0 \tag{S.155}$$
$$p_n^{j_n}(s_0,y_0)'\beta_h = 0 \tag{S.156}$$
$$\frac{\partial}{\partial s}\Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big) + \Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big)\frac{\partial}{\partial y}\Big(g + \frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big) \le 0 \tag{S.157}$$
$$\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta_h\|_{P,2} + \|\gamma_h\|_2 \le \ell\sqrt{n}\Big\}, \tag{S.158}$$
where constraint (S.155) corresponds to (θ + h/√n) ∈ Θn, constraints (S.156) and
Assumptions S.6.11(i)(ii) suffice for verifying that Assumption 3.15(ii) is satisfied. These requirements may be dropped at the expense of modifying the estimators of the local parameter space to reflect the possible impact of $\Pi_n\theta_0(P)$ being "near" the boundary of $\Theta_n$. In turn, Assumption S.6.11(iii) imposes the rate conditions on $r_n$. Finally, Assumption S.6.11(iv) controls the "size" of the set of coefficients $\beta$ corresponding to elements $p_n^{j_n\prime}\beta \in \Theta_n$ and suffices for verifying the bootstrap coupling requirement of Assumption 3.14. For instance, for tensor product B-splines, $E_n \lesssim j_n^{1/4}\log(1+j_n)$ (see Lemma S.6.14), which implies a sufficient condition for Assumption S.6.11(iv) is that $k_n^4 j_n^4 \log^4(k_n) = o(n/\log^2(n))$. The rate requirements for a bootstrap coupling can be weakened, for instance, if the test statistic is based on the $\|\cdot\|_\infty$ norm of the moments (see Lemma S.6.10) or under additional smoothness assumptions (see Theorem S.9.1(ii)).
Our next result characterizes the properties of the proposed bootstrap statistics.
Theorem S.6.4. Let Assumptions S.6.7, S.6.8, S.6.9, S.6.10, S.6.11 hold, and $a_n = (\log(n))^{-1/2}$. Then: for any $\ell_n, \ell_n^u \downarrow 0$ satisfying $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n \wedge \ell_n^u)$, $\ell_n = o(r_n)$, and $k_n\sqrt{j_n}\log(1+k_n)(\ell_n \vee \ell_n^u)\sqrt{\log(\sqrt{j_n}/(\ell_n \vee \ell_n^u))} = o(a_n)$ it follows that uniformly in $P \in \mathbf{P}_0$ we have
$$U_n(R|{+\infty}) \ge U^\star_{n,P}(R|\ell_n) + o_P(a_n)$$
$$U_n(R|{+\infty}) - U_n(\Theta|{+\infty}) \ge U^\star_{n,P}(R|\ell_n) - U^\star_{n,P}(\Theta|\ell_n^u) + o_P(a_n).$$
We note that the existence of sequences $\ell_n, \ell_n^u$ satisfying the requirements of the theorem is implied by Assumptions S.6.9(iv) and S.6.11(iii). Importantly, any sequences $\ell_n, \ell_n^u$ satisfying the requirements of Theorem S.6.4 also satisfy the requirements of Theorem S.6.3. As a result, together Theorems S.6.3 and S.6.4 justify employing
$\log(n)$ for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153). We thus only verify Assumptions 3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153).
Next note that Lemma S.6.11 implies Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$ and $K_f = 0$, while Lemma S.6.10 and Assumption S.6.11(iv) imply Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-\frac{1}{2}}$. Further note that since $\sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2 \le 2(\|g\|_{1,\infty} \vee \|\gamma\|_2)$ for any $(g,\gamma) \in C^1_B(\Omega) \times \mathbf{R}^{d_w}$, it follows that Assumption 3.15(i) holds with $K_b = 2$. To verify Assumptions 3.15(ii) and 3.16, note that by the definitions of $\|\cdot\|_{\mathbf{B}}$ and $\|\cdot\|_{\mathbf{E}}$ in this application and the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ being bounded away from zero uniformly in $P \in \mathbf{P}$ by Assumption S.6.8(iii) we obtain
$$S_n(\mathbf{B},\mathbf{E}) = \sup_{(\beta,\gamma)} \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty} \vee \|\gamma\|_2}{\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta\|_{P,2} + \|\gamma\|_2} \le 1 \vee \sup_\beta \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta\|_{P,2}} \lesssim 1 \vee \sup_\beta \frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\|\beta\|_2} \lesssim j_n^{3/2} \tag{S.164}$$
where the final inequality follows from Assumptions S.6.8(i)(ii). Thus, since setting $\hat\theta_n$ and $\hat\theta_n^u$ to be the minimizers of $Q_n$ corresponds to setting $\tau_n = 0$, Assumption 3.16(i) holds provided $\mathcal{R}_n S_n(\mathbf{B},\mathbf{E}) = o(1)$, which is satisfied by Assumption S.6.9(iv) and $j_n \le k_n$ due to Assumption S.6.9(iii). Furthermore, also note that $\mathcal{R}_n S_n(\mathbf{B},\mathbf{E}) = o(1)$ together with Assumption S.6.11(i)(ii) and setting $\Pi_n^u\theta_0(P) = \Pi_n^r\theta_0(P) = \Pi_n\theta_0(P)$ for all $P \in \mathbf{P}_0$ imply Assumption 3.15(ii) is satisfied as well. Moreover, since $B_n \asymp \sqrt{k_n}$ and $K_f = K_m = 0$, Lemma S.6.8 implies Assumption 3.16(ii) holds for any $\ell_n$ satisfying $k_n\sqrt{j_n}\log(1+k_n)\ell_n(1 + \sqrt{\log(\sqrt{j_n}/\ell_n)}) = o(a_n)$. Similarly, we obtain that Assumption 3.16(iii) is satisfied provided $\ell_n = o(r_n)$ (imposed in the Theorem) and $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(r_n)$ (implied by Assumption S.6.11(iii)). We have thus verified the conditions of Theorem 3.3 and Corollary 3.3, which together establish the claim of the Theorem.
Lemma S.6.7. If Assumptions S.6.7(ii)(iii), S.6.8, S.6.9(i)(iii)(iv), and S.6.10 hold, then Assumptions 3.5 and 3.6 are satisfied with both $R = \Theta$ and $R$ corresponding to
where the second inequality holds due to $\|\Sigma_n(P)^{-1}\|_{o,2}$ being bounded uniformly in $n$ and $P \in \mathbf{P}$ by Assumption S.6.10(ii), while the final inequality follows from (S.165) and the triangle inequality. Thus, we conclude from (S.166) that Assumption 3.6(i) holds with $\nu_n^{-1} \asymp s_n$ and $V_n(P) = \Theta_n \cap R$. Finally, note Assumption 3.6(ii) holds by result (S.165) and Assumption S.6.10(i), and hence the lemma follows.
Lemma S.6.8. Define the class $\mathcal{F}_n \equiv \{f : f(v) = (q - g(s,y) - w'\gamma) \text{ for some } (g,\gamma) \in \Theta_n\}$ and suppose that Assumptions S.6.7(iii) and S.6.8(i)(iii) hold. Then, it follows that $\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim (1 \vee (\sqrt{j_n}K/\varepsilon))^{j_n + d_w}$ for some $K < \infty$, and in addition $\sup_{P\in\mathbf{P}} J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \varepsilon\sqrt{j_n}(1 + \sqrt{\log(1 \vee (\sqrt{j_n}/\varepsilon))})$.
Proof: Define the classes $\mathcal{F}_{1n} \equiv \{f : f(v) = q - w'\gamma \text{ with } \|\gamma\|_2 \le C_0\}$ and $\mathcal{F}_{2n} \equiv \{p_n^{j_n\prime}\beta : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, and then note that by definition of $\mathcal{F}_n$ we have
$$\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \le \sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{1n}, \|\cdot\|_{P,2}\Big) \times \sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{2n}, \|\cdot\|_{P,2}\Big). \tag{S.167}$$
Next observe that since the support of $W$ is bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.7(iii), the Cauchy-Schwarz inequality, the covering numbers of $\{\gamma \in \mathbf{R}^{d_w} : \|\gamma\|_2 \le C_0\}$ being bounded (up to a multiplicative constant) by $1 \vee \varepsilon^{-d_w}$, and Theorem 2.7.11 in van der Vaart and Wellner (1996) imply that
$$\sup_{P\in\mathbf{P}} N_{[\,]}\Big(\frac{\varepsilon}{2}, \mathcal{F}_{1n}, \|\cdot\|_{P,2}\Big) \lesssim 1 \vee \varepsilon^{-d_w}. \tag{S.168}$$
Similarly, for any elements $p_n^{j_n\prime}\beta_1, p_n^{j_n\prime}\beta_2 \in \mathcal{F}_{2n}$, it follows from the Cauchy-Schwarz inequality that
hold, then Assumption 3.7 holds with $R = \Theta$ for any $a_n$ with $k_n^3 j_n^3 \log^2(n)/n = o(a_n)$.
Proof: We establish the claim of the lemma by relying on Lemma S.6.28. To this end, let $\bar j_n = (1 + d_w) + j_n$, set $r_n^{\bar j_n}(x) \equiv (q, w', p_n^{j_n}(x)')'$, and observe any $f \in \mathcal{F}_n$ can be written as $f = r_n^{\bar j_n\prime}\delta$ for some $\delta \in \mathbf{R}^{\bar j_n}$. Moreover, by Assumption S.6.8(iii) and the definition of $\Theta_n$, it follows that there exists an $M < \infty$ such that $\mathcal{F}_n \subseteq \{r_n^{\bar j_n\prime}\delta : \|\delta\|_2 \le M\}$, while Assumptions S.6.7(iii), S.6.8(i), and S.6.9(i) imply $\max_{1\le j\le \bar j_n} \|r_j\|_\infty \lesssim \sqrt{j_n}$ and $\max_{1\le k\le k_n} \|q_k\|_\infty \lesssim \sqrt{k_n}$. The claim of the lemma therefore follows from applying Lemma S.6.28 with $b_{1n} \asymp \sqrt{k_n}$, $b_{2n} \asymp \sqrt{j_n}$, and $C_n = M$.
Lemma S.6.10. Suppose Assumptions S.6.7(i)(iii), S.6.9(i)(ii), and S.6.8(i)(iii) hold and let $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$ and $E_n \equiv \int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon$. If $j_n^2 k_n^2 \log(1 + k_n j_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(\sqrt{\log(k_n)} + E_n) j_n^{3/4} k_n^{1/2} \log^{1/4}(1 + j_n k_n)/n^{1/4} = o(a_n)$.
Proof: Recall that in this application $X \equiv (Q, S, Y, W')'$ and, when $R = \Theta$, we have $\mathcal{F}_n \equiv \{f : f(x) = (q - g(s,y) - w'\gamma) \text{ for some } (g,\gamma) \in \Theta_n\}$. Therefore, letting $\bar{\mathcal{F}}_n \equiv \{f q_{k,n} : f \in \mathcal{F}_n \text{ and } 1 \le k \le k_n\}$ we obtain by direct calculation that
$$\sup_{f\in\mathcal{F}_n} \|\mathbb{W}_n f q_n^{k_n} - \mathbb{W}^\star_{n,P} f q_n^{k_n}\|_p \le k_n^{1/p} \sup_{f\in\bar{\mathcal{F}}_n} |\mathbb{W}_n f - \mathbb{W}^\star_{n,P} f|. \tag{S.170}$$
We establish the claim of the lemma by relying on (S.170) and applying Theorem S.9.1 to the class $\bar{\mathcal{F}}_n$. To this end, define $d_n = k_n(j_n + d_w + 1)$ and let $f_n^{d_n}(v) \equiv q_n^{k_n}(z) \otimes (p_n^{j_n}(s,y)', q, w')'$. Set $D_1 \equiv (Q, W', p_n^{j_n}(S,Y)')'$ and $D_2 = q_n^{k_n}(Z)$, and for $\mathrm{eig}\{D_1 D_1'\}$ the largest eigenvalue of $D_1 D_1'$, then note that we have
$$\sup_{P\in\mathbf{P}} \|\mathrm{eig}\{D_1 D_1'\}\|_{P,\infty} \le \sup_{P\in\mathbf{P}} \|\mathrm{trace}\{D_1 D_1'\}\|_{P,\infty} \lesssim j_n, \tag{S.171}$$
where the final inequality follows from Assumptions S.6.7(iii) and S.6.8(i). Hence, since $\mathrm{eig}\{E_P[q_n^{k_n}(Z) q_n^{k_n}(Z)']\}$ is bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.9(ii), result (S.171) and Lemma S.6.13 imply Assumption S.9.1(i) is satisfied with $C_n \asymp j_n$. Similarly, note that Assumptions S.6.7(iii), S.6.9(i), and S.6.8(i) imply Assumption S.9.1(ii) holds with $K_n = \sqrt{j_n k_n}$. Moreover, Assumption S.9.2(i) is immediate with $\mathbb{G}_{n,P}$ equal to the zero function and $J_{1n} = 0$. Finally, note that any $f \in \bar{\mathcal{F}}_n$ has the structure
$$f(v) = q_{k,n}(z)(q - p_n^{j_n}(s,y)'\beta - w'\gamma) \text{ for some } (p_n^{j_n\prime}\beta, \gamma) \in \Theta_n. \tag{S.172}$$
Therefore, for $\mathcal{B}_n$ as defined in Assumption S.9.2(ii), $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, and $\mathcal{G}_n \equiv \{\gamma \in \mathbf{R}^{d_w} : \|\gamma\|_2 \le C_0\}$, we can conclude that
$$N(\varepsilon, \mathcal{B}_n, \|\cdot\|_2) \le k_n \times N\Big(\frac{\varepsilon}{2}, \mathcal{G}_n, \|\cdot\|_2\Big) \times N\Big(\frac{\varepsilon}{2}, \mathcal{C}_n, \|\cdot\|_2\Big) \lesssim k_n \times \Big(\Big(\frac{C_0}{\varepsilon}\Big)^{d_w} \vee 1\Big) \times N\Big(\frac{\varepsilon}{2}, \mathcal{C}_n, \|\cdot\|_2\Big), \tag{S.173}$$
where in the second inequality we employed that $N(\varepsilon, \mathcal{G}_n, \|\cdot\|_2) \lesssim (C_0/\varepsilon)^{d_w} \vee 1$. Furthermore, note that Assumption S.6.8(iii) implies that $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{P,2}$ uniformly in $n$ and $P \in \mathbf{P}$, and hence since $\|p_n^{j_n\prime}\beta\|_{P,2} \le \|p_n^{j_n\prime}\beta\|_{1,\infty}$, the definition of $\Theta_n$ and (S.172) imply that the radius of $\mathcal{B}_n$ under $\|\cdot\|_2$ is uniformly bounded in $n$. Thus, the bound
in (S.173) yields that for some $M < \infty$ we must have
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{B}_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_0^M \sqrt{\log(k_n)}d\varepsilon + \int_0^{C_0} \sqrt{\log(C_0/\varepsilon)}d\varepsilon + \int_0^M \sqrt{\log(N(\varepsilon/2, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{\log(k_n)} + \int_0^\infty \sqrt{\log(N(u, \mathcal{C}_n, \|\cdot\|_2))}du,$$
where the final inequality follows from $N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2)$ being (weakly) larger than one for all $\varepsilon$ and the change of variables $u = \varepsilon/2$. Hence, Assumption S.9.2(ii) holds with $J_{2n} = \sqrt{\log(k_n)} + E_n$, and as a result Theorem S.9.1 implies uniformly in $P \in \mathbf{P}$ that
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\bar{\mathcal{F}}_n} = O_P\Big((\sqrt{\log(k_n)} + E_n)\Big(\frac{j_n^3 k_n^2 \log(1 + j_n k_n)}{n}\Big)^{1/4}\Big). \tag{S.174}$$
The claim of the lemma therefore follows from (S.170) and (S.174).
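The couplings above relate the empirical process $\mathbb{W}_n$ to a bootstrap analogue $\mathbb{W}^\star_{n,P}$. A common device for constructing such an analogue, shown here purely as an illustrative assumption (the paper's $\mathbb{W}^\star_{n,P}$ may be defined differently), is the Gaussian-multiplier bootstrap:

```python
import math
import random

random.seed(2)

n = 500
x = [random.random() for _ in range(n)]            # illustrative i.i.d. sample

def Wn(f, Ef):
    # Empirical process W_n f = n^{-1/2} sum_i {f(X_i) - E_P[f(X)]}.
    return sum(f(xi) - Ef for xi in x) / math.sqrt(n)

def Wstar(f, omegas):
    # Multiplier-bootstrap analogue: i.i.d. N(0,1) weights, centered at the
    # sample mean so the draw is mean zero conditional on the data.
    mean = sum(f(xi) for xi in x) / n
    return sum(w * (f(xi) - mean) for w, xi in zip(omegas, x)) / math.sqrt(n)

f = lambda t: t * t                                 # E_P[f(X)] = 1/3 for X ~ U[0, 1]
emp = Wn(f, 1.0 / 3.0)
boot = Wstar(f, [random.gauss(0, 1) for _ in range(n)])
```

Conditional on the data, repeated draws of `boot` behave approximately like mean-zero Gaussians whose variance mimics $\mathrm{Var}(f(X))$, which is what makes such constructions usable for simulating critical values.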
Lemma S.6.11. If $\mathbf{B} = C^1_B(\Omega) \times \mathbf{R}^{d_w}$ and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.152), (S.153), and (S.154), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$, $K_f = 0$, and for any $\theta = (g,\gamma)$ and $h = (g_h, \gamma_h)$, $\nabla\Upsilon_G(\theta)[h]$ equals
$$\nabla\Upsilon_G(\theta)[h](s,y) = \frac{\partial}{\partial s}g_h(s,y) + g(s,y)\frac{\partial}{\partial y}g_h(s,y) + g_h(s,y)\frac{\partial}{\partial y}g(s,y).$$
Proof: Recall that in this application $\mathbf{G} = C^0_B(\Omega)$ and $\|\theta\|_{\mathbf{B}} = \max\{\|g\|_{1,\infty}, \|\gamma\|_2\}$. Hence, for any $\theta_1 = (g_1,\gamma_1) \in \mathbf{B}$ and $\theta_2 = (g_2,\gamma_2) \in \mathbf{B}$ we obtain that
$$\|\Upsilon_G(\theta_1) - \Upsilon_G(\theta_2) - \nabla\Upsilon_G(\theta_1)[\theta_1 - \theta_2]\|_{\mathbf{G}} \le \sup_{(s,y)\in\Omega} |g_1(s,y) - g_2(s,y)| \times \sup_{(s,y)\in\Omega} \Big|\frac{\partial}{\partial y}(g_1(s,y) - g_2(s,y))\Big| \le \|g_1 - g_2\|^2_{1,\infty}, \tag{S.175}$$
which verifies Assumption 3.11(i) holds with $K_g = 2$. Similarly, we additionally conclude
$$\|\nabla\Upsilon_G(\theta_1) - \nabla\Upsilon_G(\theta_2)\|_o = \sup_{g_h: \|g_h\|_{1,\infty} \le 1} \sup_{(s,y)\in\Omega} \Big|(g_1(s,y) - g_2(s,y))\frac{\partial}{\partial y}g_h(s,y) + g_h(s,y)\frac{\partial}{\partial y}(g_1(s,y) - g_2(s,y))\Big| \le 2\|g_1 - g_2\|_{1,\infty}, \tag{S.176}$$
which verifies Assumption 3.11(ii) holds with $K_g = 2$ as well. Moreover, note that since any $\theta = (g,\gamma) \in \Theta$ satisfies $\|g\|_{1,\infty} \le C_0$, it follows that $\|g\|_{1,\infty} \le C_0 + \varepsilon$ for any $g \in \Theta^\varepsilon$. Thus, by identical arguments to those in (S.176) we obtain
$$\|\nabla\Upsilon_G(\theta)\|_o \le 2\|g\|_{1,\infty} \le 2(C_0 + \varepsilon), \tag{S.177}$$
which thus verifies Assumption 3.11(iii) holds with $M_g = 2(C_0 + \varepsilon)$.
Next note $\Upsilon_F: \mathbf{B} \to \mathbf{F}$ is affine and continuous, and hence $\nabla\Upsilon_F(\theta)[h] = g_h(s_0, y_0)$ for all $\theta, h \in \mathbf{B}$. Therefore, Assumptions 3.12(i)(ii) hold with $K_f = 0$, while
$$\sup_{g_h: \|g_h\|_{1,\infty} \le 1} |g_h(s_0, y_0)| \le 1 \tag{S.178}$$
implies Assumption 3.12(iii) is satisfied with $M_f = 1$. Since $\Upsilon_F$ being affine and $K_f = 0$ further imply that Assumptions 3.12(iv) and 3.13 hold, the lemma follows.
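The derivative formula in Lemma S.6.11 can be verified numerically by a finite difference of $\Upsilon_G$ along a direction $h$. The functions $g$ and $h$ below are made-up smooth examples chosen so all derivatives are available in closed form; they are illustrative assumptions, not objects from the paper.

```python
# Finite-difference check of the derivative formula in Lemma S.6.11.
def upsilon_g(gf, gf_s, gf_y, s, y):
    # Upsilon_G(theta)(s, y) = dg/ds + g * dg/dy, cf. (S.152).
    return gf_s(s, y) + gf(s, y) * gf_y(s, y)

g   = lambda s, y: s * y          # hypothetical g with known derivatives
g_s = lambda s, y: y
g_y = lambda s, y: s
h   = lambda s, y: s + y * y      # hypothetical direction h
h_s = lambda s, y: 1.0
h_y = lambda s, y: 2.0 * y

def grad_upsilon_g(s, y):
    # Lemma S.6.11: grad Upsilon_G(theta)[h] = dh/ds + g * dh/dy + h * dg/dy.
    return h_s(s, y) + g(s, y) * h_y(s, y) + h(s, y) * g_y(s, y)

s0, y0, t = 0.7, 1.3, 1e-6
gp   = lambda s, y: g(s, y) + t * h(s, y)      # perturbed function g + t*h
gp_s = lambda s, y: g_s(s, y) + t * h_s(s, y)
gp_y = lambda s, y: g_y(s, y) + t * h_y(s, y)

fd = (upsilon_g(gp, gp_s, gp_y, s0, y0) - upsilon_g(g, g_s, g_y, s0, y0)) / t
assert abs(fd - grad_upsilon_g(s0, y0)) < 1e-4
```

The residual is of order $t$ because $\Upsilon_G$ is quadratic in $g$, matching the second-order remainder bound in (S.175).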
Lemma S.6.12. Let Assumptions S.6.7, S.6.8, S.6.9, S.6.10 hold, and $a_n = (\log(n))^{-1/2}$. For any $\ell_n$ with $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n)$, it follows uniformly in $P \in \mathbf{P}_0$ that
$$U_n(R|{+\infty}) = \inf_{\frac{h}{\sqrt{n}} \in V_n(\hat\theta_n, \ell_n)} \|\mathbb{W}_n \rho(\cdot, \hat\theta_n) + \hat D_n[h]\|_{\hat\Sigma_n,2} + o_P(a_n)$$
$$U_n(\Theta|{+\infty}) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{B}} \le \ell_n} \|\mathbb{W}_n \rho(\cdot, \hat\theta_n^u) + \hat D_n[h]\|_{\hat\Sigma_n,2} + o_P(a_n).$$
Proof: We establish the lemma by applying Lemma 3.1. To this end, recall that in the proof of Theorem S.6.3, Assumptions 3.3(i)(iii) and 3.4 were verified to hold with $B_n \asymp \sqrt{k_n}$ and $J_n \asymp \sqrt{j_n \log(1+j_n)}$. Since the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ are bounded uniformly in $P \in \mathbf{P}$ by Assumption S.6.8(iii) and $\|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|g\|_{P,2} + \|\gamma\|_2$ for any $\theta = (g,\gamma)$, it also follows that for any $h = (p_n^{j_n\prime}\beta_h, \gamma_h) \in \mathbf{B}_n$ we have
$$\|h\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}} \|p_n^{j_n\prime}\beta_h\|_{P,2} + \|\gamma_h\|_2 \lesssim \|\beta_h\|_2 + \|\gamma_h\|_2 \lesssim \frac{1}{s_n}\|E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)'\beta_h + W'\gamma_h)]\|_2 = \frac{1}{s_n}\|D_{n,P}[h]\|_2,$$
where the second inequality holds by Assumption S.6.9(iii) and the final equality follows from the definition of $D_{n,P}[h]$. Hence, since $\nu_n \asymp 1/s_n$ by Lemma S.6.7 and $p = 2$, we conclude the Lemma 3.1 requirement that $\|h\|_{\mathbf{E}} \le \nu_n\|D_{n,P}(\theta)[h]\|_p$ for all $\theta \in A_n(P)$ holds with $A_n(P) = \Theta_n \cap R$ (for either $R = \Theta$ or $R$ corresponding to (S.152) and (S.153)). Next, define the $k_n \times (j_n + d_w)$ matrix $M_{i,n}$ to be given by
$$M_{i,n} \equiv \frac{1}{n}\{q_n^{k_n}(Z_i)(p_n^{j_n}(S_i,Y_i)' \ W_i') - E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)' \ W')]\}, \tag{S.179}$$
which satisfies $E_P[M_{i,n}] = 0$. Since $\|(p_n^{j_n\prime}\beta, \gamma)\|_{\mathbf{E}} \asymp \|\beta\|_2 + \|\gamma\|_2$ by Assumption S.6.8(iii), we then conclude from (S.179) that for some $C_1 < \infty$ we must have
$$\sup_{P\in\mathbf{P}} P\Big(\sup_{h\in\mathbf{B}_n} \frac{\|\hat D_n[h] - D_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} > s_n\Big) \le \sup_{P\in\mathbf{P}} P\Big(\Big\|\sum_{i=1}^n M_{i,n}\Big\|_o > C_1 s_n\Big) \le (j_n + k_n)\exp\Big\{-\frac{n s_n^2 C_2}{(k_n^2 \vee j_n) + s_n k_n\sqrt{j_n}}\Big\} = o(1), \tag{S.180}$$
where the final inequality follows by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$ (by Assumptions S.6.7(iii) and S.6.8(i)) and $b_{2n} = k_n$ (by Assumption S.6.9(i)), while the final equality results from $\log(k_n)k_n^2/(s_n^2 n) = o(1)$ by Assumption S.6.9(iv) and $k_n \ge j_n$ by Assumption S.6.9(iii). Hence, $\nu_n \asymp 1/s_n$ and (S.180) imply condition (24) in Lemma 3.1 holds. Finally, we note that by Assumption S.6.11(iv), we may apply Lemma S.6.10 with $p = 2$ to conclude that Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ as corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-\frac{1}{2}}$. This concludes verifying the requirements of Lemma 3.1, and therefore the present Lemma follows for any $\ell_n$ satisfying $S_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$, which in this application is equivalent to $k_n j_n^2 \log(1+k_n)/(s_n\sqrt{n}) = o(\ell_n)$ due to $S_n(\mathbf{B},\mathbf{E}) \lesssim j_n^{3/2}$ and $\mathcal{R}_n \asymp k_n\sqrt{j_n}\log(1+k_n)/(s_n\sqrt{n})$.
Lemma S.6.13. Let $D_1 \in \mathbf{R}^{d_1}$, $D_2 \in \mathbf{R}^{d_2}$ be distributed according to $Q$, and for any matrix $A$ let $\mathrm{eig}\{A\}$ denote its largest eigenvalue. Then it follows that
Since $\sum_{i=1}^{d_1} \|a_i\|_2^2 \le 1$ for all $\{a_i\}_{i=1}^{d_1} \in \mathcal{A}$, we additionally have the inequality
$$\sup_{\{a_i\}_{i=1}^{d_1} \in \mathcal{A}} \sum_{i=1}^{d_1} E_Q[(a_i' D_2)^2] \le \sup_{\{a_i\}_{i=1}^{d_1} \in \mathcal{A}} \sum_{i=1}^{d_1} \mathrm{eig}\{E_Q[D_2 D_2']\}\|a_i\|_2^2 = \mathrm{eig}\{E_Q[D_2 D_2']\},$$
and therefore the claim of the Lemma follows.
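The displayed eigenvalue inequality is easy to check numerically. The sketch below fixes a made-up $2 \times 2$ second-moment matrix standing in for $E_Q[D_2 D_2']$ and one admissible collection $\{a_i\}$; all numbers are illustrative assumptions.

```python
import math

# Illustrative check of the inequality in Lemma S.6.13: for a fixed second-moment
# matrix M (standing in for E_Q[D2 D2']) and any {a_i} with sum ||a_i||^2 <= 1,
# sum_i a_i' M a_i is at most the largest eigenvalue of M.
M = [[2.0, 0.5], [0.5, 1.0]]

def quad(a):
    # a' M a for a 2-vector a.
    return sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))

tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
lam_max = tr / 2 + math.sqrt(tr * tr / 4 - det)   # largest eigenvalue, 2x2 closed form

a1, a2 = [0.6, 0.0], [0.0, 0.8]                   # ||a1||^2 + ||a2||^2 = 1
total = quad(a1) + quad(a2)
assert total <= lam_max
```

The bound holds for any such collection since each $a_i' M a_i \le \mathrm{eig}\{M\}\|a_i\|_2^2$, which is exactly the step used in the display above.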
Lemma S.6.14. Let $\lambda$ be the Lebesgue measure, $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ be B-splines on $[0,1]$ of order $r \ge 3$ with no interior knot multiplicity, mesh ratio bounded in $n$, and $\|\cdot\|_{\lambda,2}$ normalized to have norm one. If $\{p_{j,n}\}_{j=1}^{j_n}$ is the tensor product of $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ and $\mathcal{C}_n \equiv \{\beta \in \mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty} \le C_0\}$, then it follows that
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{j_{1n} \wedge j_{2n}}\log(j_n + 1).$$
Proof: We rely heavily on Chapter 5 in DeVore and Lorentz (1993), and note that $B_j$ corresponds to $N_j/\|N_j\|_{\lambda,2}$ in their notation. Throughout, for two sequences $a_n$ and $b_n$ we employ $a_n \asymp b_n$ to mean that there exist constants $0 < \underline{c} \le \bar{c} < \infty$ such that $\underline{c}a_n \le b_n \le \bar{c}a_n$ for all $n$. In what follows it will also prove convenient to index the elements of $\beta \in \mathbf{R}^{j_n}$ by $\beta_{b_1,b_2}$ with $1 \le b_1 \le j_{1n}$ and $1 \le b_2 \le j_{2n}$. Then note that the mesh ratios corresponding to $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ and $\{B^{(2)}_{b,n}\}_{b=1}^{j_{2n}}$ being bounded uniformly in $n$ and two applications of Theorem 5.4.2 in DeVore and Lorentz (1993) imply that
$$\|p_n^{j_n\prime}\beta\|_\infty = \sup_{u_1\in[0,1]} \sup_{u_2\in[0,1]} \Big|\sum_{b_2=1}^{j_{2n}} B^{(2)}_{b_2,n}(u_2) \sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sup_{u_1\in[0,1]} \max_{1\le b_2\le j_{2n}} \sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sqrt{j_{1n}j_{2n}}\|\beta\|_\infty \tag{S.181}$$
uniformly in $\beta \in \mathbf{R}^{j_n}$. By similar arguments we also obtain uniformly in $\beta \in \mathbf{R}^{j_n}$ that
$$\sup_{u_1\in(0,1)} \sup_{u_2\in[0,1]} \Big|\sum_{b_2=1}^{j_{2n}} B^{(2)}_{b_2,n}(u_2) \sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_1}B^{(1)}_{b_1,n}(u_1)\Big| \asymp \sup_{u_1\in(0,1)} \max_{1\le b_2\le j_{2n}} \sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_1}B^{(1)}_{b_1,n}(u_1)\Big| \asymp \max_{1\le b_2\le j_{2n}} \max_{2\le b_1\le j_{1n}} \sqrt{j_{2n}}j_{1n}^{3/2}|\beta_{b_1,b_2} - \beta_{b_1-1,b_2}|, \tag{S.182}$$
where the second result follows by employing equation (3.11) and Theorem 5.4.2 in Chapter 5 of DeVore and Lorentz (1993) and the mesh ratio of $\{B^{(1)}_{b,n}\}_{b=1}^{j_{1n}}$ being bounded.
Since by identical arguments we can also derive the symmetric (to (S.182)) relationship
$$\sup_{u_1\in[0,1]} \sup_{u_2\in(0,1)} \Big|\sum_{b_1=1}^{j_{1n}} B^{(1)}_{b_1,n}(u_1) \sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2} \frac{\partial}{\partial u_2}B^{(2)}_{b_2,n}(u_2)\Big| \asymp \max_{1\le b_1\le j_{1n}} \max_{2\le b_2\le j_{2n}} \sqrt{j_{1n}}j_{2n}^{3/2}|\beta_{b_1,b_2} - \beta_{b_1,b_2-1}|, \tag{S.183}$$
it follows from results (S.181), (S.182), and (S.183) that there is an $M_0 < \infty$ such that
$$\max_{1\le b_1\le j_{1n}} \max_{1\le b_2\le j_{2n}} |\beta_{b_1,b_2}| \le M_0/\sqrt{j_n}$$
$$\max_{1\le b_2\le j_{2n}} \max_{2\le b_1\le j_{1n}} |\beta_{b_1,b_2} - \beta_{b_1-1,b_2}| \le M_0/(j_{1n}\sqrt{j_n})$$
$$\max_{1\le b_1\le j_{1n}} \max_{2\le b_2\le j_{2n}} |\beta_{b_1,b_2} - \beta_{b_1,b_2-1}| \le M_0/(j_{2n}\sqrt{j_n}) \tag{S.184}$$
for all $\beta \in \mathcal{C}_n$. Hence, in order to establish the claim of the lemma, it suffices to bound the covering numbers for the set defined by (S.184).
We proceed by combining two bounds, one for "small" $\varepsilon$ and one for "large" $\varepsilon$. First, assume without loss of generality $j_{1n} \ge j_{2n}$, let $c_n \equiv \lceil\log(j_{1n}+1)\rceil$, and define the sets
$$\Big\{\beta \in \mathbf{R}^{j_n} : \frac{\varepsilon}{3\sqrt{j_n}}k_1 \le \beta_{b_1,b_2} \le \frac{\varepsilon}{3\sqrt{j_n}}(k_1+1) \text{ for all } b_1 = mc_n + 1 \text{ with } 0 \le m \le \lceil j_{1n}/c_n\rceil - 1$$
$$\frac{\varepsilon}{3c_n\sqrt{j_n}}k_2 \le \beta_{b_1,b_2} - \beta_{b_1-1,b_2} \le \frac{\varepsilon}{3c_n\sqrt{j_n}}(k_2+1) \text{ otherwise}\Big\} \tag{S.185}$$
where $k_1, k_2$ are non-zero integers, i.e. the sets (in $\mathbf{R}^{j_n}$) defined in (S.185) consist of "chains" along the $b_1$ dimension that reset every $c_n$ integers. To compute the diameter
of the sets in (S.185), note that since all "chains" have the same structure
$$\sup\{\|\beta - \tilde\beta\|_2^2 \text{ s.t. } \beta, \tilde\beta \text{ satisfying (S.185)}\} \le \sup\Big\{j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil \sum_{b_1=1}^{c_n}(\beta_{b_1,j_{2n}} - \tilde\beta_{b_1,j_{2n}})^2 \text{ s.t. } \beta, \tilde\beta \text{ satisfying (S.185)}\Big\} \le j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil \sum_{b_1=1}^{c_n} \frac{\varepsilon^2}{9j_n}\Big\{1 + \frac{2(b_1-1)}{c_n}\Big\}^2, \tag{S.186}$$
where the final inequality follows from (S.185). Since $\lceil j_{1n}/c_n\rceil c_n \le (j_{1n} + c_n) \le 2j_{1n}$ due to $\lceil j_{1n}/c_n\rceil \le 1 + j_{1n}/c_n$ and $c_n \le j_{1n}$, it follows from $j_n = j_{1n}j_{2n}$ that every set in (S.185) is contained in a ball of radius $\varepsilon$. Moreover, by (S.184) the total number of sets
with the structure in (S.185) needed to cover the set $\mathcal{C}_n$ is bounded by
$$\Big(\Big\lceil\frac{6M_0}{\varepsilon}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil}\Big(\Big\lceil\frac{6M_0 c_n}{\varepsilon j_{1n}}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil c_n}. \tag{S.187}$$
Next, we employ again the bound $\lceil j_{1n}/c_n\rceil c_n \le 2j_{1n}$ and $\lceil a\rceil \le 2a$ whenever $a \ge 1$, to obtain from (S.186) and (S.187) that whenever $\varepsilon \le 6M_0 c_n/j_{1n}$ we have
$$N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2) \le \Big(\frac{12M_0}{\varepsilon}\Big)^{\frac{2j_n}{c_n}}\Big(\frac{12M_0 c_n}{\varepsilon j_{1n}}\Big)^{2j_n} = \Big(\frac{12M_0 c_n}{\varepsilon j_{1n}}\Big(\frac{j_{1n}}{c_n}\Big)^{\frac{1}{c_n}}\Big)^{\frac{2j_n(c_n+1)}{c_n}} \le \Big(\frac{M_1\log(1+j_{1n})}{\varepsilon j_{1n}}\Big)^{4j_n}, \tag{S.188}$$
where the final inequality holds for some $M_1 < \infty$ due to $(c_n+1)/c_n \le 2$ and $(j_{1n}/c_n)^{1/c_n} \le j_{1n}^{\frac{1}{\log(1+j_{1n})}} = O(1)$ because $c_n = \lceil\log(1+j_{1n})\rceil$.
The bound in (S.188) is valid only for $\varepsilon \le 6M_0 c_n/j_{1n}$. To obtain a bound for $\varepsilon \ge 6M_0 c_n/j_{1n}$, let $\{Z_{b_1,b_2}\}_{b_1,b_2}$ be independent standard normal random variables. By Sudakov's inequality (see, e.g., Proposition A.2.5 in van der Vaart and Wellner (1996)), it then follows that for some $M_2 < \infty$ independent of $n$ we have that
$$\sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))} \le \frac{M_2}{\varepsilon}E\Big[\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2}\Big]. \tag{S.189}$$
Next, for notational convenience define $\Delta_{b_1}\beta_{b_1,b_2} = (\beta_{b_1,b_2} - \beta_{b_1-1,b_2})$ and $\Delta_{b_2}\beta_{b_1,b_2} = (\beta_{b_1,b_2} - \beta_{b_1,b_2-1})$, and then note that by (S.184) it follows that
$$\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2} = \sup_{\beta\in\mathcal{C}_n}\Big\{\sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \Delta_{b_1}\beta_{\bar b_1,b_2}\sum_{b_1=\bar b_1}^{j_{1n}} Z_{b_1,b_2} + \sum_{\bar b_2=2}^{j_{2n}} \Delta_{b_2}\beta_{1,\bar b_2}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=\bar b_2}^{j_{2n}} Z_{b_1,b_2} + \beta_{1,1}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} Z_{b_1,b_2}\Big\}$$
$$\le \sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \frac{M_0}{j_{1n}\sqrt{j_n}}\Big|\sum_{b_1=\bar b_1}^{j_{1n}} Z_{b_1,b_2}\Big| + \sum_{\bar b_2=2}^{j_{2n}} \frac{M_0}{j_{2n}\sqrt{j_n}}\Big|\sum_{b_1=1}^{j_{1n}}\sum_{b_2=\bar b_2}^{j_{2n}} Z_{b_1,b_2}\Big| + \frac{M_0}{\sqrt{j_n}}\Big|\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} Z_{b_1,b_2}\Big|.$$
Hence, employing that if $\mathcal{W} \sim N(0,\sigma^2)$ then $E[|\mathcal{W}|] \lesssim \sigma$, we can conclude that
$$E\Big[\sup_{\beta\in\mathcal{C}_n} \sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}} \beta_{b_1,b_2}Z_{b_1,b_2}\Big] \lesssim \sum_{b_2=1}^{j_{2n}}\sum_{\bar b_1=2}^{j_{1n}} \frac{\sqrt{j_{1n}-\bar b_1}}{j_{1n}\sqrt{j_n}} + \sum_{\bar b_2=2}^{j_{2n}} \frac{\sqrt{j_{1n}(j_{2n}-\bar b_2)}}{j_{2n}\sqrt{j_n}} + 1 \le \frac{j_n\sqrt{j_{1n}}}{j_{1n}\sqrt{j_n}} + \frac{j_{2n}\sqrt{j_{1n}j_{2n}}}{j_{2n}\sqrt{j_n}} + 1 \le 3\sqrt{j_{2n}},$$
where in the final inequality we employed that $j_n = j_{1n}j_{2n}$. Hence, by (S.189) we have
$$\sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))} \lesssim \frac{\sqrt{j_{2n}}}{\varepsilon}. \tag{S.190}$$
To conclude the proof, we combine the bounds in (S.188) and (S.190). In particular, setting $\delta_n \equiv 6M_0\lceil\log(j_{1n}+1)\rceil/j_{1n}$ and observing that $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{\lambda,2} \le C_0$ for all $\beta \in \mathcal{C}_n$ allows us to conclude that for some $M_2 < \infty$ we must have
$$\int_0^\infty \sqrt{\log(N(\varepsilon, \mathcal{C}_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_{\delta_n}^{M_2} \frac{\sqrt{j_{2n}}}{\varepsilon}d\varepsilon + \sqrt{j_n}\int_0^{\delta_n}\Big(\log\Big(\frac{M_1\log(j_{1n})}{\varepsilon j_{1n}}\Big)\Big)^{1/2}d\varepsilon \le \sqrt{j_{2n}}\log(1+j_{1n}) + \frac{\sqrt{j_n}\log(1+j_{1n})}{j_{1n}}\int_0^1\Big(\log\Big(\frac{1}{u}\Big)\Big)^{1/2}du \lesssim \sqrt{j_{2n}}\log(1+j_{1n}),$$
where the second inequality follows from the change of variables $u = \varepsilon/\delta_n$ and the final inequality employed that $j_n = j_{1n}j_{2n}$ and $j_{2n} \le j_{1n}$. Substituting $j_{2n} = j_{1n} \wedge j_{2n}$ and employing $j_{1n} \le j_n$ establishes the Lemma.
S.6.3 Quantile Treatment Effects
For our next example we formally study the nonparametric quantile treatment effect (QTE) application introduced in Section 2.2. Recall that in this context $\theta_0(P)$ is assumed to be the solution to the conditional moment restriction
$$P(Y \le \theta_0(P)(D)|Z) = \tau \tag{S.191}$$
where $Y \in \mathbf{R}$, $D \in [0,1]$, and $Z \in \mathbf{R}$. Thus, in this application $X = (Y,D,Z) \in \mathbf{X} \equiv \mathbf{R} \times [0,1] \times \mathbf{R}$ and the residual function $\rho: \mathbf{X} \times \mathbf{B} \to \mathbf{R}$ corresponding to (S.191) is
$$\rho(X,\theta) = 1\{Y \le \theta(D)\} - \tau. \tag{S.192}$$
Our analysis readily enables us to conduct inference on the QTE itself. However, in order to illustrate our conditions in a number of different settings, we focus here on a nonlinear functional of $\theta_0(P)$. In particular, we conduct inference on
$$\int_0^1 (\nabla\theta_0(P)(u))^2 du - \Big(\int_0^1 \nabla\theta_0(P)(u)du\Big)^2$$
while imposing that the QTE be increasing in treatment intensity (i.e. $d \mapsto \nabla\theta_0(P)(d)$ is increasing). To map this problem into our framework we define
$0$ for every $\varepsilon > 0$; (ii) There are $\varepsilon$ and $s_n > 0$ satisfying for all $P \in \mathbf{P}_0$ and $\|\theta - \Pi_n\theta_0(P)\|_{1,\infty} \le \varepsilon$, $s_n \le \mathrm{sing}\{E_P[f_{Y|D,Z}(\theta(D)|D,Z)q_n^{k_n}(Z)p_n^{j_n}(D)']\}$ and $s_n = O(1)$.
Assumption S.6.17(i) simply demands that the quadratic functions belong to $\mathbf{B}_n$, a condition we employ to verify Assumption 3.13. In turn, Assumption S.6.17(ii) implies that $\theta_0(P)$ and its approximation $\Pi_n\theta_0(P)$ belong to the interior of $\Theta$. Assumption S.6.17(iii) imposes a sufficient condition for verifying the bootstrap coupling requirement of Assumption 3.14. In particular, we establish Assumption 3.14 holds by applying the results in Appendix S.9 to a Haar basis expansion. While condition S.6.17(iii) suffices for verifying Assumption 3.14 in both the endogenous ($Z \ne D$) and exogenous ($Z = D$) settings, we note in both cases the rate condition of Assumption S.6.17(iii) could be improved.¹ Finally, Assumption S.6.17(iv), which is satisfied for instance by B-splines, ensures that $S_n(\mathbf{B},\mathbf{E}) \asymp j_n^{5/2}$, while Assumption S.6.17(v) imposes the rate requirements on $\ell_n$, $\ell_n^u$, and $r_n$. Intuitively, these rate requirements demand that $\ell_n \downarrow 0$ sufficiently fast and $r_n$ not tend to zero too quickly.
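The functional under study, $\int_0^1(\nabla\theta(u))^2 du - (\int_0^1 \nabla\theta(u)du)^2$, can be evaluated by simple quadrature for any candidate $\theta$. The sketch below uses a made-up $\theta$ with a closed-form answer: for $\nabla\theta(u) = 1 + u$ on $[0,1]$ the value equals $\mathrm{Var}(1+U) = 1/12$ with $U \sim$ Uniform$[0,1]$. Both $\theta$ and the grid size are illustrative choices.

```python
# Numerical sketch of the variance-of-the-QTE functional via trapezoid quadrature.
theta = lambda d: d + 0.5 * d * d        # hypothetical quantile function of D
dtheta = lambda d: 1.0 + d               # its derivative, the QTE d -> grad theta(d)

m = 10000
us = [i / m for i in range(m + 1)]

def trap(vals):
    # Composite trapezoid rule on [0, 1] with step 1/m.
    return sum((vals[i] + vals[i + 1]) / (2 * m) for i in range(m))

mean = trap([dtheta(u) for u in us])             # int grad theta = 3/2
second = trap([dtheta(u) ** 2 for u in us])      # int (grad theta)^2 = 7/3
variance = second - mean ** 2                    # = 1/12 up to quadrature error
assert abs(variance - 1.0 / 12.0) < 1e-4
```

Note also that the monotonicity restriction on the QTE is satisfied here since $d \mapsto 1 + d$ is increasing.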
The next theorem establishes the validity of the bootstrap procedure.
Theorem S.6.6. Let Assumptions S.6.12, S.6.13, S.6.14, S.6.15, S.6.16, and S.6.17 hold and $a_n = (\log(n))^{-1/2}$. Then, it follows that there are sequences $\tilde\ell_n$ and $\tilde\ell_n^u$ such that $\ell_n \asymp \tilde\ell_n$, $\ell_n^u \asymp \tilde\ell_n^u$ and uniformly in $P \in \mathbf{P}_0$ we have
$$U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde\ell_n) + o_P(a_n)$$
$$U_n(R|\ell_n) - U_n(\Theta|\ell_n^u) \ge U^\star_{n,P}(R|\tilde\ell_n) - U^\star_{n,P}(\Theta|\tilde\ell_n^u) + o_P(a_n).$$
¹For instance under endogeneity, a better rate could be obtained by conducting a basis expansion using the tensor product of a Haar basis for $(Y,D)$ and the functions $\{q_{k,n}\}_{k=1}^{k_n}$.
Thus, Theorems S.6.5 and S.6.6 imply that as a critical value for $I_n(R)$ we may employ
Hence, by the law of iterated expectations, a second application of the mean value theorem, and employing that $\|\theta\|_{1,\infty} \le C_0$ by definition of $\Theta$, we can conclude
$$\varpi^2(f,h,P) \le \sup_{\|(s_y,s_d)\|_2\le h} E_P[(1\{Y+s_y \le \theta(D+s_d)\} - 1\{Y \le \theta(D)\})^2 1\{D+s_d \in [0,1]\}] \lesssim \sup_{\|(s_y,s_d)\|_2\le h} E_P[|\theta(D+s_d) - s_y - \theta(D)| 1\{D+s_d \in [0,1]\}] \lesssim h. \tag{S.215}$$
The claim of the Lemma then follows from (S.212), (S.213), and (S.215).
Lemma S.6.18. Define the class $\mathcal{F}_n \equiv \{f : f(v) = 1\{y \le \theta(d)\} - \tau \text{ for some } \theta \in \Theta_n\}$ for $\Theta_n$ as in (S.196), and suppose that Assumptions S.6.12(iii) and S.6.13(ii) hold. For $\zeta_{j_n} \equiv \sup_{d\in[0,1]} \|p_n^{j_n}(d)\|_2$, it then follows that for all $\varepsilon \le 1$ and some $K < \infty$
$$\sup_{P\in\mathbf{P}} N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \le \exp\Big\{\frac{K}{\varepsilon}\Big\} \wedge \Big(\frac{K\sqrt{\zeta_{j_n}}}{\varepsilon}\Big)^{2j_n},$$
and $\sup_{P\in\mathbf{P}} J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \sqrt{1 \wedge \varepsilon} \wedge \sqrt{j_n(\log(\zeta_{j_n}) + \log(1 \vee \varepsilon^{-1}))}(1 \wedge \varepsilon)$.
Proof: We first note that if θL(d) ≤ θ(d) ≤ θU (d), then it immediately follows that
where in the final inequality we employed Jensen's inequality and that $\theta_1(d) \vee \theta_2(d) - \theta_1(d) \wedge \theta_2(d) = |\theta_1(d) - \theta_2(d)|$. It thus follows Assumption 3.8 holds with $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$ and $\kappa_\rho = 1/2$. Moreover, Jensen's inequality and the mean value theorem imply for some $\bar\theta$ such that $\bar\theta(d)$ is a convex combination of $\theta_1(d)$ and $\theta_0(d)$ that
$$E_P[(P(Y \le \theta_1(D)|Z) - P(Y \le \theta_2(D)|Z) - \nabla m_P(\theta_0)[\theta_1 - \theta_2](Z))^2]$$
$$\le E_P[(\{f_{Y|DZ,P}(\bar\theta(D)|D,Z) - f_{Y|DZ,P}(\theta_0(D)|D,Z)\}\{\theta_1(D) - \theta_0(D)\})^2] \lesssim \|\theta_1 - \theta_0\|_\infty^2 \times \sup_{P\in\mathbf{P}} E_P[(\theta_1(D) - \theta_0(D))^2],$$
where the final inequality follows from $f_{Y|DZ,P}$ being Lipschitz uniformly in $(D,Z)$ and $P \in \mathbf{P}$. Hence, we may conclude Assumption 3.9(i) is satisfied with $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_\infty$ and $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$. Furthermore, once again employing Jensen's inequality and that $f_{Y|DZ,P}$ is Lipschitz uniformly in $(D,Z)$ and $P \in \mathbf{P}$ yields
$$E_P[(E_P[\{f_{Y|DZ,P}(\theta_1(D)|D,Z) - f_{Y|DZ,P}(\theta_0(D)|D,Z)\}h(D)|Z])^2] \lesssim \|\theta_1 - \theta_0\|_\infty^2 \times \sup_{P\in\mathbf{P}}\|h\|_{P,2}^2 \tag{S.225}$$
which implies Assumption 3.9(ii) is also satisfied under the stated choices of $\|\cdot\|_{\mathbf{L}}$ and $\|\cdot\|_{\mathbf{E}}$. Finally, we note Assumption 3.9(iii) is immediate due to Jensen's inequality and $f_{Y|DZ,P}$ being bounded uniformly in $(D,Z)$ and $P \in \mathbf{P}$.
Lemma S.6.21. If Assumption S.6.17(i) holds, $\mathbf{B} = C^2_B([0,1])$ and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.193) (with $\lambda \neq 0$), (S.194), and (S.195), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $\nabla\Upsilon_G(\theta)[h] = -\nabla^2 h$, and
$$\nabla\Upsilon_F(\theta)[h] = 2\int_0^1 \theta(u)h(u)du - 2\Big(\int_0^1 \theta(u)du\Big)\Big(\int_0^1 h(u)du\Big). \quad (S.226)$$
Proof: Note that since $\Upsilon_G$ is linear and continuous, it immediately follows that Assumptions 3.11(i) and 3.11(ii) hold with $\nabla\Upsilon_G = \Upsilon_G$ and $K_g = 0$. It further follows from $\nabla\Upsilon_G = \Upsilon_G$ and the definitions of the operator norm $\|\cdot\|_o$ and $\|\cdot\|_{m,\infty}$ that
$$\|\nabla\Upsilon_G(\theta)\|_o = \sup_{\|h\|_{2,\infty}=1} \|-\nabla^2 h\|_\infty \leq 1, \quad (S.227)$$
which implies Assumption 3.11(iii) holds with $M_g = 1$. Moreover, by direct calculation
$$|\Upsilon_F(\theta_1) - \Upsilon_F(\theta_2) - \nabla\Upsilon_F(\theta_1)[\theta_1 - \theta_2]| = \Big|\int_0^1 (\theta_1(u) - \theta_2(u))^2 du - \Big(\int_0^1 (\theta_1(u) - \theta_2(u))du\Big)^2\Big| \leq \|\theta_1 - \theta_2\|^2_{2,\infty}, \quad (S.228)$$
which implies $\Upsilon_F$ is indeed Fréchet differentiable and its derivative is equal to $\nabla\Upsilon_F$ as defined in (S.226). In addition, by (S.226) and Jensen's inequality we have
$$\|\nabla\Upsilon_F(\theta_1) - \nabla\Upsilon_F(\theta_2)\|_o = \sup_{\|h\|_{2,\infty}=1} 2\Big|\int_0^1 (\theta_1(u) - \theta_2(u))\Big(h(u) - \int_0^1 h(u)du\Big)du\Big| \leq 2\|\theta_1 - \theta_2\|_{2,\infty}, \quad (S.229)$$
which together with (S.228) implies Assumptions 3.12(i) and 3.12(ii) hold with $K_f = 2$.
Next, note that since $\lambda \neq 0$ it follows that $\mathbf{F}_n = \mathbf{R}$. For any $\theta \in \mathbf{B}_n$ such that $\Upsilon_F(\theta) \neq 0$, we then define $\nabla\Upsilon_F(\theta)^- : \mathbf{F}_n \to \mathbf{B}_n$ to be given (for any $c \in \mathbf{R}$) by
$$\nabla\Upsilon_F(\theta)^-[c](d) \equiv c \times \frac{\theta(d) - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}, \quad (S.230)$$
and note that since $\theta \in \mathbf{B}_n$ and the constant function is in $\mathbf{B}_n$ by Assumption S.6.17(i), it follows that $\nabla\Upsilon_F(\theta)^-[c] \in \mathbf{B}_n$. Moreover, by direct calculation we obtain
$$\nabla\Upsilon_F(\theta)\nabla\Upsilon_F(\theta)^-[c] = 2\int_0^1 \theta(u)\Big(c \times \frac{\theta(u) - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}\Big)du = c \times \frac{2\Upsilon_F(\theta)}{2\Upsilon_F(\theta)} = c, \quad (S.231)$$
which verifies $\nabla\Upsilon_F(\theta)^-$ is indeed the right inverse of $\nabla\Upsilon_F(\theta)$. In addition note that
$$\|\nabla\Upsilon_F(\theta)^-\|_o = \sup_{|c|=1} \Big\|c \times \frac{\theta - \int_0^1 \theta(u)du}{2\Upsilon_F(\theta)}\Big\|_{2,\infty} \leq \frac{\|\theta\|_{2,\infty}}{\Upsilon_F(\theta)}, \quad (S.232)$$
and hence, since $\|\theta\|_{2,\infty} \leq C_0$ and $\Upsilon_F(\theta) = \lambda$ for any $\theta \in \Theta^r_{0n}(P)$, it follows that we may select an $\varepsilon > 0$ such that Assumption 3.12(iv) holds with $M_f = 4C_0/\lambda$.
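The right-inverse algebra in (S.230)-(S.231) is easy to verify numerically. The sketch below takes $\Upsilon_F(\theta) = \int_0^1\theta^2(u)\,du - (\int_0^1\theta(u)\,du)^2$ with an arbitrary smooth $\theta$ (an illustrative choice, not from the paper) and checks that $\nabla\Upsilon_F(\theta)\nabla\Upsilon_F(\theta)^-[c] = c$ up to quadrature error.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 10_001)
du = u[1] - u[0]
integ = lambda g: float(np.sum((g[:-1] + g[1:]) / 2) * du)   # trapezoid rule

theta = u**2 + 0.3 * np.sin(3 * u)                  # arbitrary smooth theta (illustrative)
Upsilon_F = integ(theta**2) - integ(theta) ** 2     # variance functional
dUpsilon_F = lambda h: 2 * integ(theta * h) - 2 * integ(theta) * integ(h)

c = 1.7
h_c = c * (theta - integ(theta)) / (2 * Upsilon_F)  # right-inverse candidate, as in (S.230)
```

The key simplification is that $h_c$ integrates to zero, so only the first term of $\nabla\Upsilon_F(\theta)[h_c]$ survives and equals $c$ exactly.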
Next, let $\bar\theta$ be the function $d \mapsto d^2$ and note that by Assumption S.6.17(i) it follows that $\bar\theta \in \mathbf{B}_n$. For any $\theta_0 \in \Theta^r_{0n}(P)$ we may then set $h_0$ to equal
$$h_0 \equiv \frac{2\lambda}{C_0}\bar\theta - \frac{\nabla\Upsilon_F(\theta_0)[\bar\theta]}{C_0}\theta_0, \quad (S.233)$$
which belongs to $\mathbf{B}_n$ since $\bar\theta, \theta_0 \in \mathbf{B}_n$. Further observe $\nabla\Upsilon_F(\theta_0)[\theta_0] = 2\Upsilon_F(\theta_0) = 2\lambda$ due to $\theta_0 \in R$, and hence by linearity of $\nabla\Upsilon_F(\theta_0)$ and (S.233) we can conclude that $h_0 \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$. In addition, it also follows from $\Upsilon_G = \nabla\Upsilon_G$ that
uniformly in $P \in \mathbf{P}_0$. To this end, we rely on Theorem S.5.1(ii) (for $E_n(R|\ell_n)$) and Lemma S.5.5 (for $E_n(\Theta|\ell^u_n)$). Also note that in the proof of Theorem 4.1 we showed Assumptions 4.1, 4.2, and 4.3 imply Assumptions 3.1-3.10 hold with $B_n \asymp \sqrt{k_n}$, $J_n \asymp \sqrt{j_n\log(1+j_n)}$, $\nu_n \asymp \sqrt{j_n}$, $\mathcal{R}_n \asymp k_nj_n\sqrt{\log(1+k_n)\log(1+j_n)}/\sqrt{n}$, $a_n = (\log(n))^{-1/2}$, $\kappa_\rho = 1$, $\|\theta\|_{\mathbf{L}} = \|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$, and $\|\theta\|_{\mathbf{B}} = \sum_{\ell=1}^{\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty + \|\nu\|_{TV}$ for $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35).
In order to apply Theorem S.5.1(ii), we note Assumption 4.4(i) and Lemma S.6.27 verify that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $M_g = 1$, and $K_f = 0$, while Assumption 4.4(iii) and Lemma S.6.30 verify Assumption 3.14, and Assumption 3.15(i) is immediate given the definitions of $\|\cdot\|_{\mathbf{E}}$ and $\|\cdot\|_{\mathbf{B}}$. Also note Assumption 3.15(ii) follows from Theorem 3.1, $\nu_n \asymp \sqrt{j_n}$, and $\tau_n\sqrt{j_n} = o(1)$ by Assumption 4.4(v) implying that $\vec{d}_H(\hat\Theta^r_n, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) = o_P(1)$ uniformly in $P \in \mathbf{P}_0$ and therefore
$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \hat\Theta^r_n, \|\cdot\|_{\mathbf{E}}) \leq \varepsilon\} \subseteq \Theta_n) \geq \liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) \leq 2\varepsilon\} \subseteq \Theta_n) = 1, \quad (S.249)$$
where the final equality holds for any $\varepsilon < 1/2$ by Assumption 4.4(iv), the definition of $\Theta_n$, and $\|F_P(c_\ell|\cdot)\|_\infty \leq 1$. Next, observe Lemma 4.1 and the definitions of $\|\cdot\|_{\mathbf{E}}$, $\|\cdot\|_{\mathbf{L}}$, and $\|\cdot\|_{\mathbf{B}}$ imply Assumption S.5.1 holds with $\mathcal{D}_n(\mathbf{B},\mathbf{E}) \asymp \zeta_n$ and $\mathcal{D}_n(\mathbf{L},\mathbf{E}) = 1$. Since $K_m = K_g = K_f = 0$ and $\Upsilon_F$ and $\Upsilon_G$ are affine, the only requirements imposed by Assumption S.5.2 are that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho} \vee (\nu_n\tau_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{P,2}) = o(a_n)$ and $(\mathcal{R}_n + \nu_n\tau_n)\mathcal{D}_n(\mathbf{B},\mathbf{E}) = o(r_n)$, which are implied by Assumption 4.4(v), Lemma S.6.24, and $k_n\sqrt{j_n}\log^2(n)\ell_n = o(1)$ by hypothesis. Hence, all the conditions of Theorem S.5.1(ii) hold, which implies there is a $\tilde\ell_n \asymp \ell_n$ such that uniformly in $P \in \mathbf{P}_0$
$$E_n(R|\ell_n) \geq U^\star_{n,P}(R|\tilde\ell_n) + o_P(a_n). \quad (S.250)$$
Finally, to apply Lemma S.5.5, note that in coupling $E_n(\Theta|\ell^u_n)$ we can set $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ and $\Upsilon_G(\theta) = \Upsilon_F(\theta) = 0$ for all $\theta \in \mathbf{B}$ (since $R = \Theta$). Hence, Assumptions 3.11, 3.12, 3.13, and 3.15(i) are immediate, while Assumption 3.14 is satisfied by Assumption 4.4(iii) and Lemma S.6.30. Further note that since $\Theta_0(P)$ is an equivalence class under $\|\cdot\|_{\mathbf{E}}$, when studying the unconstrained statistic we can treat the model as identified. As a result, we may set $\tau^u_n = 0$ and Assumption 3.15(ii) holds by arguments identical to (S.249). In order to apply Lemma S.5.5, it therefore only remains to verify Assumption 3.16 for the unconstrained problem. However, in this instance, the only condition imposed by Assumption 3.16 is that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}((\ell^u_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{P,2}) = o(a_n)$, which holds by Lemma S.6.24 and $k_n\sqrt{j_n}\log^2(n)\ell^u_n = o(1)$ by hypothesis. Thus, (S.250) and Lemma S.5.5 verify (S.248), which in turn establishes the theorem.
Lemma S.6.23. If Assumptions 4.1, 4.2(i)(iii), and 4.3 hold, then Assumptions 3.5 and 3.6 hold with $R = \Theta$ and $R$ corresponding to (32), (33)-(35), $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$, $V_n(P) = \Theta_n \cap R$, and $\nu_n^{-1} \asymp 1/\sqrt{j_n}$.
Proof: First note that Assumption 4.1(iv) and $\|\Sigma_{\ell,n}(P)\|_{o,2}$ being uniformly bounded in $P \in \mathbf{P}$ and $n$ by Assumption 4.3(ii) allow us to conclude that
$$\sup_{P\in\mathbf{P}_0} \sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) = O((n\log(n))^{-\frac{1}{2}}). \quad (S.251)$$
Furthermore, since the class $\mathcal{F}_n$ has envelope $3$ due to $\|F_\ell(c_\ell|\cdot)\|_\infty \leq 2$ by definition of $\Theta$ in (30), Lemma S.6.24 allows us to set $J_n \asymp \sqrt{j_n\log(1+j_n)}$. Since in addition Assumption 4.2(i) implies $B_n \asymp \sqrt{k_n}$ while Assumption 4.2(iii) implies $k_n \geq j_n$, we obtain that $\eta_n$ (as defined in Assumption 3.6) satisfies $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$. Hence, $Q_{n,P}(\theta) \geq 0$ for all $\theta \in \mathbf{B}$ and (S.251) imply Assumption 3.5 holds.
In order to verify Assumption 3.6(i), let $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$. Then note any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \Theta_n$ must be such that $F_\ell(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,\theta}$ for some $\beta_{\ell,\theta} \in \mathbf{R}^{j_n}$ and, similarly, $\Pi^r_n\theta_0(P) = (\{F_{\ell n}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_n)$ must satisfy $F_{\ell n}(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,n}$. The Cauchy-Schwarz inequality and Assumptions 4.1(ii)(iii) and 4.2(iii) then yield uniformly in $P \in \mathbf{P}$ that
$$\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}} \|\beta_{\ell,\theta} - \beta_{\ell,n}\|_2 \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}} \|E_P[q^{k_n}_n(W)p^{j_n}_n(W)'(\beta_{\ell,\theta} - \beta_{\ell,n})]\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}} \|E_P[(F_\ell(c_\ell|W) - F_{\ell n}(c_\ell|W))q^{k_n}_n(W)]\|^2_{\Sigma_{\ell,n}(P),2}\Big\}^{1/2}, \quad (S.252)$$
where the final inequality holds due to $\|\Sigma_{\ell,n}(P)^{-1}\|_{o,2}$ being uniformly bounded by Assumption 4.3(ii) and $\sum_{\ell=1}^{\mathcal{J}} |a^{(\ell)}| \leq \sqrt{\mathcal{J}}\|a\|_2$ for any $(a^{(1)}, \ldots, a^{(\mathcal{J})})' = a \in \mathbf{R}^{\mathcal{J}}$. Hence, combining (S.251), (S.252), and the law of iterated expectations yields
$$\frac{1}{\sqrt{j_n}}\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim Q_{n,P}(\theta) - \sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) + O((n\log(n))^{-1/2}). \quad (S.253)$$
Since inequality (S.253) holds for any $\theta \in \Theta_n$ and, as already shown, $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$, we conclude that Assumption 3.6(i) is satisfied with $\nu_n^{-1} = 1/\sqrt{j_n}$ and $V_n(P) = \Theta_n \cap R$. Finally, we note Assumption 3.6(ii) is implied by Assumption 4.3(i), result (S.251), and $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$.
Lemma S.6.24. Define the class $\mathcal{F}_n \equiv \{f : f(v) = (1\{y \leq c_\ell\} - p^{j_n}_n(w)'\beta) \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and suppose that Assumptions 4.1(ii)(iii) hold. Then, it
Moreover, since the eigenvalues of $E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)']$ are bounded uniformly in $P \in \mathbf{P}$ by hypothesis and $\sup_x \|r^{j_n}_n(x)\|_2 \lesssim b_{2n}$, it additionally follows that
$$\sup_{P\in\mathbf{P}} \Big\|\sum_{i=1}^n E_P[M_{i,n}M'_{i,n}]\Big\|_o \leq \sup_{P\in\mathbf{P}} \frac{2}{n}\big\|E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)'\|r^{j_n}_n(X)\|^2_2]\big\|_o \lesssim \frac{b^2_{2n}}{n}. \quad (S.260)$$
Identical arguments but relying on the eigenvalues of $E_P[r^{j_n}_n(X)r^{j_n}_n(X)']$ being bounded uniformly in $P \in \mathbf{P}$ and $\sup_x \|q^{k_n}_n(x)\|_2 \lesssim b_{1n}$ by hypothesis further yield that
$$\sup_{P\in\mathbf{P}} \Big\|\sum_{i=1}^n E_P[M'_{i,n}M_{i,n}]\Big\|_o \lesssim \frac{b^2_{1n}}{n}. \quad (S.261)$$
The claim of the Lemma then follows from results (S.259), (S.260), and (S.261) allowing us to apply Theorem 1.6 in Tropp (2012) with $\sigma^2 \asymp (b^2_{1n} \vee b^2_{2n})/n$ and $R \asymp b_{1n}b_{2n}/n$.
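The matrix Bernstein bound invoked above controls the operator norm of a centered sample cross-moment matrix at essentially a $1/\sqrt{n}$ rate. A minimal Monte Carlo sketch, using independent uniforms and monomial bases (illustrative assumptions only, chosen so the population cross-moment matrix is available in closed form):

```python
import numpy as np

rng = np.random.default_rng(1)
k, j = 8, 4
# population E[q(Z) r(X)'] for independent Z, X ~ Uniform[0,1] and monomial
# bases q(z) = (z^{k-1}, ..., z, 1)', r(x) = (x^{j-1}, ..., x, 1)'
M = 1.0 / np.outer(np.arange(k, 0, -1), np.arange(j, 0, -1))

def op_norm_dev(n):
    # operator norm of (1/n) sum_i q(Z_i) r(X_i)' minus its population counterpart
    Z, X = rng.uniform(size=n), rng.uniform(size=n)
    Mn = np.vander(Z, k).T @ np.vander(X, j) / n
    return float(np.linalg.norm(Mn - M, 2))

dev_small, dev_large = op_norm_dev(500), op_norm_dev(50_000)
```

Increasing the sample size by a factor of 100 shrinks the deviation by roughly a factor of 10, in line with the $\sqrt{n}$-scaling of the concentration bound.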
Lemma S.6.30. If Assumptions 4.1(i)-(iii), 4.2(i)(ii) hold, and $j^2_nk^2_n\log(1+j_nk_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(k^2_nj^5_n\log^3(1+k_nj_n)/n)^{1/4} = o(a_n)$.
Proof: Let $\mathcal{G}_n \equiv \{g : g(x) = 1\{y \leq c_\ell\} - p^{j_n\prime}_n\beta \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and $\mathcal{F}_n \equiv \{gq_{k,n} : g \in \mathcal{G}_n \text{ and } 1 \leq k \leq k_n\}$. Then note that when $R = \Theta$ we obtain
$$\sup_{f\in\mathcal{F}_n} \|\mathbb{W}_nfq^{k_n}_n - \mathbb{W}^\star_{n,P}fq^{k_n}_n\|_p \leq k_n^{1/p} \sup_{f\in\mathcal{F}_n} |\mathbb{W}_nf - \mathbb{W}^\star_{n,P}f|. \quad (S.262)$$
We will therefore establish the lemma by employing (S.262) and applying Theorem S.9.1(i) to the class $\mathcal{F}_n$. To this end, let $f^{d_n}_n(v) \equiv q^{k_n}_n(z) \otimes (p^{j_n}_n(w)', 1\{y \leq c_1\}, \ldots, 1\{y \leq c_{\mathcal{J}}\})'$ and note $d_n = k_n(j_n + \mathcal{J})$. Next observe that applying Lemma S.6.13 with $D_1 \equiv (p^{j_n}_n(W)', 1\{Y \leq c_1\}, \ldots, 1\{Y \leq c_{\mathcal{J}}\})'$ and $D_2 = q^{k_n}_n(Z)$ allows us to conclude
$$\sup_{P\in\mathbf{P}} \overline{\text{eig}}\{E_P[f^{d_n}_n(V)f^{d_n}_n(V)']\} \leq \sup_{P\in\mathbf{P}} \|\overline{\text{eig}}\{D_1D'_1\}\|_{P,\infty} \times \overline{\text{eig}}\{E_P[D_2D'_2]\} \lesssim j_n,$$
where the final inequality holds by Assumptions 4.1(ii) and 4.2(ii). Hence, it follows Assumption S.9.1(i) holds with $C_n = j_n$, while Assumption S.9.1(ii) is satisfied with $K_n = \sqrt{k_nj_n}$ by Assumptions 4.1(ii) and 4.2(i). Further note that by Assumption 4.1(iii) it follows that $\|\beta\|_2 \lesssim \sup_{P\in\mathbf{P}} \|p^{j_n\prime}_n\beta\|_{P,2} \leq \|p^{j_n\prime}_n\beta\|_\infty$. Hence, by definition of $\mathcal{F}_n$, there is a $C_0 < \infty$ such that any $f \in \mathcal{F}_n$ satisfies $f = f^{d_n\prime}_n\beta$ for some $\beta$ in
$$B_n \equiv \{\beta \in \mathbf{R}^{d_n} : \beta = e_k \otimes \gamma \text{ for some } \gamma \in \mathbf{R}^{j_n+1} \text{ with } \|\gamma\|_2 \leq C_0\},$$
where $e_k \in \mathbf{R}^{k_n}$ has its $k$th coordinate equal to one and all other coordinates equal to zero. In particular, it follows that Assumption S.9.2(i) is immediate with $\mathcal{G}_{n,P}$ consisting of the zero function and $J_{1n} = 0$. Moreover, setting $C_n \equiv \{\gamma \in \mathbf{R}^{1+j_n} : \|\gamma\|_2 \leq C_0\}$, we can then conclude from the definition of $B_n$ and $N(\varepsilon, C_n, \|\cdot\|_2) \lesssim 1 \vee (C_0/\varepsilon)^{j_n}$ that
$$\int_0^\infty \sqrt{\log(N(\varepsilon, B_n, \|\cdot\|_2))}d\varepsilon \lesssim \int_0^{C_0} \sqrt{\log(k_n) + \log(N(\varepsilon, C_n, \|\cdot\|_2))}d\varepsilon \lesssim \sqrt{\log(k_n)} + \sqrt{j_n},$$
which verifies Assumption S.9.2(ii) is satisfied with $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$. Thus, applying Theorem S.9.1(i) with $K_n \asymp \sqrt{k_nj_n}$, $C_n \asymp j_n$, $d_n \lesssim k_nj_n$, $J_{1n} = 0$, and $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$ implies that uniformly in $P \in \mathbf{P}$ we have
$$\sup_{f\in\mathcal{F}_n} |\mathbb{W}_nf - \mathbb{W}^\star_{n,P}f| = O_P\Big(\Big\{\frac{k^2_nj^5_n\log^3(1+k_nj_n)}{n}\Big\}^{1/4}\Big) \quad (S.263)$$
provided that $j^2_nk^2_n\log(1+j_nk_n) = o(n)$. Since the latter condition is satisfied by hypothesis, the claim of the lemma then follows from (S.262) and (S.263).
Lemma S.6.31. Define $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}} \|F_\ell(c_\ell|\cdot)\|_\infty$ and let $V_n(\theta, \ell_n) = V_n(\theta, +\infty) \cap \{h/\sqrt{n} : \|h/\sqrt{n}\|_{\mathbf{E}} \leq \ell_n\}$ for $V_n(\theta, +\infty)$ as in (39). If Assumptions 4.1, 4.2(i)(iii)(iv), and 4.3 hold, then for any $a_n$ and $\ell_n$ satisfying $k^4_nj^5_n\log^3(1+k_nj_n)/n = o(a^4_n)$ and $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$ it follows uniformly in $P \in \mathbf{P}$ that
$$U_n(R|+\infty) = \inf_{\theta\in\hat\Theta^r_n} \inf_{\frac{h}{\sqrt{n}}\in V_n(\theta,\ell_n)} \Big\{\sum_{\ell=1}^{\mathcal{J}} \|\mathbb{W}_n\rho_\ell(\cdot,\theta)q^{k_n}_n + \hat{D}_{\ell,n}[h]\|^2_{\hat\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n)$$
$$U_n(\Theta|+\infty) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{E}}\leq\ell_n} \Big\{\sum_{\ell=1}^{\mathcal{J}} \|\mathbb{W}_n\rho_\ell(\cdot,\hat\theta^u_n)q^{k_n}_n + \hat{D}_{\ell,n}[h]\|^2_{\hat\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n).$$
Proof: We establish the claim by verifying the conditions of Lemma 3.1 under both $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35). To this end, recall that in the proof of Theorem 4.1 we argued (for both specifications of $R$) that Assumptions 3.3(i)(iii) hold with $B_n \asymp \sqrt{k_n}$ and $J_n \asymp \sqrt{j_n\log(1+j_n)}$ and that Assumption 3.4 is satisfied by Assumption 4.3. Further note Assumption 4.1(ii) and the Cauchy-Schwarz inequality imply for any $h = (\{p^{j_n\prime}_n\beta_{\ell,h}\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}_n$ that
$$\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}} \sqrt{j_n}\|\beta_{\ell,h}\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}} \|D_{\ell,n,P}[h]\|^2_2\Big\}^{1/2} = \sqrt{j_n}\|D_{n,P}[h]\|_2, \quad (S.264)$$
where the second inequality follows from $D_{\ell,n,P}[h] = -E_P[q^{k_n}_n(Z)p^{j_n}_n(W)'\beta_{\ell,h}]$ and the smallest singular values of $E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']$ being bounded away from zero uniformly in $P \in \mathbf{P}$ by Assumption 4.2(iii). Since $\nu_n \asymp \sqrt{j_n}$ by Lemma S.6.23 and the derivative $D_{n,P}(\theta)$ does not depend on $\theta$, we conclude $\|h\|_{\mathbf{E}} \leq \nu_n\|D_{n,P}[h]\|_2$ for all $h \in \mathbf{B}_n$, i.e., in verifying the conditions of Lemma 3.1 we may set $\mathcal{A}_n(P) = \Theta_n \cap R$. In order to verify condition (24) of Lemma 3.1 we note that since $\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}} \sqrt{j_n}\|\beta_{\ell,h}\|_2$ as shown in (S.264), the definitions of the operator norm $\|\cdot\|_{o,2}$, $\hat{D}_{\ell,n}$, and $D_{\ell,n,P}$ imply
$$\sup_{h\in\mathbf{B}_n} \frac{\|\hat{D}_n[h] - D_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} \lesssim \sqrt{j_n}\Big\|\frac{1}{n}\sum_{i=1}^n q^{k_n}_n(Z_i)p^{j_n}_n(W_i)' - E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']\Big\|_{o,2} = o_P(1), \quad (S.265)$$
where the final equality holds uniformly in $P \in \mathbf{P}$ by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$, $b_{2n} = k_n$ (by Assumptions 4.1(ii) and 4.2(i)) and employing that $k_n \geq j_n$ and $k^2_nj_n\log(k_n)/n = o(1)$ by Assumptions 4.2(iii)(iv). Finally, we note that Assumption 4.4(iii) implies $j^5_nk^4_n\log^5(1+j_nk_n) = o(n)$, and employing Lemma S.6.30 with $p = 2$ yields that Assumption 3.14 holds for $R = \Theta$, and hence also for $R$ corresponding to (32) and (33)-(35). The only condition of Lemma 3.1 that remains to be verified is that $\mathcal{S}_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$. To this end, we observe that since $V_n(\theta, \ell_n)$ is defined through the constraint $\|h\|_{\mathbf{E}} \leq \ell_n$ (instead of $\|\cdot\|_{\mathbf{B}} \leq \ell_n$), it suffices to verify $\mathcal{R}_n = o(\ell_n)$, i.e., for the purposes of this lemma we may set $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}}$. However, since as argued $J_n \asymp \sqrt{j_n\log(j_n)}$, $B_n \asymp \sqrt{k_n}$, and $\nu_n \asymp \sqrt{j_n}$, we have $\mathcal{R}_n \asymp k_nj_n\sqrt{\log(k_n)\log(j_n)}/\sqrt{n}$, and the requirement $\mathcal{R}_n = o(\ell_n)$ is implied by $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$. Thus, the claim of the lemma follows from Lemma 3.1.
S.7 Local Parameter Space
This section contains analytical results concerning the local parameter space and our
approximation to it. The main result of this section is Theorem S.7.1, which plays an
instrumental role in the proof of the results of Section 3.3 in the main text.
Theorem S.7.1. Let Assumptions 3.1, 3.11, 3.12, and 3.13 hold, and let $\{\ell_n, \delta_n, r_n\}_{n=1}^\infty$ satisfy

Assumption S.8.3. The classes of functions $\mathcal{F}_n$ satisfy: (i) $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \varpi(f, h, P) \leq \varphi_n(h)$ for some $\varphi_n : \mathbf{R}_+ \to \mathbf{R}_+$ satisfying $\varphi_n(Ch) \leq C^\kappa\varphi_n(h)$ for all $n$, $C > 0$, and some $\kappa > 0$; and (ii) $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \|f\|_{P,\infty} \leq K_n$ for some $K_n > 0$.
In Assumption S.8.1 we impose that V ∼ P be continuously distributed for all
P ∈ P, with uniformly (in P ) bounded supports and densities bounded from above
and away from zero. Assumption S.8.2 requires that the support of V under each P
be “smooth” in the sense that it may be seen as a differentiable transformation of the
unit square. Together, Assumptions S.8.1 and S.8.2 enable us to construct partitions
of Ω(P ) such that the diameter of each set in the partition is controlled uniformly in
P ; see Lemma S.8.1. As a result, the approximation error by the Haar bases implied
by each partition can be controlled uniformly by the integral modulus of continuity; see
Lemma S.8.2. Together with Assumption S.8.3, which imposes conditions on the integral
modulus of continuity of Fn uniformly in P , we can obtain a uniform coupling result
through the analysis in Koltchinskii (1994). We note that the homogeneity condition
on ϕn in Assumption S.8.3(i) is not necessary, but imposed to simplify the bound.
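A minimal Python sketch of the dyadic splitting scheme underlying Lemma S.8.1, specialized to $Q_P$ uniform on $[0,1]^2$ so that each level-$i$ cell has mass exactly $2^{-i}$ (an illustrative special case, not the general construction): the split coordinate alternates as $m(i)$ does, and cell diameters shrink at the rate $2^{-i/d_v}$ with $d_v = 2$.

```python
import numpy as np

def cells(i):
    # level-i dyadic partition of [0,1]^2, halving coordinate (level mod 2) at each step
    boxes = [np.array([[0.0, 1.0], [0.0, 1.0]])]
    for level in range(i):
        axis, new = level % 2, []
        for b in boxes:
            lo, hi = b[axis]
            mid = (lo + hi) / 2
            left, right = b.copy(), b.copy()
            left[axis] = [lo, mid]
            right[axis] = [mid, hi]
            new += [left, right]
        boxes = new
    return boxes

def max_diam(i):
    # largest cell diameter at level i; scales like 2^{-i/2} here (d_v = 2)
    return max(np.hypot(b[0, 1] - b[0, 0], b[1, 1] - b[1, 0]) for b in cells(i))
```

At level $i$ there are $2^i$ boxes of Lebesgue measure $2^{-i}$, and moving from level $4$ to level $6$ halves the maximal diameter, exactly the $2^{-i/d_v}$ rate the uniform coupling argument relies on.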
The next theorem provides us with an important tool for verifying Assumption 3.7. In its statement we employ the same notation as in the main text, where recall $N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ denotes the smallest number of brackets of size $\varepsilon$ (under $\|\cdot\|_{P,2}$) needed to cover $\mathcal{F}_n$, while $J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ is defined as the integral
$$J_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv \int_0^\varepsilon \sqrt{1 + \log N_{[\,]}(u, \mathcal{F}_n, \|\cdot\|_{P,2})}du.$$
Theorem S.8.1. Let Assumptions S.8.1-S.8.3 hold, $\{V_i\}_{i=1}^n$ be i.i.d. with $V_i \sim P \in \mathbf{P}$, and for any $\delta_n \downarrow 0$ let $N_n \equiv \sup_{P\in\mathbf{P}} N_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})$, $J_n \equiv \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})$, and
$$S_n \equiv \Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^i\varphi^2_n(2^{-\frac{i}{d_v}})\Big)^{\frac{1}{2}}. \quad (S.310)$$
If $N_n \uparrow \infty$, then there are Gaussian processes $\{\mathbb{W}_{n,P}\}_{n=1}^\infty$ such that uniformly in $P \in \mathbf{P}$
$$\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_n} = O_P\Big(\frac{K_n\log(nN_n)}{\sqrt{n}} + \frac{K_n\sqrt{\log(nN_n)\log(n)}S_n}{\sqrt{n}} + J_n\Big(1 + \frac{J_nK_n}{\delta^2_n\sqrt{n}}\Big)\Big). \quad (S.311)$$
Theorem S.8.1 is a mild modification of the results in Koltchinskii (1994). The proof
of Theorem S.8.1 relies on a coupling of the empirical process on a sequence of grids of
cardinality Nn, and employs the equicontinuity of Gn,P and Wn,P to obtain a coupling
on the entire class Fn. The conclusion of Theorem S.8.1 applies to any choice of grid
accuracy δn. In order to obtain the best rate, δn must be chosen to balance the terms
in (S.311) and thus depends on the metric entropy of Fn through the terms Nn and Jn.
Below, we include the proof of Theorem S.8.1 and auxiliary results.
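To make the trade-off concrete, the sketch below evaluates the three terms of (S.311) for a stylized bracketing entropy $\log N_{[\,]}(u) = j\log(C/u)$; all constants are arbitrary illustrative choices rather than quantities derived in the paper. The first two terms blow up as $\delta_n \downarrow 0$ through $\log N_n$ while the third grows with $\delta_n$ through $J_n$, so an interior $\delta_n$ minimizes the bound.

```python
import numpy as np

def coupling_bound(delta, n, j=10, C=2.0, Kn=1.0, Sn=5.0):
    # the three terms of (S.311) under log N(u) = j * log(C/u) for u <= C
    u = np.linspace(1e-6, delta, 2_000)
    s = np.sqrt(1.0 + j * np.log(np.maximum(C / u, 1.0)))
    Jn = float(np.sum((s[:-1] + s[1:]) / 2 * np.diff(u)))   # J(delta) by trapezoid rule
    logN = j * np.log(max(C / delta, 1.0))
    t1 = Kn * (np.log(n) + logN) / np.sqrt(n)
    t2 = Kn * np.sqrt((np.log(n) + logN) * np.log(n)) * Sn / np.sqrt(n)
    t3 = Jn * (1.0 + Jn * Kn / (delta**2 * np.sqrt(n)))
    return float(t1 + t2 + t3)

deltas = np.geomspace(1e-3, 1.0, 60)
vals = [coupling_bound(d, n=10_000) for d in deltas]
```

Scanning the grid shows the minimized bound is attained strictly inside the range, i.e. neither the coarsest nor the finest net is optimal.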
Proof of Theorem S.8.1: Let $\{\Delta_i(P)\}$ be the partitions of $\Omega(P)$ in Lemma S.8.1 and $\mathcal{B}_{P,i}$ the $\sigma$-algebra generated by $\Delta_i(P)$. By Lemma S.8.2 and Assumption S.8.3,
$$\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^iE_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2]\Big)^{\frac{1}{2}} \leq C_1\Big(\sum_{i=0}^{\lceil\log_2 n\rceil} 2^i\varphi^2_n(2^{-\frac{i}{d_v}})\Big)^{\frac{1}{2}} \equiv C_1S_n \quad (S.312)$$
for some constant $C_1 > 0$ and for $S_n$ as defined in (S.310). Next, let $\mathcal{F}_{P,n,\delta_n} \subseteq \mathcal{F}_n$ denote a finite $\delta_n$-net of $\mathcal{F}_n$ with respect to $\|\cdot\|_{P,2}$. Since $N(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \leq N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$, it follows from the definition of $N_n$ that we may choose $\mathcal{F}_{P,n,\delta_n}$ so that
$$\sup_{P\in\mathbf{P}} \text{card}(\mathcal{F}_{P,n,\delta_n}) \leq \sup_{P\in\mathbf{P}} N_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv N_n. \quad (S.313)$$
By Theorem 3.5 in Koltchinskii (1994), (S.312), and (S.313), it follows that for each $n \geq 1$ there exists an isonormal process $\mathbb{W}_{n,P}$ such that for all $\eta_1 > 0$, $\eta_2 > 0$
$$\sup_{P\in\mathbf{P}} P\Big(\frac{\sqrt{n}}{K_n}\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_{P,n,\delta_n}} \geq \eta_1 + \sqrt{\eta_1}\sqrt{\eta_2}(C_1S_n + 1)\Big) \lesssim N_n\exp\{-C_2\eta_1\} + n\exp\{-C_2\eta_2\}, \quad (S.314)$$
for some $C_2 > 0$. Since $N_n \uparrow \infty$, (S.314) implies for any $\varepsilon > 0$ there are $C_3 > 0$, $C_4 > 0$ sufficiently large, such that setting $\eta_1 \equiv C_3\log(N_n)$ and $\eta_2 \equiv C_3\log(n)$ yields
$$\sup_{P\in\mathbf{P}} P\Big(\|\mathbb{G}_{n,P} - \mathbb{W}_{n,P}\|_{\mathcal{F}_{P,n,\delta_n}} \geq C_4K_n \times \frac{\log(nN_n) + \sqrt{\log(N_n)\log(n)}S_n}{\sqrt{n}}\Big) < \varepsilon. \quad (S.315)$$
Next, note that by definition of $\mathcal{F}_{P,n,\delta_n}$, there exists a $\Gamma_{n,P} : \mathcal{F}_n \to \mathcal{F}_{P,n,\delta_n}$ such that $\sup_{P\in\mathbf{P}} \sup_{f\in\mathcal{F}_n} \|f - \Gamma_{n,P}f\|_{P,2} \leq \delta_n$. Let $D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$ denote the $\varepsilon$-packing number for $\mathcal{F}_n$ under $\|\cdot\|_{P,2}$, and note $D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \leq N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})$. Therefore, by Corollary 2.2.8 in van der Vaart and Wellner (1996) we can conclude that
$$\sup_{P\in\mathbf{P}} E_P[\|\mathbb{W}_{n,P} - \mathbb{W}_{n,P}\Gamma_{n,P}\|_{\mathcal{F}_n}] \lesssim \sup_{P\in\mathbf{P}} \int_0^{\delta_n} \sqrt{\log D(\varepsilon, \mathcal{F}_n, \|\cdot\|_{P,2})}d\varepsilon \leq \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2}) \equiv J_n. \quad (S.316)$$
Similarly, employing Lemma 3.4.2 in van der Vaart and Wellner (1996) yields that
$$\sup_{P\in\mathbf{P}} E_P[\|\mathbb{G}_{n,P} - \mathbb{G}_{n,P}\Gamma_{n,P}\|_{\mathcal{F}_n}] \lesssim \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})\Big(1 + \sup_{P\in\mathbf{P}} \frac{J_{[\,]}(\delta_n, \mathcal{F}_n, \|\cdot\|_{P,2})K_n}{\delta^2_n\sqrt{n}}\Big) \equiv J_n\Big(1 + \frac{J_nK_n}{\delta^2_n\sqrt{n}}\Big). \quad (S.317)$$
Therefore, combining (S.315), (S.316), and (S.317) together with the decomposition
Therefore, since ∆i,k(P ) = ∆i+1,2k(P )∪∆i+1,2k+1(P ), it follows thatQP (∆i+1,2k+1(P )) =12QP (∆i,k(P )) for 0 ≤ k ≤ 2i − 1 as well. In particular, QP (∆0,0(P )) = 1 implies that
QP (∆i,k(P )) =1
2i(S.324)
for any integers i ≥ 1 and 0 ≤ k ≤ 2i − 1. Moreover, we note that result (S.319) and
Assumptions S.8.1(ii) and S.8.2(ii) together imply that the density gP of QP satisfies
0 < infP∈P
infa∈[0,1]dv
gP (a) < supP∈P
supa∈[0,1]dv
gP (a) <∞, (S.325)
and therefore $Q_P(A) \asymp \lambda(A)$ uniformly in $A \in \mathcal{A}$ and $P \in \mathbf{P}$. Hence, since by (S.322) $u_{i+1,2k,j}(P) = u_{i,k,j}(P)$ and $l_{i+1,2k,j}(P) = l_{i,k,j}(P)$ for all $j \neq m(i+1)$, we obtain
$$\frac{u_{i+1,2k,m(i+1)}(P) - l_{i+1,2k,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)} = \frac{\prod_{j=1}^{d_v}(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P))}{\prod_{j=1}^{d_v}(u_{i,k,j}(P) - l_{i,k,j}(P))} = \frac{\lambda(\Delta_{i+1,2k}(P))}{\lambda(\Delta_{i,k}(P))} \asymp \frac{Q_P(\Delta_{i+1,2k}(P))}{Q_P(\Delta_{i,k}(P))} = \frac{1}{2} \quad (S.326)$$
uniformly in $P \in \mathbf{P}$, $i \geq 0$, and $0 \leq k \leq 2^i - 1$ by results (S.324) and (S.325). Moreover, by identical arguments but using (S.323) instead of (S.322) we conclude
$$\frac{u_{i+1,2k+1,m(i+1)}(P) - l_{i+1,2k+1,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)} \asymp \frac{1}{2} \quad (S.327)$$
also uniformly in $P \in \mathbf{P}$, $i \geq 0$, and $0 \leq k \leq 2^i - 1$. Thus, since $(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P)) = (u_{i+1,2k+1,j}(P) - l_{i+1,2k+1,j}(P)) = (u_{i,k,j}(P) - l_{i,k,j}(P))$ for all $j \neq m(i+1)$, and $u_{0,0,j}(P) - l_{0,0,j}(P) = 1$ for all $1 \leq j \leq d_v$, we obtain from $m(i) = i - \lfloor\frac{i-1}{d_v}\rfloor \times d_v$, results (S.326) and (S.327), and proceeding inductively that
$$(u_{i,k,j}(P) - l_{i,k,j}(P)) \asymp 2^{-\frac{i}{d_v}}, \quad (S.328)$$
uniformly in $P \in \mathbf{P}$, $i \geq 0$, $0 \leq k \leq 2^i - 1$, and $1 \leq j \leq d_v$. Thus, result (S.328) yields
$$\sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|a - a'\| \leq \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \max_{1\leq j\leq d_v} \sqrt{d_v} \times (u_{i,k,j}(P) - l_{i,k,j}(P)) = O(2^{-\frac{i}{d_v}}). \quad (S.329)$$
We next obtain the desired sequence of partitions $\Delta_i(P)$ of $(\Omega(P), \mathcal{B}_P, P)$ by constructing them from the partitions $\{\Delta_{i,k}(P)\}_{k=0}^{2^i-1}$ of $[0,1]^{d_v}$. To this end, set
$$\Delta_i(P) \equiv \{T_P(\Delta_{i,k}(P)) : 0 \leq k \leq 2^i - 1\}$$
for all $i \geq 0$. Note that $\Delta_i(P)$ satisfies conditions (i) and (ii) due to $T_P^{-1}$ being a measurable map, $T_P$ being bijective, and result (S.321). In addition, $\Delta_i(P)$ satisfies condition (iii) since by definition (S.318) and result (S.324) we have
$$P(T_P(\Delta_{i,k}(P))) = Q_P(\Delta_{i,k}(P)) = 2^{-i}$$
for all $0 \leq k \leq 2^i - 1$. Moreover, by Assumption S.8.2(ii), $\sup_{P\in\mathbf{P}} \sup_{a\in[0,1]^{d_v}} \|J_{T_P}(a)\|_o < \infty$, and hence by the mean value theorem we can conclude that
$$\sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{v,v'\in T_P(\Delta_{i,k}(P))} \|v - v'\| = \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|T_P(a) - T_P(a')\| \lesssim \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \sup_{a,a'\in\Delta_{i,k}(P)} \|a - a'\| = O(2^{-\frac{i}{d_v}})$$
by result (S.329), which verifies that $\Delta_i(P)$ satisfies condition (iv). Also note that to verify $\Delta_i(P)$ satisfies condition (v) it suffices to show that $\bigcup_{i\geq0}\Delta_i(P)$ generates the Borel $\sigma$-algebra on $\Omega(P)$. To this end, we first aim to show that
$$\mathcal{A} = \sigma\Big(\bigcup_{i\geq0}\{\Delta_{i,k}(P) : 0 \leq k \leq 2^i - 1\}\Big), \quad (S.330)$$
where for a collection of sets $\mathcal{C}$, $\sigma(\mathcal{C})$ denotes the $\sigma$-algebra generated by $\mathcal{C}$. For any closed set $A \in \mathcal{A}$, define $D_i(P)$ to be given by
$$D_i(P) \equiv \bigcup_{k : \Delta_{i,k}(P)\cap A\neq\emptyset} \Delta_{i,k}(P).$$
Notice that since $\{\Delta_{i,k}(P)\}_{k=0}^{2^i-1}$ is a partition of $[0,1]^{d_v}$, $A \subseteq D_i(P)$ for all $i \geq 0$ and hence $A \subseteq \bigcap_{i\geq0} D_i(P)$. Moreover, if $a_0 \in A^c$, then $A^c$ being open and (S.329) imply $a_0 \notin D_i(P)$ for $i$ sufficiently large. Hence, $A^c \cap (\bigcap_{i\geq0} D_i(P)) = \emptyset$ and therefore $A = \bigcap_{i\geq0} D_i(P)$. It follows that if $A$ is closed, then $A$ belongs to the right hand side of (S.330), which establishes the inclusion of $\mathcal{A}$ in the right hand side of (S.330). On the other hand, since $\Delta_{i,k}(P)$ is Borel for all $i \geq 0$ and $0 \leq k \leq 2^i - 1$, the reverse inclusion is immediate, and hence (S.330) follows. To conclude, we then note that
$$\sigma\Big(\bigcup_{i\geq0}\Delta_i(P)\Big) = \sigma\Big(\bigcup_{i\geq0}\{T_P(\Delta_{i,k}(P)) : 0 \leq k \leq 2^i - 1\}\Big) = T_P\Big(\sigma\Big(\bigcup_{i\geq0}\{\Delta_{i,k}(P) : 0 \leq k \leq 2^i - 1\}\Big)\Big) = T_P(\mathcal{A}), \quad (S.331)$$
by Corollary 1.2.9 in Bogachev (2007). However, $T_P$ and $T_P^{-1}$ being continuous implies $T_P(\mathcal{A})$ equals the Borel $\sigma$-algebra on $\Omega(P)$, and therefore (S.331) implies $\Delta_i(P)$ satisfies condition (v), establishing the lemma.
Lemma S.8.2. Let $\Delta_i(P)$ be as in Lemma S.8.1, and $\mathcal{B}_{P,i}$ denote the $\sigma$-algebra generated by $\Delta_i(P)$. If Assumptions S.8.1(i)(ii) and S.8.2(i)(ii) hold, then there are $K_0 > 0$, $K_1 > 0$ such that for all $P \in \mathbf{P}$ and any $f$ satisfying $f \in L^2_P$ for all $P \in \mathbf{P}$:
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq K_0 \times \varpi^2(f, K_1 \times 2^{-\frac{i}{d_v}}, P).$$
Proof: Since ∆i(P ) is a partition of Ω(P ) and P (∆i,k(P )) = 2−i for all i ≥ 0 and
0 ≤ k ≤ 2i − 1, we may express EP [f(V )|BP,i] as an element of L2P by
EP [f(V )|BP,i] = 2i2i−1∑k=0
1v ∈ ∆i,k(P )∫
∆i,k(P )f(v)dP (v).
Hence, employing that P (∆i,k(P )) = 2−i for all i ≥ 0 and 0 ≤ k ≤ 2i − 1 together
with ∆i(P ) being a partition of Ω(P ), and applying Holder’s inequality to the term
(f(v)− f(v))1v ∈ Ω(P )1v ∈ ∆i,k(P ) we can conclude that
EP [(f(V )− EP [f(V )|BP,i])2]
=2i−1∑k=0
∫∆i,k(P )
(f(v)− 2i∫
∆i,k(P )f(v)dP (v))2dP (v)
=2i−1∑k=0
22i
∫∆i,k(P )
(
∫∆i,k(P )
(f(v)− f(v))1v ∈ Ω(P )dP (v))2dP (v)
≤2i−1∑k=0
22iP (∆i,k(P ))
∫∆i,k(P )
∫∆i,k(P )
(f(v)− f(v))21v ∈ Ω(P )dP (v)dP (v)
=2i−1∑k=0
2i∫
∆i,k(P )
∫∆i,k(P )
(f(v)− f(v))21v ∈ Ω(P )dP (v)dP (v).
Let $D_i \equiv \sup_{P\in\mathbf{P}} \max_{0\leq k\leq 2^i-1} \text{diam}\{\Delta_{i,k}(P)\}$, where $\text{diam}\{\Delta_{i,k}(P)\}$ is the diameter of $\Delta_{i,k}(P)$. Further note that by Lemma S.8.1(iv), $D_i = O(2^{-i/d_v})$ and hence we have $\lambda(\{s \in \mathbf{R}^{d_v} : \|s\| \leq D_i\}) \leq M_12^{-i}$ for some $M_1 > 0$ and $\lambda$ the Lebesgue measure. Noting that $\sup_{P\in\mathbf{P}} \sup_{v\in\Omega(P)} \frac{dP}{d\lambda}(v) < \infty$ by Assumption S.8.1(ii), and doing the change of variables $s = \tilde{v} - v$ we then obtain for some constant $M_0 > 0$ that
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq M_0\sum_{k=0}^{2^i-1} 2^i\int_{\Delta_{i,k}(P)}\int_{\Delta_{i,k}(P)}(f(v) - f(\tilde{v}))^21\{\tilde{v} \in \Omega(P)\}d\lambda(\tilde{v})d\lambda(v) \leq M_0M_1\sup_{\|s\|\leq D_i}\sum_{k=0}^{2^i-1}\int_{\Delta_{i,k}(P)}(f(v+s) - f(v))^21\{v+s \in \Omega(P)\}d\lambda(v). \quad (S.332)$$
Hence, since $\{\Delta_{i,k}(P) : k = 0, \ldots, 2^i - 1\}$ is a partition of $\Omega(P)$, $\varpi(f, h, P)$ is nondecreasing in $h$, and $D_i \leq K_12^{-\frac{i}{d_v}}$ for some $K_1 > 0$ by Lemma S.8.1(iv), we obtain
$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq M_0M_1 \times \varpi^2(f, K_1 \times 2^{-\frac{i}{d_v}}, P) \quad (S.333)$$
by (S.332). Setting $K_0 \equiv M_0 \times M_1$ in (S.333) then establishes the lemma.
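A Monte Carlo sanity check of the lemma's conclusion for $d_v = 1$ and $P$ uniform on $[0,1]$, so the level-$i$ dyadic cells have mass $2^{-i}$; the test function and the constant $10$ standing in for $K_0$ are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.uniform(size=300_000)            # P = Uniform[0,1], d_v = 1
f = lambda v: np.abs(v - 0.4)            # Lipschitz (kinked) test function

def cond_exp_err2(i):
    # E[(f(V) - E[f(V)|B_{P,i}])^2] for the level-i dyadic partition
    cells = np.floor(V * 2**i).astype(int)
    m = np.bincount(cells, weights=f(V)) / np.bincount(cells)
    return float(np.mean((f(V) - m[cells]) ** 2))

def modulus2(h):
    # squared integral modulus of continuity, maximizing over the two shift directions
    out = []
    for s in (h, -h):
        keep = (V + s >= 0) & (V + s <= 1)
        out.append(float(np.mean((f(V + s) - f(V)) ** 2 * keep)))
    return max(out)
```

Across levels, the conditional-expectation error is dominated by a fixed multiple of the squared modulus evaluated at the cell width, and decays as the partition refines.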
S.9 Uniform Bootstrap Coupling
We next provide uniform coupling results for the multiplier bootstrap that allow us to
verify Assumption 3.14 in a variety of problems. The results in this appendix may be of
independent interest, as they extend the validity of the multiplier bootstrap to suitable
non-Donsker classes Fn. For this reason, as in Section S.8, we state the results in a
notation that abstracts from the rest of the paper. Hence, here V ∈ Rdv should be
interpreted as a generic random variable whose distribution is given by P ∈ P.
Our coupling results rely on a series approximation to the elements of $\mathcal{F}_n$. To this end, we will assume that for each $P \in \mathbf{P}$ there is a basis $\{f_{d,n,P}\}_{d=1}^{d_n}$, with $d_n$ possibly diverging to infinity, that provides a suitable approximation to every $f \in \mathcal{F}_n$. Formally, for $f^{d_n}_{n,P}(v) \equiv (f_{1,n,P}(v), \ldots, f_{d_n,n,P}(v))'$, we impose the following:
Assumption S.9.1. For each $P \in \mathbf{P}$ there is an array of functions $\{f_{d,n,P}\}_{d=1}^{d_n} \subset L^2_P$ such that: (i) the eigenvalues of $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ are bounded by $1 \leq C_n$ uniformly in $P \in \mathbf{P}$; (ii) $\sup_{P\in\mathbf{P}} \max_{1\leq d\leq d_n} \|f_{d,n,P}\|_{P,\infty} \leq K_n$ with $1 \leq K_n$ finite.

Assumption S.9.2. For every $f \in \mathcal{F}_n$ and $P \in \mathbf{P}$ there is a $\beta_{n,P}(f) \in \mathbf{R}^{d_n}$ such that: (i) the class $\mathcal{G}_{n,P} \equiv \{f - f^{d_n\prime}_{n,P}\beta_{n,P}(f) : f \in \mathcal{F}_n\}$ has envelope $G_{n,P}$ which satisfies $\|g\|_{P,2} \leq \delta_n\|G_{n,P}\|_{P,2}$ for all $P \in \mathbf{P}$, $g \in \mathcal{G}_{n,P}$, and some $\delta_n > 0$ with
$$J_{1n} \equiv \sup_{P\in\mathbf{P}}\Big\{J_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) + \sqrt{n}E_P\Big[G_{n,P}(V)\exp\Big\{-\frac{n\delta^2_n\|G_{n,P}\|^2_{P,2}}{G^2_{n,P}(V)\eta_{n,P}}\Big\}\Big]\Big\}$$
finite and $\eta_{n,P} \equiv 1 + \log N_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2})$; (ii) the set $B_n \equiv \{\beta_{n,P}(f) : f \in \mathcal{F}_n, P \in \mathbf{P}\} \cup \{0\}$ satisfies $J_{2n} \equiv \int_0^\infty \sqrt{\log(N(\varepsilon, B_n, \|\cdot\|_2))}d\varepsilon < \infty$.
Assumption S.9.1 imposes our regularity conditions on the approximating functions $\{f_{d,n,P}\}_{d=1}^{d_n}$. We emphasize that the functions $\{f_{d,n,P}\}_{d=1}^{d_n}$ need not be known, as they are merely employed in the theoretical construction of the bootstrap coupling and not in the computation of the multiplier bootstrap process $\mathbb{W}_n$. In certain applications, such as when $\mathcal{F}_n$ itself is finite dimensional, a basis $\{f_{d,n,P}\}_{d=1}^{d_n}$ may be naturally available. The approximating requirements on $\{f_{d,n,P}\}_{d=1}^{d_n}$ are formally imposed in Assumption S.9.2. In particular, Assumption S.9.2(i) requires that the remainder of the approximation of $\mathcal{F}_n$ by $\{f_{d,n,P}\}_{d=1}^{d_n}$ not be "too large." Intuitively, Assumption S.9.2(i) may be understood as controlling the "bias" in a series approximation of $\mathcal{F}_n$ by linear combinations of $\{f_{d,n,P}\}_{d=1}^{d_n}$. Assumption S.9.2(ii) in turn controls the "variance" of the series approximation by demanding that the class of approximating functions have finite entropy. As in Section S.8, Assumption S.9.2 does not require the functions $f \in \mathcal{F}_n$ to be "smooth" in the sense of being differentiable.
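For concreteness, a common form of the multiplier bootstrap process referenced here is $\mathbb{W}_nf = \frac{1}{\sqrt{n}}\sum_{i=1}^n\omega_i(f(V_i) - \bar{f}_n)$ with $\omega_i \sim N(0,1)$ independent of the data; the centering convention in the sketch below is an assumption of this illustration rather than a quotation of the paper's definition. Conditional on $\{V_i\}$, the process at a fixed $f$ is exactly Gaussian with variance equal to the sample variance of $f(V)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
V = rng.normal(size=n)                   # the data {V_i} (illustrative design)
f = lambda v: np.cos(v)

def Wn(f, omega):
    # multiplier bootstrap process at f: (1/sqrt(n)) sum_i omega_i (f(V_i) - mean)
    fv = f(V)
    return float((omega * (fv - fv.mean())).sum() / np.sqrt(n))

draws = np.array([Wn(f, rng.normal(size=n)) for _ in range(4_000)])
```

The empirical variance of the bootstrap draws matches the sample variance of $f(V)$, which is exactly the conditional covariance the coupling results approximate uniformly over $\mathcal{F}_n$.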
We next show Assumptions S.9.1 and S.9.2 suffice for coupling $\mathbb{W}_n$ to $\mathbb{W}^\star_{n,P}$.

Theorem S.9.1. Let Assumptions S.9.1 and S.9.2 hold, $\{(\omega_i, V_i)\}_{i=1}^n$ be i.i.d. with $V_i \sim P \in \mathbf{P}$, $\omega_i \sim N(0,1)$, $\omega_i$ and $V_i$ independent, and suppose that $d_n\log(1+d_n)K^2_n = o(n)$. (i) Then, there is a linear Gaussian process $\mathbb{W}^\star_{n,P}$ independent of $\{V_i\}_{i=1}^n$ with
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} = O_P\Big(J_{2n}\Big\{\frac{K^2_nC_nd_n\log(1+d_n)}{n}\Big\}^{1/4} + J_{1n}\Big)$$
uniformly in $P \in \mathbf{P}$. (ii) If in addition the eigenvalues of $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ are bounded away from zero uniformly in $n$ and $P \in \mathbf{P}$, then uniformly in $P \in \mathbf{P}$
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} = O_P\Big(\frac{J_{2n}K_n\sqrt{C_nd_n\log(1+d_n)}}{\sqrt{n}} + J_{1n}\Big).$$
Theorem S.9.1(i) derives a rate of convergence for the coupled process, while Theorem S.9.1(ii) improves on the rate under the additional requirement that $E_P[f^{d_n}_{n,P}(V)f^{d_n}_{n,P}(V)']$ be bounded away from singularity. The rates of both Theorems S.9.1(i) and S.9.1(ii) depend on the selected sequence $d_n$, which should be chosen optimally to deliver the best possible implied rate. Heuristically, the proof of Theorem S.9.1 proceeds in two steps. First, we construct a multivariate normal random variable $\mathbb{W}^\star_{n,P}(f^{d_n}_{n,P}) \in \mathbf{R}^{d_n}$ that is coupled with $\mathbb{W}_n(f^{d_n}_{n,P}) \in \mathbf{R}^{d_n}$, and then employ the linearity of $\mathbb{W}_n$ to obtain a suitable coupling on the subspace $S_{n,P} \equiv \text{span}\{f^{d_n}_{n,P}\}$. Second, we employ Assumption S.9.2(i) to show that a successful coupling on $S_{n,P}$ leads to the desired construction since $\mathcal{F}_n$ is well approximated by $\{f_{d,n,P}\}_{d=1}^{d_n}$. We note that while we do not pursue it here for conciseness, the outlined heuristics can also be employed to verify Assumption 3.7 by coupling $\mathbb{G}_{n,P}(f^{d_n}_{n,P})$ to $\mathbb{W}_{n,P}(f^{d_n}_{n,P})$ through standard results (e.g. Yurinskii (1977)).
Below, we include the proof of Theorem S.9.1 and auxiliary results.
Proof of Theorem S.9.1: We proceed by employing Lemma S.9.1 to couple $\mathbb{W}_n$ on a finite dimensional subspace, and then showing that such a result suffices for coupling $\mathbb{W}_n$ and $\mathbb{W}^\star_{n,P}$ on $\mathcal{F}_n$. To this end, let $S_{n,P} \equiv \text{span}\{f^{d_n}_{n,P}\}$ and note that Assumption S.9.2(ii) and Lemma S.9.1 imply that there exists a linear Gaussian process $\mathbb{W}^{(1)}_{n,P}$ on $S_{n,P}$ and a sequence $R_n = o(1)$ such that uniformly in $P \in \mathbf{P}$ we have
$$\sup_{\beta\in B_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta) - \mathbb{W}^{(1)}_{n,P}(f^{d_n\prime}_{n,P}\beta)| = O_P(J_{2n}R_n). \quad (S.334)$$
To establish part (i) of the theorem we will set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/4}$ and employ Lemma S.9.1(i), while to establish part (ii) we will set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/2}$ and employ Lemma S.9.1(ii) instead.

For any closed linear subspace $A_{n,P}$ of $L^2_P$, let $\text{Proj}\{f|A_{n,P}\}$ denote the $\|\cdot\|_{P,2}$ projection of $f$ onto $A_{n,P}$ and set $A^\perp_{n,P} \equiv \{f \in L^2_P : f = g - \text{Proj}\{g|A_{n,P}\} \text{ for some } g \in L^2_P\}$ (i.e. $A^\perp_{n,P}$ is the orthocomplement of $A_{n,P}$ in $L^2_P$). Assuming the underlying probability space is suitably enlarged to carry a linear isonormal process $\mathbb{W}^{(2)}_{n,P}$ on $S^\perp_{n,P}$ independent of $\mathbb{W}^{(1)}_{n,P}$ and $\{V_i\}_{i=1}^n$, we then define $\mathbb{W}^\star_{n,P}$ on $L^2_P$ pointwise by
$$\mathbb{W}^\star_{n,P}f \equiv \mathbb{W}^{(1)}_{n,P}(\text{Proj}\{f|S_{n,P}\}) + \mathbb{W}^{(2)}_{n,P}(\text{Proj}\{f|S^\perp_{n,P}\}),$$
which is linear in $f$ by linearity of $f \mapsto \text{Proj}\{f|S_{n,P}\}$, $\mathbb{W}^{(1)}_{n,P}$, and $\mathbb{W}^{(2)}_{n,P}$. Moreover, since $\mathbb{W}^\star_{n,P}$ is sub-Gaussian with respect to $\|\cdot\|_{P,2}$, it follows from Corollary 2.2.8 in van der Vaart and Wellner (1996), $N(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) = 1$ due to $\|g\|_{P,2} \leq \delta_n\|G_{n,P}\|_{P,2}$ for all $g \in \mathcal{G}_{n,P}$ and $P \in \mathbf{P}$, bracketing numbers being larger than covering numbers, Jensen's inequality, and the definition of $J_{1n}$ in Assumption S.9.2(i) that
form a bracket for $\bar{\mathcal{G}}_{n,P}$. Moreover, since $E[\omega^2] = 1$ and $\omega$ and $V$ are independent, it follows from direct calculation that $\|\bar{g}_{i,u,P} - \bar{g}_{i,l,P}\|_{P,2} = \|g_{i,u,P} - g_{i,l,P}\|_{P,2}$. Setting $\bar{G}_{n,P}(\omega, v) \equiv |\omega|G_{n,P}(v)$, then note that $\bar{G}_{n,P}$ is an envelope for $\bar{\mathcal{G}}_{n,P}$ which satisfies $\|\bar{G}_{n,P}\|_{P,2} = \|G_{n,P}\|_{P,2}$. Recalling $\eta_{n,P} \equiv 1 + \log N_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2})$ we then obtain by Theorem 2.14.2 in van der Vaart and Wellner (1996) that
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] \lesssim J_{[\,]}(\delta_n\|G_{n,P}\|_{P,2}, \mathcal{G}_{n,P}, \|\cdot\|_{P,2}) + \sqrt{n}E_P\Big[|\omega|G_{n,P}(V)1\Big\{|\omega|\frac{G_{n,P}(V)}{\|G_{n,P}\|_{P,2}} > \frac{\sqrt{n}\delta_n}{\sqrt{\eta_{n,P}}}\Big\}\Big]. \quad (S.337)$$
Moreover, since $\omega$ follows a standard normal distribution, we have $E[|\omega|1\{|\omega| > a\}] \lesssim \exp\{-a^2/2\}$ for any $a \geq 0$. Therefore, the independence of $\omega$ and $V$ implies
$$E_P\Big[|\omega|G_{n,P}(V)1\Big\{|\omega|\frac{G_{n,P}(V)}{\|G_{n,P}\|_{P,2}} > \frac{\sqrt{n}\delta_n}{\sqrt{\eta_{n,P}}}\Big\}\Big] \lesssim E_P\Big[G_{n,P}(V)\exp\Big\{-\frac{n\delta^2_n\|G_{n,P}\|^2_{P,2}}{2G^2_{n,P}(V)\eta_{n,P}}\Big\}\Big],$$
which together with result (S.337) and the definition of $J_{1n}$ in Assumption S.9.2(i) yields
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] \lesssim J_{1n}. \quad (S.338)$$
Moreover, by Lemma 2.3.1 in van der Vaart and Wellner (1996) we further obtain that
$$E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^n \{g(V_i) - E_P[g(V)]\}\Big|\Big] + \delta_n\|G_{n,P}\|_{P,2} \leq E_P\Big[\sup_{g\in\mathcal{G}_{n,P}}\Big|\frac{2}{\sqrt{n}}\sum_{i=1}^n \omega_ig(V_i)\Big|\Big] + \delta_n\|G_{n,P}\|_{P,2} \lesssim J_{1n}, \quad (S.339)$$
where the final inequality follows from (S.338) and the definition of $J_{1n}$. Thus, (S.336), (S.338), and (S.339) together with Markov's inequality imply that uniformly in $P \in \mathbf{P}$
$$\|\mathbb{W}_n\|_{\mathcal{G}_{n,P}} = O_P(J_{1n}). \quad (S.340)$$
Next, we use the linearity of the processes $f \mapsto \mathbb{W}_n(f)$ and $f \mapsto \mathbb{W}^\star_{n,P}(f)$ to obtain that
$$\|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{F}_n} \leq \sup_{f\in\mathcal{F}_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta_{n,P}(f)) - \mathbb{W}^\star_{n,P}(f^{d_n\prime}_{n,P}\beta_{n,P}(f))| + \|\mathbb{W}_n - \mathbb{W}^\star_{n,P}\|_{\mathcal{G}_{n,P}} \leq \sup_{\beta\in B_n} |\mathbb{W}_n(f^{d_n\prime}_{n,P}\beta) - \mathbb{W}^\star_{n,P}(f^{d_n\prime}_{n,P}\beta)| + O_P(J_{1n}) = O_P(J_{2n}R_n + J_{1n}),$$
where the second inequality holds uniformly in $P \in \mathbf{P}$ by (S.335) and Markov's inequality, result (S.340), and set inclusion, while the equality holds uniformly in $P \in \mathbf{P}$ by result (S.334). The first claim of the theorem then follows by using Lemma S.9.1(i) to set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/4}$ in (S.334), and the second part of the theorem follows from using Lemma S.9.1(ii) to set $R_n = (d_n\log(1+d_n)C_nK^2_n/n)^{1/2}$ instead.
Lemma S.9.1. Let {(ωi, Vi)}ⁿᵢ₌₁ be i.i.d. with Vi ∼ P ∈ P, ωi ∼ N(0, 1), and ωi and Vi independent, and suppose that J2n ≡ ∫₀^∞ √(log N(ε, Bn, ‖·‖₂)) dε < ∞. (i) Then, there is a linear Gaussian process W⋆n,P on Sn,P ≡ span{f^{dn}_{n,P}} independent of {Vi}ⁿᵢ₌₁ with
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big| = O_P\Big(J_{2n}\Big(\frac{d_n\log(1+d_n)C_nK_n^2}{n}\Big)^{1/4}\Big)
\]
uniformly in P ∈ P. (ii) If in addition the eigenvalues of E_P[f^{dn}_{n,P}(V)f^{dn}_{n,P}(V)′] are bounded away from zero uniformly in n and P ∈ P, then uniformly in P ∈ P
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big| = O_P\Big(J_{2n}\,\frac{\sqrt{d_n\log(1+d_n)C_n}\,K_n}{\sqrt n}\Big).
\]
Proof: First note that Wn(f − c) = Wn(f) for any c ∈ R and f ∈ L²_P. We may therefore
assume without loss of generality that E_P[f^{dn}_{n,P}(V)] = 0, and for every P ∈ P we let
Σn(P) ≡ Var_P{f^{dn}_{n,P}(V)} = E_P[f^{dn}_{n,P}(V)f^{dn}_{n,P}(V)′] and define
\[
\hat\Sigma_n(P) \equiv \frac{1}{n}\sum_{i=1}^n \Big(f_{n,P}^{d_n}(V_i)-\frac{1}{n}\sum_{j=1}^n f_{n,P}^{d_n}(V_j)\Big)\Big(f_{n,P}^{d_n}(V_i)-\frac{1}{n}\sum_{j=1}^n f_{n,P}^{d_n}(V_j)\Big)'.
\]
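In matrix form, Σ̂n(P) is simply the demeaned sample covariance of the vectors f^{dn}_{n,P}(Vi), normalized by n. A quick sketch, assuming NumPy is available and using simulated stand-in data (the array f and its dimensions are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((100, 3))  # row i stands in for f^{dn}_{n,P}(V_i)', n = 100, dn = 3

# The display above: average of outer products of the demeaned observations.
fbar = f.mean(axis=0)
Sigma_hat = (f - fbar).T @ (f - fbar) / f.shape[0]

# Agrees with NumPy's covariance normalized by n (bias=True).
assert np.allclose(Sigma_hat, np.cov(f, rowvar=False, bias=True))
```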
For a sequence Rn with Rn = o(1), and any constant M > 0 and P ∈ P, define the event
\[
A_{n,P}(M) \equiv \big\{\|\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\|_{o,2} \le MR_n\big\}. \tag{S.341}
\]
Further note that by Lemma S.9.2 it follows that we may select Rn = o(1) such that
\[
\liminf_{M\uparrow\infty}\,\liminf_{n\to\infty}\,\inf_{P\in\mathbf{P}} P\big(\{V_i\}_{i=1}^n \in A_{n,P}(M)\big) = 1. \tag{S.342}
\]
In particular, to establish part (i) of the lemma we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} and employ Lemma S.9.2(i), while to establish part (ii) we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/2} and employ Lemma S.9.2(ii) instead.
Next, let N_{dn} ∈ R^{dn} follow a standard normal distribution and be independent
of {(ωi, Vi)}ⁿᵢ₌₁ (defined on the same suitably enlarged probability space). Further let
{νd}^{dn}_{d=1} denote the eigenvectors of Σ̂n(P), let {λd}^{dn}_{d=1} denote the corresponding (possibly
zero) eigenvalues, and define the random variable Zn,P ∈ R^{dn} to be given by
\[
Z_{n,P} \equiv \sum_{d:\lambda_d\neq 0} \frac{\nu_d(\nu_d'\mathbb{W}_n(f_{n,P}^{d_n}))}{\lambda_d^{1/2}} + \sum_{d:\lambda_d=0}\nu_d(\nu_d' N_{d_n}). \tag{S.343}
\]
Then note that since Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}ⁿᵢ₌₁, and N_{dn} is independent of {(ωi, Vi)}ⁿᵢ₌₁, Zn,P is Gaussian conditional on {Vi}ⁿᵢ₌₁. Furthermore,
\[
E[Z_{n,P}Z_{n,P}'\,|\,\{V_i\}_{i=1}^n] = \sum_{d=1}^{d_n}\nu_d\nu_d' = I_{d_n}
\]
by direct calculation, for I_{dn} the dn × dn identity matrix. Hence, Zn,P ∼ N(0, I_{dn}) conditional on {Vi}ⁿᵢ₌₁ almost surely in {Vi}ⁿᵢ₌₁ and is thus independent of {Vi}ⁿᵢ₌₁.
Moreover, we also note that by Theorem 3.6.1 in Bogachev (1998) and Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}ⁿᵢ₌₁, it follows that Wn(f^{dn}_{n,P}) belongs to the range of Σ̂n(P) : R^{dn} → R^{dn} almost surely in {(ωi, Vi)}ⁿᵢ₌₁. Therefore, since {νd : λd ≠ 0} spans the range of Σ̂n(P), we conclude from (S.343) that for any β ∈ R^{dn}
\[
\beta'\hat\Sigma_n^{1/2}(P)Z_{n,P} = \beta'\sum_{d:\lambda_d\neq 0}\nu_d(\nu_d'\mathbb{W}_n(f_{n,P}^{d_n})) = \mathbb{W}_n(\beta' f_{n,P}^{d_n}).
\]
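The linear algebra behind (S.343) can be illustrated numerically: for a deliberately rank-deficient covariance Σ̂, rescaling along the eigenvectors with nonzero eigenvalue and filling the null space with an independent standard normal yields a vector with identity covariance, and multiplying back by Σ̂^{1/2} projects onto the range. A hypothetical NumPy sketch (the matrix B and all variable names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
dn = 4
B = rng.standard_normal((dn, 2))
Sigma_hat = B @ B.T                  # rank-2, hence singular, covariance

lam, nu = np.linalg.eigh(Sigma_hat)  # columns of nu are the eigenvectors nu_d
nz = lam > 1e-10                     # indices with lambda_d != 0

# Linear map sending W to the first sum in (S.343): sum_d nu_d nu_d' W / lam_d^{1/2}.
A = (nu[:, nz] / np.sqrt(lam[nz])) @ nu[:, nz].T
P_null = nu[:, ~nz] @ nu[:, ~nz].T   # projection onto the null space

# Cov(Z) = A Sigma_hat A' + P_null since W and N_dn are independent;
# this equals the identity, matching E[Z Z' | {V_i}] = I_dn.
cov_Z = A @ Sigma_hat @ A.T + P_null
assert np.allclose(cov_Z, np.eye(dn))

# Applying Sigma_hat^{1/2} to Z recovers W on the range of Sigma_hat,
# which is the content of beta' Sigma_hat^{1/2} Z = W_n(beta' f).
sqrt_Sigma = (nu[:, nz] * np.sqrt(lam[nz])) @ nu[:, nz].T
P_range = nu[:, nz] @ nu[:, nz].T
assert np.allclose(sqrt_Sigma @ A, P_range)
```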
Analogously, we define for any β ∈ R^{dn} the linear Gaussian process W⋆n,P on Sn,P by
\[
\mathbb{W}^\star_{n,P}(\beta' f_{n,P}^{d_n}) \equiv \beta'\Sigma_n^{1/2}(P)Z_{n,P},
\]
which is independent of {Vi}ⁿᵢ₌₁ due to the independence of Zn,P and {Vi}ⁿᵢ₌₁. Setting
\[
\bar{\mathbb{W}}_{n,P}(\beta) \equiv \big|\beta'\big(\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\big)Z_{n,P}\big|1_{A_{n,P}(M)}, \tag{S.344}
\]
then note that for 1_{A_{n,P}(M)} the indicator for the event {Vi}ⁿᵢ₌₁ ∈ A_{n,P}(M), we have
\[
\sup_{\beta\in B_n}\big|\mathbb{W}_n(f_{n,P}^{d_n\prime}\beta)-\mathbb{W}^\star_{n,P}(f_{n,P}^{d_n\prime}\beta)\big|1_{A_{n,P}(M)}
= \sup_{\beta\in B_n}\big|\beta'\big(\hat\Sigma_n^{1/2}(P)-\Sigma_n^{1/2}(P)\big)Z_{n,P}\big|1_{A_{n,P}(M)}
= \sup_{\beta\in B_n}\big|\bar{\mathbb{W}}_{n,P}(\beta)\big|. \tag{S.345}
\]
Since Bn is bounded due to J2n being finite, definition (S.344) implies W̄n,P ∈ ℓ∞(Bn). Moreover, we note that conditional on {Vi}ⁿᵢ₌₁, W̄n,P is sub-Gaussian under the semimetric ρn(β, β̃) ≡ ‖(Σ̂n^{1/2}(P) − Σn^{1/2}(P))(β − β̃)‖₂. Since ‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖_{o,2} ≤ MRn whenever 1_{A_{n,P}(M)} = 1, we additionally obtain that
\[
\int_0^\infty \sqrt{\log N(\varepsilon, B_n, \rho_n)}\,d\varepsilon
\le \int_0^\infty \sqrt{\log N(\varepsilon/MR_n, B_n, \|\cdot\|_2)}\,d\varepsilon
= MR_n\int_0^\infty \sqrt{\log N(u, B_n, \|\cdot\|_2)}\,du, \tag{S.346}
\]
where the equality follows from the change of variables ε = MRn·u. Therefore, since 0 ∈ Bn, Corollary 2.2.8 in van der Vaart and Wellner (1996) and (S.346) imply
\[
E\Big[\sup_{\beta\in B_n}\big|\bar{\mathbb{W}}_{n,P}(\beta)\big|\,\Big|\,\{V_i\}_{i=1}^n\Big] \lesssim MR_n\int_0^\infty\sqrt{\log N(u, B_n, \|\cdot\|_2)}\,du \equiv MR_nJ_{2n}. \tag{S.347}
\]
Next, note (S.344), (S.345), and (S.347) together with Markov’s inequality imply that
where the final inequality holds for any constants {cj}, jn ≤ j ≤ j⋆n(ε), satisfying cj > 0 and Σ^{j⋆n(ε)}_{j=jn} cj = 1, and the product should be understood to equal one if j⋆n(ε) = jn − 1. In particular, setting cj = j^{−γ}/(Σ^{j⋆n(ε)}_{j=jn} j^{−γ}) in (S.359) we obtain the upper bound
\[
N(\varepsilon, \mathcal{A}_n, \|\cdot\|_{\ell^1})
\le \Big\lceil \frac{2M_n\sum_{j=j_n}^\infty j^{-\gamma}}{\varepsilon}\Big\rceil^{(j_n^\star(\varepsilon)-j_n)\vee 0}
\le \Big\lceil \frac{2M_n(j_n-1)^{-(\gamma-1)}}{\varepsilon(\gamma-1)}\Big\rceil^{(j_n^\star(\varepsilon)-j_n)\vee 0},
\]
where the final inequality follows from an integral bound as in (S.357). Thus, the claim of the lemma follows from the bound ⌈a⌉ ≤ a + 1 and results (S.358) and (S.359).
Lemma S.9.4. For any positive random variable U with E[U²] < ∞ and finite constant A > 0 it follows that
\[
E\big[U\exp\{-A/U^2\}\big] \le E[U]\exp\{-A/E[U^2]\} + E[U^2]/\sqrt{2A}.
\]
Proof: First note that the function u ↦ u·exp{−A/u²} is convex on (0, √(2A)]. Therefore, by applying Jensen's inequality, the fact that the function u ↦ u·exp{−A/u²} is increasing on (0, ∞), E[1{0 < U ≤ √(2A)}U] ≤ E[U] due to U being positive almost surely, and exp{−A/U²} ≤ 1 due to A > 0, we can conclude that
\[
E\big[U\exp\{-A/U^2\}\big] = E\big[1\{0 < U \le \sqrt{2A}\}U\exp\{-A/U^2\}\big] + E\big[1\{U > \sqrt{2A}\}U\exp\{-A/U^2\}\big]
\le E[U]\exp\{-A/E[U^2]\} + E\big[1\{U > \sqrt{2A}\}U\big].
\]
The claim of the lemma therefore follows from E[1{U > √(2A)}U] ≤ E[U²]/√(2A), which holds by the Cauchy–Schwarz and Markov inequalities.
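The inequality of Lemma S.9.4 is easy to spot-check numerically for discrete distributions; a minimal sketch (the distributions below are arbitrary test cases, not taken from the paper):

```python
import math

def lemma_bound_holds(values, probs, A):
    """Check E[U exp(-A/U^2)] <= E[U] exp(-A/E[U^2]) + E[U^2]/sqrt(2A)
    for a discrete positive random variable U with the given support and
    probabilities."""
    EU = sum(p * u for u, p in zip(values, probs))
    EU2 = sum(p * u * u for u, p in zip(values, probs))
    lhs = sum(p * u * math.exp(-A / u**2) for u, p in zip(values, probs))
    rhs = EU * math.exp(-A / EU2) + EU2 / math.sqrt(2 * A)
    return lhs <= rhs

# A few hypothetical test distributions and constants A.
assert lemma_bound_holds([0.5, 1.0, 2.0], [0.2, 0.5, 0.3], 1.0)
assert lemma_bound_holds([0.1, 5.0], [0.9, 0.1], 2.0)
assert lemma_bound_holds([3.0], [1.0], 0.5)  # degenerate U gives near-equality in the first term
```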
References
Aliprantis, C. D. and Border, K. C. (2006). Infinite Dimensional Analysis – A Hitchhiker's Guide. Springer-Verlag, Berlin.
Belloni, A., Chernozhukov, V., Chetverikov, D. and Kato, K. (2015). Some new asymptotic theory for least squares series: Pointwise and uniform results. Journal of Econometrics, 186 345–366.
Bhatia, R. (1997). Matrix Analysis. Springer, New York.
Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75 1613–1669.
Blundell, R., Horowitz, J. and Parey, M. (2017). Nonparametric estimation of a nonseparable demand function under the Slutsky inequality restriction. Review of Economics and Statistics, 99 291–304.
Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.
Bogachev, V. I. (1998). Gaussian Measures. American Mathematical Society, Providence.
Bogachev, V. I. (2007). Measure Theory: Volume I. Springer-Verlag, Berlin.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.
Chen, X. and Christensen, T. M. (2018). Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9 39–84.
Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.
DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation, vol. 303. Springer Science & Business Media.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 1029–1054.
Koltchinskii, V. I. (1994). Komlós–Major–Tusnády approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.
Kress, R. (1999). Linear Integral Equations. Springer, New York.
Luenberger, D. G. (1969). Optimization by Vector Space Methods. Wiley, New York.
Massart, P. (1989). Strong approximation for multivariate empirical and related processes, via KMT constructions. The Annals of Probability, 17 266–291.
Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.
Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators. Journal of Econometrics, 79 147–168.
Pollard, D. (2002). A User's Guide to Measure Theoretic Probability, vol. 8. Cambridge University Press.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12 389–434.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: with Applications to Statistics. Springer, New York.
Walkup, D. W. and Wets, R. J.-B. (1969). A Lipschitzian characterization of convex polyhedra. Proceedings of the American Mathematical Society, 167–173.
Yurinskii, V. V. (1977). On the error of the Gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.
Zeidler, E. (1985). Nonlinear Functional Analysis and its Applications I. Springer-Verlag, Berlin.
Zhai, A. (2018). A high-dimensional CLT in W₂ distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.