Simple Adaptive Size-Exact Testing for Full-Vector and
Subvector Inference in Moment Inequality Models*
Gregory Cox Xiaoxia Shi
June 10, 2021
Abstract
We propose a simple test for moment inequalities that has exact size in normal mod-
els with known variance and has uniformly asymptotically exact size under asymptotic
normality. The test compares the quasi-likelihood ratio statistic to a chi-squared criti-
cal value, where the degree of freedom is the rank of the inequalities that are active in
finite samples. The test requires no simulation and thus is computationally fast and
especially suitable for constructing confidence sets for parameters by test inversion. It
uses no tuning parameter for moment selection and yet still adapts to the slackness of
the moment inequalities. Furthermore, we show how the test can be easily adapted
to inference on subvectors in the common empirical setting of conditional moment in-
equalities with nuisance parameters entering linearly. User-friendly Matlab code to
implement the test is provided.
Keywords: Moment Inequalities, Uniform Inference, Likelihood Ratio, Subvector Infer-
ence, Convex Polyhedron, Linear Programming
*We acknowledge helpful feedback from Donald Andrews, Isaiah Andrews, Xiaohong Chen, Whitney Newey, Adam Rosen, Jonathan Roth, Matthew Shum, Jörg Stoye, the participants of the 2nd Econometrics Jamboree at UC Berkeley, the 2nd CEMMAP UCL/Vanderbilt Joint Conference, the 2020 World Congress of the Econometric Society, the 2021 Winter Meeting of the Econometric Society, and econometrics seminars at Columbia University, the National University of Singapore, UCSD, UCLA, and the University of Wisconsin-Madison.
Department of Economics, National University of Singapore ([email protected]). Department of Economics, University of Wisconsin-Madison ([email protected]).
1 Introduction
In the past decade or so, inequality testing has become a mainstream inference method used
for models where standard maximum likelihood or method of moments are difficult to use,
for reasons including multiple equilibria, incomplete data, or complicated dynamic patterns.
In such models, inequalities can often be derived from equilibrium conditions and rational
decision making. Inference can then be conducted by inverting tests for these inequalities
at each given parameter value. That is, by testing the inequalities at each parameter value
and collecting the values at which the test does not reject to form a confidence set.1
Although conceptually simple, conducting inference via test inversion poses considerable
computational challenges to practitioners. This is because, in order to get an accurate
calculation of the confidence set, one needs to test the inequalities at a set of parameter
values that is dense enough in the parameter space. Depending on the application, the
number of values that need to be tested can be astronomical and increases exponentially
with the dimension of the parameter space. Moreover, existing tests often require simulated
critical values that are nontrivial to compute even for a single value of the parameter, let
alone repeated for a large number of parameter values.2
Besides computational challenges, most existing methods for moment inequality models
involve tuning parameter sequences that are required to diverge at a certain rate as the sam-
ple size increases. The threshold in the generalized moment selection procedures (e.g. Rosen
(2008) and Andrews and Soares (2010)) and the subsample size in subsampling-based meth-
ods (e.g. Chernozhukov et al. (2007) and Romano and Shaikh (2012)) are notable examples.3
Appropriate choices often depend on data in complicated ways, and an inappropriate choice
can threaten the validity of the test.
Clearly, there are two ways to ease the computational burden: one is to make the in-
equality test easier for each parameter value, and the other is to reduce the number of
parameter values that need to be tested. We contribute to the literature in both. First, we
1An incomplete list of applications that use inequalities as estimation restrictions includes Tamer (2003), Uhlig (2005), Bajari et al. (2007), Blundell et al. (2007), Ciliberto and Tamer (2009), Beresteanu et al. (2011), Holmes (2011), Baccara et al. (2012), Chetty (2012), Nevo and Rosen (2012), Kawai and Watanabe (2013), Eizenberg (2014), Huber and Mellace (2015), Pakes et al. (2015), Magnolfi and Roncoroni (2016), Sheng (2016), Sullivan (2017), He (2017), Iaryczower et al. (2018), Wollman (2018), Fack et al. (2019), and Morales et al. (2019). For a recent overview of the literature, see for example Ho and Rosen (2017), Canay and Shaikh (2017), and Molinari (2020).
2Existing tests for general moment inequalities with simulated critical values include Chernozhukov et al. (2007), Romano and Shaikh (2008), Andrews and Guggenberger (2009), Andrews and Soares (2010), Bugni (2010), Canay (2010), Romano and Shaikh (2012), and Romano et al. (2014). See Canay and Shaikh (2017) and Molinari (2020) for more references.
3Arguably, the size of a first stage confidence set or the number of simulation/bootstrap draws are also tuning parameters commonly used to test moment inequalities.
propose a simple test for general moment inequalities that requires no simulation. It simply
uses the (quasi-) likelihood ratio statistic (Tn) and a chi-squared critical value, where the
data-dependent degrees of freedom come as a by-product of computing Tn. We call it a
conditional chi-squared test. By not requiring simulation, the test saves computation time
hundreds-fold compared to tests involving simulated critical values, where a statistic needs
to be computed for each simulated sample. For example, in the simulation experiment re-
ported in Section 5.1, our test is about 200-400 times faster than the recommended testing
procedures in Andrews and Barwick (2012) (AB, hereafter) and Romano et al. (2014) (RSW,
hereafter).
Second, we then consider a conditional moment inequality model where the parameter
vector can be partitioned into two subvectors: (θ′, δ′)′. The subvector θ is the parameter of
interest, while δ is the subvector that the researcher is not interested in, commonly referred
to as the nuisance parameter. We specialize to the setting where δ enters the moment
inequalities linearly and propose a version of the conditional chi-squared test for θ. The
subvector test is based on eliminating the nuisance parameters from a system of inequalities.
By eliminating the nuisance parameters, one only needs to consider a grid on the space
of θ, which can be much lower dimensional than the space of (θ′, δ′)′. Thus, the number of
parameter values that need to be tested is drastically reduced. For example, in the simulation
experiment reported in Section 5.2 below, our subvector test uses only 10 seconds to compute
a confidence interval in a specification with a 4-dimensional δ and 32 moment inequalities.
In both contexts, the conditional chi-squared test is simulation and tuning parameter
free. Its critical value is simply the chi-squared critical value with degrees of freedom equal
to the rank of the active moment inequalities, where we call a moment inequality active if
it holds with equality at the restricted estimator of the moments.4 In a normal model with
known variance, the test is shown to have exact size in finite sample. That is, its worst case
rejection probability under the null hypothesis is equal to its nominal significance level. In
an asymptotically normal model, it is shown to be uniformly asymptotically valid. Moreover,
it automatically adapts to the slackness of the moment inequalities despite the absence of a
deliberate moment selection step. In particular, when all but one inequality get increasingly
slack, the test asymptotes to one that ignores all the slack inequalities, which coincides with
the uniformly most powerful test for the limiting model.
The idea of simple chi-squared critical values for testing inequalities appeared as early
as in Bartholomew (1961) and Rogers (1986) for testing one-sided alternatives against a
4Active inequalities are the sample counterpart of binding inequalities, which hold with equality at the population expectation of the moments. An inequality that is not active is referred to as inactive. An inequality that is not binding is referred to as slack.
simple null, but was only recently proved to be valid for a composite null in Mohamad et al.
(2020) in a normal model. We extend Mohamad et al. (2020) in four ways: (a) we allow
an intercept in the inequalities defining the null hypothesis and thus generalize the null
hypothesis from a cone to a polyhedron. This is important for moment inequality models
as, in the limit, the null hypothesis may not be a cone when some inequalities are close to
binding; (b) we design a simple but novel refinement to make the test size-exact; (c) we
prove the test is uniformly asymptotically valid in moment inequality models; and (d) we
show how to feasibly extend the test to the subvector inference context in the presence of
nuisance parameters that enter the moments linearly. Extensions (a)-(c) rely on technical
contributions described in the appendix. We highlight them briefly here, as they may be
useful in other contexts. The finite sample validity of the refinement relies on a careful
partition of the state space (see Lemmas 1 and 2) combined with an inequality on the tail
of the truncated normal distribution (see Lemma 4). The uniform asymptotic validity relies
on a lemma guaranteeing convergence of an arbitrary sequence of polyhedra to a limiting
polyhedron along a subsequence (see Lemma 7).
The idea of eliminating nuisance parameters from linear moment inequalities is first sug-
gested in Guggenberger et al. (2008), where they introduce Fourier-Motzkin elimination, a
classical algorithm for eliminating nuisance parameters from linear inequalities, to the liter-
ature and propose a Wald-type test on the resulting inequalities. Yet two main difficulties
hinder the application of this idea: (a) numerical calculation of the Fourier-Motzkin elimi-
nation in general is an NP-hard computational problem, and (b) the estimated coefficients
in front of the nuisance parameters enter the resulting inequalities via a non-differentiable
function, and could undermine the validity of testing procedures applied directly to them.
The first difficulty is circumvented because the conditional chi-squared test only relies on the
rank of the active inequalities, and results from the convex analysis literature (see Lemmas
12 and 13) allow us to compute the rank of the active inequalities without carrying out
Fourier-Motzkin elimination. The second difficulty is circumvented by considering models
where the moment inequalities hold conditional on a vector of instrumental variables, a class
of models first proposed by Andrews et al. (2019).
Andrews et al. (2019) (hereafter ARP) study the setting closest to ours. They
propose a test based on the largest standardized sample moment. In the most basic version,
their test uses a conditional critical value from a truncated normal distribution. This basic
version involves no simulation or tuning parameter and as a result is easy to compute.
However, the basic version has poor power properties that prompt them to recommend a
hybrid test. The hybrid test uses a simulated critical value as well as a tuning parameter
that determines the size of a first-stage least favorable test.
There are a few papers in the literature that propose methods to mitigate the compu-
tational challenges described above. Kaido et al. (2019) cast the problem of finding the
bounds of the projection confidence interval of each parameter into a nonlinear nonconvex
constrained optimization problem, and provide a novel algorithm to solve this optimiza-
tion problem more efficiently. Our simple inequality test is complementary to Kaido et al.
(2019)’s algorithm in that we make testing for each value hundreds-fold easier while their
algorithm reduces the number of values that need to be tested. Bugni et al. (2017) propose
a profiling method that simplifies computation in the same way as the subvector confidence
set proposed in this paper, by reducing the search from the space of the whole parameter
vector to that of a low dimensional subvector. The difference is that our subvector test, by
taking advantage of the linearity of the model, is much easier to compute than Bugni et al.
(2017)’s test, which applies more generally. Chen et al. (2018) propose a quasi-Bayesian
method that can also be applied to subvector inference in moment inequality models, as well
as a simple method that applies to scalar parameters of interest.
A couple of other papers aim to reduce the sensitivity of testing to tuning parameters. AB
refines the procedure of Andrews and Soares (2010) (AS, hereafter) by computing an optimal
moment selection threshold that maximizes a weighted average power and a size correction.
Using the optimal threshold and the size correction provided in that paper, one no longer
needs to choose a tuning parameter. Computationally, it is the same as AS if one has 10
or fewer moment inequalities and can use the tables of optimal tuning and size correction
values in the paper. It is much more computationally demanding otherwise. RSW replace
the moment selection step of the previous literature with a confidence set for the slackness
parameter and employ a Bonferroni correction to take into account the error rate of this
confidence set. There is still a tuning parameter, the confidence level of the first step, but
this tuning parameter no longer affects the asymptotic size of the test. Computationally,
using the same number of bootstrap draws, it is slightly more costly than AS due to the first-
step confidence set construction. The recommended tests in AB and RSW are our points
of comparison in the simulation experiments in Section 5.1, where we show that our simple
test saves computational cost hundreds-fold, while having competitive size and power.
The remainder of this paper proceeds as follows. Section 2 describes our setup and several
examples. Section 3 describes how to implement the full-vector and subvector conditional
chi-squared tests. Section 4 states theoretical properties that the tests have. Section 5
reports the simulation results. Section 6 concludes. An appendix contains the proofs and
additional results.
2 Setup and Examples
This section describes the setup for full-vector and subvector moment inequality testing,
together with several examples.
2.1 Moment Inequality Model: Full-Vector Inference
Consider a dm-dimensional moment function, m(Wi, θ), that depends on a vector parameter
of interest, θ. Let Θ denote the parameter space for θ, and denote the data by {Wi}ni=1 with
joint distribution F . We assume the moments satisfy a vector of linear inequalities given by
AEFmn(θ) ≤ b, (1)
where A is a dA × dm matrix, b is a dA × 1 vector, and mn(θ) = n−1 ∑ni=1 m(Wi, θ). The
moment inequalities identify the true parameter value up to the identified set,5
Θ0(F) = {θ ∈ Θ : AEFmn(θ) ≤ b}. (2)
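Checking whether a candidate θ satisfies the inequalities in (2) is a single matrix-vector comparison. The following Python sketch is purely illustrative (the paper's own code is in Matlab); the moment vector stands in for EF mn(θ) at a candidate θ, and the A and b used in the example are the bounds encoding from footnote 6 below.

```python
import numpy as np

def in_identified_set(moments, A, b):
    """Check whether A * E_F[m_n(theta)] <= b holds elementwise.

    `moments` stands in for E_F[m_n(theta)] at a candidate theta."""
    return bool(np.all(A @ moments <= b))

# Footnote 6's bounds example: m(w, theta) = theta - w, A = (1, -1)', b = (0, 1)'
A = np.array([[1.0], [-1.0]])
b = np.array([0.0, 1.0])
print(in_identified_set(np.array([-0.5]), A, b))  # theta - E[W] = -0.5: both hold
print(in_identified_set(np.array([0.5]), A, b))   # violates theta - E[W] <= 0
```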
The specification of a moment inequality model given by (1) is very general. Other papers
in the moment inequality literature, such as AS, specify moment inequalities of the form
EFm1(Wi, θ) ≤ 0 and EFm2(Wi, θ) = 0, (3)
where m1(Wi, θ) denotes a dm1-vector of moments that satisfy inequalities and m2(Wi, θ)
denotes a dm2-vector of moments that satisfy equalities. By including a coefficient matrix A
and an intercept b, (1) covers the specification in (3) with b = 0 and
$$A = \begin{pmatrix} I_{d_{m_1}} & 0_{d_{m_1}\times d_{m_2}} \\ 0_{d_{m_2}\times d_{m_1}} & -I_{d_{m_2}} \\ 0_{d_{m_2}\times d_{m_1}} & I_{d_{m_2}} \end{pmatrix}, \qquad (4)$$

where dA = dm1 + 2dm2. Introducing A and b is convenient because it allows us to succinctly
cover both equalities and inequalities. It also readily accommodates models with upper and
lower bounds with a deterministic gap in between.6 Below we assume the variance-covariance matrix of the moments is invertible. Introducing A and b is useful because it specifies the inequalities as a linear combination of a “core” set of moments, and only the core set of moments needs to have an invertible variance-covariance matrix.

5The quantities A and b may depend on θ and the sample size n, a dependence that we keep implicit for simplicity unless otherwise needed. If the dependence is made explicit, the formula for Θ0(F) becomes {θ ∈ Θ : A(θ)EFmn(θ) ≤ b(θ)}.

6For example, E[Wn] − 1 ≤ θ ≤ E[Wn] can be written in our notation with m(w, θ) = θ − w, A = (1, −1)′, and b = (0, 1)′.
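To make the mapping from (3) into (1) concrete, the short sketch below builds the stacked matrix in (4) with numpy and checks on a toy moment vector that Am ≤ 0 reproduces the inequalities and (two-sided) equalities in (3). This is an illustration, not part of the paper's replication code.

```python
import numpy as np

def stacked_A(d_m1, d_m2):
    """Build the coefficient matrix in (4): d_m1 inequality rows, then
    each equality in (3) split into a '>=' row and a '<=' row."""
    top = np.hstack([np.eye(d_m1), np.zeros((d_m1, d_m2))])
    mid = np.hstack([np.zeros((d_m2, d_m1)), -np.eye(d_m2)])
    bot = np.hstack([np.zeros((d_m2, d_m1)), np.eye(d_m2)])
    return np.vstack([top, mid, bot])

# Moments satisfying (3): m1 = (-0.3, -0.1) <= 0 and m2 = 0
A = stacked_A(2, 1)
m = np.array([-0.3, -0.1, 0.0])
print(A.shape, bool(np.all(A @ m <= 0)))  # (4, 3) True
```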
Moment inequalities have become widely used in practice as the reference list given in
the first paragraph of the introduction shows. We mention two recent examples here.
Example 1. He (2017) uses a moment inequality model to estimate preferences of applicants
in a school admission problem under a matching mechanism called the Boston mechanism.
To fix ideas, consider a simple case with 3 schools, a, b, and c. Each applicant i submits
a rank-ordered list (r1i, r2i, r3i) to the mechanism. The Boston mechanism first assigns as
many applicants as possible to the top-ranked school in their list while respecting the capacity
constraints of the schools. The unassigned applicants are considered by their second-ranked
school for the remaining school seats, if any. The process continues until all seats are filled or
all students are assigned. The Boston mechanism is not strategy-proof in that applicants, instead
of submitting their true preference ranking, can benefit from an untruthful rank-ordered list.
He (2017) aims to answer an important policy question: does switching to a strategy-
proof mechanism make the less sophisticated applicants better off? Answering this question requires his model to allow for less sophisticated applicants who do not form correct beliefs about admission probabilities. He allows them to form individualized beliefs, which become incidental
parameters that preclude full identification. However, He shows that the model can uniquely
predict the probability of some rank-ordered lists and bound that of other rank-ordered lists
using functions that do not involve beliefs. For example,
Then, analogously to the previous example, we have
$$E\left[\begin{pmatrix} \psi_i^L(\theta_0)\,I(Z_i) \\ -\psi_i^U(\theta_0)\,I(Z_i) \end{pmatrix} - \begin{pmatrix} I(Z_i)\,Z_{ci}' \\ -I(Z_i)\,Z_{ci}' \end{pmatrix}\delta_0 \,\middle|\, Z_i\right] \le 0, \qquad (13)$$

where I(Zi) is a finite non-negative vector of instrumental functions of Zi = (Z′ci, Z′ei)′. This yields a model of the form (8), where BZ = I, Wi contains Zi as well as the variables used to construct $\psi_i^L$ and $\psi_i^U$, $m(W_i,\theta) = \begin{pmatrix} \psi_i^L(\theta)\,I(Z_i) \\ -\psi_i^U(\theta)\,I(Z_i) \end{pmatrix}$, $C_Z = n^{-1}\sum_{i=1}^n \begin{pmatrix} I(Z_i)\,Z_{ci}' \\ -I(Z_i)\,Z_{ci}' \end{pmatrix}$, and dZ = 0.
In Section 5.2, we consider a Monte Carlo example of a special case of this model where
we also provide more details on the bound construction. In the application of Gandhi et al.
(2019), control variables (Zci) are essential for the validity of the instruments.
Example 5. Eizenberg (2014) studies the portable PC market to quantify the welfare effect
of eliminating a product. Central to the question is the fixed cost of providing the product.
Eizenberg uses the revealed preference approach to construct bounds, Li and Ui, for the fixed
cost of product i. Let Zi be a vector of product characteristics (including the constant). One
can consider the following conditional moment inequality model:
E[(Li − P(Zi)′γ0)I(Zi)|Zi] ≤ 0 (14)
E[(−Ui + P(Zi)′γ0)I(Zi)|Zi] ≤ 0,
where P (Zi) is a vector of known functions of Zi and I(Zi) is a vector of nonnegative
instrumental functions. The function P (Zi)′γ0 captures the (observed) heterogeneity of fixed
costs across products. Using our method, one can construct confidence intervals for each
element of γ0 and any linear combinations of γ0 such as the average derivative.
Suppose the parameter of interest is the average derivative with respect to the first element of Zi: θ0 = γ′0P1,n, where P1,n = n−1 ∑ni=1 ∂P(Zi)/∂z1. One can rewrite (14) as

E[(Li − θ0)I(Zi) − I(Zi)(P(Zi) − P1,n)′γ0 | Zi] ≤ 0
E[(−Ui + θ0)I(Zi) + I(Zi)(P(Zi) − P1,n)′γ0 | Zi] ≤ 0, (15)

which falls into the framework of (8) where BZ = I, Wi contains Zi as well as the variables used to construct Li and Ui, $m(W_i,\theta) = \begin{pmatrix} (L_i-\theta)\,I(Z_i) \\ -(U_i-\theta)\,I(Z_i) \end{pmatrix}$, $C_Z = n^{-1}\sum_{i=1}^n \begin{pmatrix} I(Z_i)(P(Z_i)-P_{1,n})' \\ -I(Z_i)(P(Z_i)-P_{1,n})' \end{pmatrix}$, and dZ = 0.
Two additional examples that fit into our subvector framework are Katz (2007) and
Wollman (2018) as reviewed in ARP.
3 Conditional Chi-Squared Tests: Implementation
In this section we define a new family of tests, called conditional chi-squared tests, for the
inequalities specified in (1) and (8). They are called conditional chi-squared tests because
they use a critical value that is a quantile of the chi-squared distribution, where the degree
of freedom depends on the active inequalities. We give instructions for implementing the
tests, which show that they are easy to code and have low computational cost.
3.1 Full-Vector Tests
We use the inequalities specified in (1) to test hypotheses on θ. Like most papers in the
literature, including AS, AB, and RSW, we conduct inference for the true parameter θ0 by
test inversion. That is, for a given significance level α ∈ (0, 1), one constructs a test φn(θ, α)
for H0 : θ = θ0, where φn(θ, α) = 1 indicates rejection and φn(θ, α) = 0 indicates a failure to
reject. One then obtains the confidence set for θ0 by calculating
CSn(1 − α) = {θ ∈ Θ : φn(θ, α) = 0}. (16)
In practice, CSn(1− α) is calculated by testing H0 : θ = θ0 on a grid of values of θ ∈ Θ.
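The grid-inversion step in (16) can be sketched in a few lines. The test function below is a placeholder (in practice one plugs in the CC or RCC test described in this section), and Python is used for illustration even though the paper provides Matlab code.

```python
import numpy as np

def confidence_set(theta_grid, test, alpha=0.05):
    """Collect every theta value the test fails to reject, as in (16)."""
    return [theta for theta in theta_grid if test(theta, alpha) == 0]

# Toy stand-in for phi_n: rejects outside [0, 1]
toy_test = lambda theta, alpha: int(theta < 0.0 or theta > 1.0)
grid = np.arange(-10, 21) / 10.0  # theta in {-1.0, -0.9, ..., 2.0}
cs = confidence_set(grid, toy_test)
print(len(cs), min(cs), max(cs))  # 11 0.0 1.0
```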
We introduce two new tests, one being a refinement of the other. Both are easy to
compute, requiring no tuning parameters or simulations. Both use the (quasi-) likelihood
ratio statistic,
Tn(θ) = minµ:Aµ≤b n(mn(θ) − µ)′Σn(θ)−1(mn(θ) − µ), (17)
where Σn(θ) denotes an estimator of VarF(√n mn(θ)), the variance-covariance matrix of the standardized moments. When {Wi}ni=1 is i.i.d., we can take

Σn(θ) = n−1 ∑ni=1 (m(Wi, θ) − mn(θ))(m(Wi, θ) − mn(θ))′. (18)

When {Wi}ni=1 is not i.i.d., we can define Σn(θ) to account for the clustering or autocorrelation in {Wi}ni=1.
Both tests use data-dependent critical values that are based on the rank of the rows of A
corresponding to the inequalities that are active in finite samples. To define them rigorously,
let µ be the solution to the minimization problem in (17). This is the restricted estimator
for the moments. It can be calculated using a quadratic programming algorithm. Let a′j
denote the jth row of A and let bj denote the jth element of b for j = 1, 2, . . . , dA. Let
J = {j ∈ {1, 2, . . . , dA} : a′jµ = bj}, (19)
which is the set of indices for the active inequalities. For a set J ⊆ {1, 2, . . . , dA}, let AJ be the submatrix of A formed by the rows of A corresponding to the elements in J. Let rk(AJ) denote the rank of AJ, and let r = rk(AJ). Note that for test inversion, µ, J, and r need to be recalculated for every value of θ.
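The paper's Matlab code solves (17) with a quadratic programming routine; the sketch below uses scipy's SLSQP solver instead, so both the solver and the active-set tolerance (1e-6 here, versus the 10−8 used in the paper's simulations) are our illustrative choices. It returns Tn(θ), the restricted estimator µ, and the rank r.

```python
import numpy as np
from scipy.optimize import minimize

def qlr_and_rank(m_bar, Sigma, A, b, n, tol=1e-6):
    """Solve the quadratic program in (17); return T_n, the restricted
    estimator mu, and r = rk(A_J) with J as in (19)."""
    Sigma_inv = np.linalg.inv(Sigma)
    obj = lambda mu: n * (m_bar - mu) @ Sigma_inv @ (m_bar - mu)
    cons = {"type": "ineq", "fun": lambda mu: b - A @ mu}
    res = minimize(obj, np.zeros_like(m_bar), constraints=cons, method="SLSQP")
    mu_hat = res.x
    active = np.abs(A @ mu_hat - b) <= tol  # the active set J in (19)
    r = int(np.linalg.matrix_rank(A[active])) if active.any() else 0
    return res.fun, mu_hat, r

# Illustration 1 setup (A = I, b = 0, Sigma = I) with m_bar = (0.8, -0.5)
T, mu_hat, r = qlr_and_rank(np.array([0.8, -0.5]), np.eye(2),
                            np.eye(2), np.zeros(2), n=100)
print(round(T, 2), r)  # 64.0 1: squared distance to the negative quadrant, one active row
```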
The critical value of the first simple test is the 100(1−α)% quantile of χ2r, the chi-squared
distribution with r degrees of freedom, denoted by χ2r,1−α. We denote the first simple test
by
φCCn(θ, α) = 1{Tn(θ) > χ2r,1−α}, (20)
where CC stands for “conditional chi-squared” indicating that the test uses the chi-squared
critical value conditional on the active inequalities.7 We show the validity of the CC test
below. The intuition is that Tn(θ) (asymptotically) follows the χ2r distribution conditional on
r when all inequalities are binding (that is, AEFmn(θ) = b), and is stochastically dominated
by the χ2r distribution when some of the inequalities are slack.
The CC test does not reject when r = 0, and rejects with probability at most α when
r > 0. Thus, an upper bound on its (asymptotic) null rejection probability is (1 − Pr(r = 0))α. This shows that the CC test can be somewhat conservative.
7The conditional aspect of our critical value gives it an apparent resemblance with the critical value of the conditional test in ARP. However, the resemblance is only superficial. Like any conditional test, what is important is the statistic that is conditioned on. That statistic is the set of active inequalities in our case, while it is the second largest standardized sample moment in ARP's case.
We propose a second simple test that eliminates the conservativeness. We call this
the RCC (refined CC) test. We define the RCC test by adjusting the quantile of the χ21
distribution when r = 1. Instead of the 100(1−α)% quantile, the RCC test uses a 100(1−β)%
quantile, where β varies between α and 2α depending on how far from active the additional
(inactive) inequalities are. We now construct β carefully so that the refinement exactly
restores the size of the test.
When r = 1, suppose without loss of generality that the first inequality is active and satisfies a1 ≠ 0.8 Next, for each j = 2, . . . , dA, let
$$\tau_j = \begin{cases} \dfrac{\sqrt{n}\,\|a_1\|_{\Sigma_n(\theta)}\,(b_j - a_j'\mu)}{\|a_1\|_{\Sigma_n(\theta)}\,\|a_j\|_{\Sigma_n(\theta)} - a_1'\Sigma_n(\theta)\,a_j} & \text{if } \|a_1\|_{\Sigma_n(\theta)}\,\|a_j\|_{\Sigma_n(\theta)} \neq a_1'\Sigma_n(\theta)\,a_j \\ \infty & \text{otherwise,} \end{cases} \qquad (21)$$
where ‖a‖Σ = (a′Σa)1/2. This τj is a normalized measure of the inactivity of the jth inequality. It is essentially bj − a′jµ normalized using the ratio of the Euclidean norms of Σn(θ)1/2a1 and Σn(θ)1/2aj and the angle between the two.9 Then let
τ = infj∈{2,...,dA} τj. (22)
This is a measure of the minimum inactivity of the inactive inequalities. This quantity is
easy to compute and has a nice geometric interpretation that is illustrated in Illustration 1
below.
Now we can define

$$\beta = \begin{cases} 2\alpha\,\Phi(\tau) & \text{if } r = 1 \\ \alpha & \text{otherwise,} \end{cases} \qquad (23)$$
where Φ(·) is the standard normal cumulative distribution function (cdf). When a second
inequality is close to being active, τ is close to 0 and then β is close to α. When all the other
inequalities are far from active, then τ is very large and β is close to 2α. We define the RCC
test for H0 : θ = θ0 to be
φRCCn(θ, α) = 1{Tn(θ) > χ2r,1−β}. (24)
Note that for test inversion, both r and β need to be recalculated for every value of θ since
they may depend on θ via A, b, and Σn(θ).
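The quantities in (21)-(23) are simple to compute once µ is available. The following is an illustrative Python translation (not the paper's Matlab code); the `j_active` argument marks the single active inequality playing the role of a1 in the text.

```python
import numpy as np
from scipy.stats import norm

def rcc_beta(A, b, Sigma, mu_hat, n, alpha, j_active=0):
    """Compute tau_j in (21), tau in (22), and beta in (23) when r = 1."""
    a1 = A[j_active]
    norm1 = np.sqrt(a1 @ Sigma @ a1)
    taus = [np.inf]  # tau = infinity when there are no other inequalities
    for j in range(A.shape[0]):
        if j == j_active:
            continue
        aj = A[j]
        normj = np.sqrt(aj @ Sigma @ aj)
        denom = norm1 * normj - a1 @ Sigma @ aj
        slack = b[j] - aj @ mu_hat
        taus.append(np.sqrt(n) * norm1 * slack / denom if denom != 0 else np.inf)
    return 2 * alpha * norm.cdf(min(taus))

# Illustration 1 with mu_hat = (0, -0.5): the second inequality is slack,
# so tau = 5 and beta is essentially 2*alpha
beta = rcc_beta(np.eye(2), np.zeros(2), np.eye(2),
                np.array([0.0, -0.5]), n=100, alpha=0.05)
print(round(beta, 3))  # 0.1
```

When the second inequality is exactly active (µ = 0), τ = 0, Φ(0) = 1/2, and β falls back to α, matching the text.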
8In this case, other inequalities may be active too since we do not rule out the possibility that A contains redundant or zero rows. But this is possible only if the other active inequalities are collinear with a1.
9Note that a′1Σn(θ)aj = ‖a1‖Σn(θ)‖aj‖Σn(θ) cos γ, where γ stands for the angle.
Since τ ∈ [0,∞], β ∈ [α, 2α]. Thus we have the following comparison of the CC and the
RCC tests:
φRCCn(θ, α/2) ≤ φCCn(θ, α) ≤ φRCCn(θ, α). (25)
Moreover, when an equality is being tested, at least two inequalities are always active, in
which case we have β = α, and the RCC test reduces to the CC test.
It helps to illustrate the CC and RCC tests in a simple two-inequality example.
Illustration 1. Consider an example where dm = 2, A = I, b = 0, and Σn(θ) = I. We
omit θ from the notation for ease of exposition. Thus, we are testing H0 : EFmn ≤ 0 using the statistic √n mn, which asymptotically follows a bivariate standard normal distribution.

On the space of √n mn, the rejection region for the CC test is illustrated by the shaded region in Figure 1. In this example, the likelihood ratio statistic is the squared distance between √n mn and the third quadrant of the plane. If √n mn lies in the second or fourth quadrant of the plane, one inequality is active and the χ21 quantile is used. If √n mn lies in the first quadrant of the plane, two inequalities are active and the χ22 quantile is used. The critical values for the RCC test are illustrated using a dashed line where they deviate from the CC test.10
From the figure, we can see that the RCC test deviates from the CC test only when the
number of active inequalities is one (in the second and fourth quadrants of the plane). In
that case, a smaller critical value is used that depends on how far from active the other inequality is, measured using τ. The quantity τ has the following geometric interpretation: the point √n µ is the projection of √n mn onto a face of the polyhedron defined by the inequalities. Continue that line into the interior of the polyhedron until you reach a point, y, that is equidistant between two inequalities. In the figure, the set of points that are equidistant between two inequalities is represented by the dotted line, which is the 45-degree line. Then τ is the distance between √n µ and y. This geometric interpretation extends to more complicated examples with more inequalities or non-orthogonal inequalities.
The reason the refinement still controls size is that we condition on the event that √n mn belongs to the ray that starts at y and emanates through √n µ and √n mn. It is sufficient to control the conditional rejection probability for every such ray. By conditioning on the ray, the denominator of the conditional rejection probability is Φ(τ), which allows us to adjust α up to β.
10The discontinuity in the critical value illustrated in Figure 1 is similar to the discontinuity in the recommended generalized moment selection function (their ϕ(1)) in AB that occurs whenever a moment is at the threshold of being selected.
Figure 1: Geometric representation of the CC test (shaded) and the RCC test (dashed) in Illustration 1.
It is also helpful to see the CC tests in a simple model with a scalar parameter of interest, one upper bound, and one lower bound: E[Y^L] ≤ θ ≤ E[Y^U]. This setup has been considered, for example, in Stoye (2009). For simplicity, suppose Y^L and Y^U are independent and have unit variance. Let Ȳ_n^L and Ȳ_n^U be the sample averages of Y^L and Y^U, respectively. Then it is not difficult to find that (when Δ_n := √n(Ȳ_n^U − Ȳ_n^L) > −z_{1−α/2}) the 100(1−α)% CC confidence interval is [Ȳ_n^L − z_{1−α/2}/√n, Ȳ_n^U + z_{1−α/2}/√n], and also that the RCC confidence interval is the set of θ values that satisfy

Ȳ_n^L − z_{1−αΦ(√n(Ȳ_n^U−θ)∨0)}/√n ≤ θ ≤ Ȳ_n^U + z_{1−αΦ(√n(θ−Ȳ_n^L)∨0)}/√n,

where ∨ is the maximum operator. Solving numerically, we find that the RCC confidence interval is [Ȳ_n^L − c_α/√n, Ȳ_n^U + c_α/√n], where c_α depends on Δ_n. For example, when α = 0.05, c_α declines smoothly from 1.96 to 1.67 and then to 1.65 as Δ_n varies from −1.96 to 0 and then to 1. Thus, the refinement brings about a big improvement in this simple setup.
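The numerical values of c_α above can be reproduced with a short fixed-point iteration. At the lower endpoint of the RCC interval, the defining inequality holds with equality, which gives c = Φ^{−1}(1 − αΦ(max(Δ_n + c, 0))). The following is a minimal Python sketch (not the authors' Matlab code), using only the standard library; the function and parameter names are ours.

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # Standard normal quantile by bisection (plenty accurate for a sketch).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rcc_halfwidth(delta_n, alpha=0.05):
    # Fixed point of c = Phi^{-1}(1 - alpha * Phi(max(delta_n + c, 0))),
    # the equality version of the RCC endpoint condition above.
    c = norm_ppf(1.0 - alpha)  # start from the unrefined one-sided value
    for _ in range(100):
        c = norm_ppf(1.0 - alpha * norm_cdf(max(delta_n + c, 0.0)))
    return c

print(round(rcc_halfwidth(-1.96), 2))  # -> 1.96
print(round(rcc_halfwidth(0.0), 2))    # -> 1.67
print(round(rcc_halfwidth(1.0), 2))    # -> 1.65
```

As Δ_n grows, Φ(Δ_n + c) → 1 and c falls to the one-sided value z_{1−α} ≈ 1.645; as Δ_n falls below −c, the ∨0 binds, Φ(0) = 1/2, and c equals the two-sided value z_{1−α/2} ≈ 1.96.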
To end this subsection, Algorithm 1 presents pseudo-code that can be used to compute
the CC and RCC tests. The pseudo-code is implemented in user-friendly Matlab code
provided in the replication files. The implementation requires a tolerance (tol) to account
for numerical imprecision in the quadratic programming used to compute Tn(θ). We use
10−8 in the Monte Carlo simulations.
Algorithm 1: Pseudo-code for implementing the CC and RCC tests.

1: %Compute the CC Test
2: T_n(θ), µ̂ ← min_{µ: Aµ ≤ b} n(m_n(θ) − µ)′ Σ̂_n(θ)^{−1} (m_n(θ) − µ)
3: Ĵ := {j = 1, ..., d_A : a′_j µ̂ = b_j}
4: A_Ĵ ← Ĵ, A
5: r := rk(A_Ĵ)
6: φ^CC_n(θ, α) := 1{T_n(θ) > max{χ²_{r,1−α}, tol}}
7:
8: %Compute the RCC Test
9: Implement lines 2-5, and then
10: if r = 1 and χ²_{1,1−2α} ≤ T_n(θ) ≤ χ²_{1,1−α} then
11: (suppose a′_1 µ̂ = b_1 and ‖a_1‖ ≠ 0)
12: for j = 2, ..., d_A do

Remark. Algorithm 1 makes clear some of the convenient features of the implementation of the CC tests. We list them here for emphasis. (a) The CC tests do not require any tuning parameters or simulations to implement. (b) The CC tests are simple to code. (c) There is also a third convenient feature of the implementation that is less clear from Algorithm 1, which is that the inequalities do not need to be "reduced" before implementing the test. Often in practice a collection of inequalities contains redundant inequalities, that is, inequalities that are implied by the other inequalities. The CC tests are invariant to the inclusion of redundant inequalities. In contrast, other tests for moment inequalities, including AS, AB, and RSW, are not invariant, and thus benefit from removing the redundant inequalities before implementing the tests.
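To make lines 2-6 of Algorithm 1 concrete, consider the special case A = I and b = 0 (inequalities E[m_j] ≤ 0) with a diagonal variance estimate. Then the quadratic program has the closed-form solution µ̂_j = min(m_j, 0), the active set is {j : µ̂_j = 0}, and rk(A_Ĵ) is just the number of active inequalities. The Python sketch below (illustrative, not the authors' Matlab code) hard-codes a few χ² quantiles for α = 0.05; a general implementation would call a chi-squared quantile routine and a QP solver.

```python
# Illustrative special case of Algorithm 1 (lines 2-6): A = I, b = 0
# (inequalities E[m_j] <= 0) and a diagonal variance estimate, so the
# quadratic program has the closed-form solution mu_hat_j = min(m_j, 0).
CHI2_95 = {0: 0.0, 1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # chi2_{r,0.95}

def cc_test(m_bar, sigma2, n, tol=1e-8):
    # Line 2: quadratic-program solution and test statistic.
    mu_hat = [min(m, 0.0) for m in m_bar]
    T = n * sum((m - mu) ** 2 / s2 for m, mu, s2 in zip(m_bar, mu_hat, sigma2))
    # Lines 3-5: active inequalities are those with mu_hat_j = 0;
    # with A = I the rank of A_J equals the number of active rows.
    r = sum(abs(mu) <= tol for mu in mu_hat)
    # Line 6: compare to the chi-squared critical value.
    reject = T > max(CHI2_95[r], tol)
    return T, r, reject

T, r, reject = cc_test(m_bar=[0.5, -1.0], sigma2=[1.0, 1.0], n=100)
print(T, r, reject)  # -> 25.0 1 True
```

In the example, only the first moment is violated in the sample, so one inequality is active, r = 1, and T_n = 25 exceeds χ²_{1,0.95} = 3.841.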
3.2 Subvector Tests
Next we use the inequalities in (8) to test hypotheses on θ. For a given value, θ_0, testing H_0 : θ = θ_0 amounts to testing the following hypothesis:

H_0 : ∃ δ such that B_Z E_{F_Z}[m_n(θ_0) | Z] − C_Z δ ≤ d_Z, a.s. (26)

In this subsection, we define subvector versions of the conditional chi-squared tests for (26).
Directly testing (26) is difficult because it requires checking the validity of the inequality for all values of δ. We construct our test using an equivalent form of (26) that eliminates δ:

H_0 : A_Z E_{F_Z}[m_n(θ_0) | Z] ≤ b_Z, (27)

for some matrix A_Z and vector b_Z that are deterministic functions of C_Z, B_Z, and d_Z.
The existence of such a transformation is well-known in the theory of linear inequalities,
dating back to Fourier (1826). It has been noted in the moment inequality literature by
Guggenberger et al. (2008), but has not been used in practice to the best of our knowledge.
One significant obstacle is that calculating AZ and bZ is computationally difficult except
in small dimensions. The key innovation in our approach is to conduct the conditional
chi-squared test on (27) without calculating AZ and bZ , as we describe next.
The subvector CC (sCC) test for (26) is the full-vector CC test based on (27). It uses the test statistic

T_n(θ) = min_{µ: A_Z µ ≤ b_Z} n(m_n(θ) − µ)′ Σ̂_n(θ)^{−1} (m_n(θ) − µ), (28)

where Σ̂_n(θ) is an estimator of the conditional variance, Σ_n(θ) = Var(√n m_n(θ) | Z), discussed in more detail below. The critical value of the sCC test is χ²_{r,1−α}, where r is the rank of the active inequalities, defined as in the full-vector CC test applied to the problem in (28).
The first step to computing T_n(θ) without computing A_Z and b_Z is to recognize that

T_n(θ) = min_{δ,µ: B_Z µ − C_Z δ ≤ d_Z} n(m_n(θ) − µ)′ Σ̂_n(θ)^{−1} (m_n(θ) − µ). (29)

One can calculate T_n(θ) without knowing A_Z or b_Z by quadratic programming, where (δ′, µ′)′ is the decision variable. Let (δ̂′, µ̂′)′ be the solution to the minimization problem.
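A minimal instance of (29) shows how the nuisance parameter is profiled out. Suppose (hypothetically) there are two moments and one nuisance parameter with B_Z = I_2, C_Z = (1, −1)′, and Σ̂_n(θ) = I_2. Eliminating δ by hand leaves the single inequality µ_1 + µ_2 ≤ d_1 + d_2, so T_n(θ) has a closed form, which the sketch below checks against a brute-force profile over δ (all values are illustrative; a real implementation would use a QP solver).

```python
# Minimal instance of (29): two moments, one nuisance parameter delta, with
# B_Z = I_2, C_Z = (1, -1)', Sigma_hat = I_2 (all hypothetical values).
# Eliminating delta leaves the single inequality mu_1 + mu_2 <= d_1 + d_2,
# so T_n has the closed form below; we check it by profiling over delta.

def T_closed_form(m, d, n):
    v = max(m[0] + m[1] - d[0] - d[1], 0.0)
    return n * v * v / 2.0

def T_profile_delta(m, d, n, grid=2000, span=5.0):
    # For fixed delta the constraints are mu_1 <= d_1 + delta and
    # mu_2 <= d_2 - delta, so the inner minimization over mu is
    # coordinate-wise clipping; then minimize over a grid of delta.
    best = float("inf")
    for i in range(grid + 1):
        delta = -span + 2.0 * span * i / grid
        t = n * (max(m[0] - d[0] - delta, 0.0) ** 2
                 + max(m[1] - d[1] + delta, 0.0) ** 2)
        best = min(best, t)
    return best

m, d, n = [0.3, 0.2], [0.1, 0.1], 100
print(round(T_closed_form(m, d, n), 6))  # -> 4.5
```

The agreement of the two computations illustrates why (29) can replace (28): the quadratic program over (δ, µ) implicitly performs the elimination of δ that A_Z and b_Z encode.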
Before we describe how to compute r without AZ or bZ , we briefly describe what AZ and
bZ are. There are multiple ways to define AZ and bZ for (27) to be equivalent to (26). The
Fourier-Motzkin algorithm noted in Guggenberger et al. (2008) is one of them. Another that
is particularly convenient for our purpose is to take convex combinations of the inequalities.
If we let h ∈ R^k denote a vector of nonnegative weights that sum to one, then the convex combination of the inequalities in (26) is given by

h′ B_Z E_{F_Z}[m_n(θ_0) | Z] − h′ C_Z δ ≤ h′ d_Z. (30)

When h′ C_Z = 0, the δ parameter is eliminated from the inequalities. It follows from Gale's Theorem^11 that it is sufficient to consider the set of all inequalities (30) indexed by

h ∈ H := {h ∈ R^k : h ≥ 0, C′_Z h = 0, 1′h = 1}. (31)
To connect this result to (27), note that H defines a convex polyhedron in R^k. Every element of a convex polyhedron is a convex combination of its extreme points, or vertices. Thus, it is sufficient to consider the vertices of H. That is, a particular value θ_0 satisfies (26) if and only if θ_0 satisfies (30) for all h that are vertices of H. Equivalently, if we take H(C_Z) to denote a matrix where each row is a vertex of H, then defining

A_Z = H(C_Z) B_Z and b_Z = H(C_Z) d_Z (32)

renders (27) equivalent to (26). This result is formally stated in Lemma 12 in Appendix C.1.
Thus, to calculate AZ and bZ , we could enumerate the vertices of H. While vertex
enumeration seems simple, it can be computationally challenging when k and/or p are large.
(Experience suggests even moderate values of k and p can lead to computational challenges.)
As noted in various textbooks, including Sierksma and Zwols (2015), there is no polynomial
time algorithm for vertex enumeration available in general. We proceed to describe how to
compute r without AZ or bZ .
To compute r, we define the active inequalities. For any h ∈ H, we say that the inequality in (30) is active if h′ B_Z µ̂ = h′ d_Z, where µ̂ is calculated from (29). Accordingly, let

H_0 = {h ∈ H : (B_Z µ̂ − d_Z)′ h = 0} (33)

denote the subset of H that characterizes the active inequalities. In fact, H_0 is always a face of H due to the definition of µ̂. By the definition of r and A_Z, r is the maximum number of linearly independent vectors of the form B′_Z h, where h is a vertex of H_0. The key is to recognize that we do not need to enumerate the vertices of H_0 to calculate r. Instead, we only have to calculate the maximum number of linearly independent vectors in B′_Z H_0 = {B′_Z h : h ∈ H_0}. Notationally, we call the maximum number of linearly independent vectors in B′_Z H_0 the "rank of B′_Z H_0" and denote it by rk(B′_Z H_0).^12 The fact that r = rk(B′_Z H_0) is stated formally in Lemma 13 in Appendix C.1.
Therefore, to compute r one only needs to find rk(B′ZH0). It turns out that calculating
the rank of a polyhedron is much faster computationally than enumerating the vertices.
11 See Theorem 2.7 in Gale (1960). Gale's Theorem is considered by some authors (e.g., Bachem and Kern (1992), Theorem 4.1) to be a variant of Farkas' Lemma, a result that may be familiar to readers who have worked on nonnegative solutions to linear systems of equations.
12 Usually, rk(·) is defined for matrices. Here we extend the definition to arbitrary sets of vectors.
Here, we present an algorithm based on solving k + 1 linear programming (LP) problems. For exposition, we assume rk(B_Z) = k, which is true in Examples 3-5, so that the rank of B′_Z H_0 is equal to the rank of H_0.^13 Calculating the rank of H_0 is equivalent to finding the dimension of the smallest linear subspace containing H_0, denoted by span(H_0). Note that
H0 is defined by linear equalities and inequalities, where the inequalities are given by h ≥ 0.
Some of these inequalities may have to hold with equality due to the other equations in
the definition of H0. That is, for some j = 1, ..., k, h ∈ H0 may imply that hj = 0. If we
can figure out which of the inequalities have to hold with equality, we can find a system of
equations that defines span(H0), and from there figure out the dimension of span(H0).
Thus, the imminent question becomes: for which j does h ∈ H_0 imply h_j = 0? For each j = 1, ..., k, we can answer this question with an LP problem.^14 For each j = 1, ..., k, calculate

ζ_j = min_h −h_j s.t. h ≥ 0, C′_Z h = 0, (B_Z µ̂ − d_Z)′ h = 0, 1′h = 1. (34)

If ζ_j = 0, then there does not exist an h ∈ H_0 with h_j > 0, which means that the jth inequality has to hold with equality. Let J_0 be the collection of all j's such that ζ_j = 0. Also let I_{J_0} denote the rows of the k-dimensional identity matrix corresponding to indices in J_0. It follows that

span(H_0) = {h ∈ R^k : I_{J_0} h = 0, C′_Z h = 0, (B_Z µ̂ − d_Z)′ h = 0}. (35)

Correspondingly, the rank of H_0 is k minus the rank of the coefficients on the linear equations defining span(H_0):

rk(H_0) = k − rk( [ I_{J_0} ; C′_Z ; (B_Z µ̂ − d_Z)′ ] ), (36)

where [· ; · ; ·] denotes stacking the rows. This is how we compute r, and hence the sCC test, without computing A_Z or b_Z.
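The LP-plus-rank computation in (34)-(36) can be sketched in a few lines of Python using `scipy.optimize.linprog`. The inputs `C` (standing for C_Z) and `g` (standing for B_Z µ̂ − d_Z) below are hypothetical, and the precheck for an empty H_0 described in the footnote is omitted, so H_0 is assumed nonempty. This is an illustrative sketch, not the authors' Matlab implementation.

```python
import numpy as np
from scipy.optimize import linprog

def rank_H0(C, g, tol=1e-8):
    """Compute rk(H_0) via (34)-(36): k LPs plus one matrix rank.

    C is the k x p matrix C_Z and g stands for B_Z mu_hat - d_Z (both
    hypothetical inputs here); H_0 is assumed to be nonempty."""
    k = C.shape[0]
    # Equality constraints defining H_0: C'h = 0, (B mu_hat - d)'h = 0, 1'h = 1.
    A_eq = np.vstack([C.T, g.reshape(1, -1), np.ones((1, k))])
    b_eq = np.zeros(A_eq.shape[0]); b_eq[-1] = 1.0
    J0 = []
    for j in range(k):
        c = np.zeros(k); c[j] = -1.0  # minimize -h_j, i.e. zeta_j in (34)
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        if -res.fun <= tol:           # zeta_j = 0: h_j must be 0 on H_0
            J0.append(j)
    I_J0 = np.eye(k)[J0] if J0 else np.zeros((0, k))
    # Rank of the coefficients defining span(H_0), as in (35)-(36).
    coef = np.vstack([I_J0, C.T, g.reshape(1, -1)])
    return k - np.linalg.matrix_rank(coef)

# Example: k = 3, one nuisance column, third original inequality slack.
C = np.array([[1.0], [-1.0], [0.0]])
g = np.array([0.0, 0.0, -1.0])
print(rank_H0(C, g))  # -> 1  (H_0 = {(1/2, 1/2, 0)} spans one dimension)
```

In the example, C′h = 0 forces h_1 = h_2 and g′h = 0 forces h_3 = 0, so H_0 is the single point (1/2, 1/2, 0) and the rank is 1, in line with the paper's claim that this route avoids vertex enumeration entirely.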
While implementing the sCC test does not require computing A_Z or b_Z, this is not the case for the subvector RCC (sRCC) test. The refinement requires knowing A_Z and b_Z. However, note that the refinement makes a difference only when r = 1 and T_n(θ) ∈ [χ²_{1,1−2α}, χ²_{1,1−α}] (because β ∈ [α, 2α]). Thus, to implement the sRCC test, we recommend computing r and T_n(θ) first using the method outlined above, and only computing A_Z, b_Z, and the refinement when r = 1 and T_n(θ) ∈ [χ²_{1,1−2α}, χ²_{1,1−α}]. Our experience is that this event is rare when k and p are large enough to make computing A_Z and b_Z challenging.^15 When k and p are small, computing A_Z and b_Z via vertex enumeration is feasible.

13 Appendix C.2 presents an algorithm for the case that rk(B_Z) < k.
14 Before implementing these LP problems, one should first determine if H_0 is empty. This can be done by solving the LP problem: f̂ := min_h −(B_Z µ̂ − d_Z)′ h s.t. h ≥ 0, C′_Z h = 0, 1′h = 1. If f̂ > 0, that indicates that all elements of A_Z µ̂ − b_Z are negative and there is no active inequality. In this case, set H_0 = ∅ and r = 0.
Next we give two examples of the conditional variance estimator Σn(θ). The conditional
variance is the appropriate variance matrix to be estimated because the inequalities hold
conditionally on Z and the theoretical properties of the tests are derived using the conditional
distribution of mn(θ0) given Z. We describe two conditional variance matrix estimators, one
for discrete Zi and the other for continuous Zi, both in the context of i.i.d. data.
In the first case, Z_i takes on a finite number of values in a set, Z. A straightforward estimator of Var(√n m_n(θ) | Z) is the weighted average of the sample variances of m(W_i, θ) within each category of Z_i:

Σ̂_n(θ) = Σ_{ℓ∈Z} (n_ℓ/n) · (1/(n_ℓ − 1)) Σ_{i=1}^n (m(W_i, θ) − m_n^ℓ(θ))(m(W_i, θ) − m_n^ℓ(θ))′ 1{Z_i = ℓ}, (37)

where n_ℓ = Σ_{i=1}^n 1{Z_i = ℓ} and m_n^ℓ(θ) = (1/n_ℓ) Σ_{i=1}^n m(W_i, θ) 1{Z_i = ℓ}. As we show in Appendix D.2, sufficient conditions for the consistency of this estimator involve boundedness of the fourth moment of m(W_i, θ) and the assumption that every Z_i value occurs twice or more in the sample {Z_i}_{i=1}^n eventually. This is the estimator used in our Monte Carlo simulations in Section 5.2.
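The estimator in (37) is a few lines of numpy. The sketch below (illustrative, not the authors' Matlab code) assumes d_m ≥ 2 moments so that `np.cov` returns a matrix, and relies on `np.cov`'s default n_ℓ − 1 denominator, which matches (37).

```python
import numpy as np

def sigma_hat(m, z):
    """Weighted within-cell sample covariance, as in (37).

    m : (n, d_m) array of moment evaluations m(W_i, theta), d_m >= 2.
    z : (n,) array of discrete instrument values; each level should
        appear at least twice, matching the consistency condition."""
    n = m.shape[0]
    out = np.zeros((m.shape[1], m.shape[1]))
    for level in np.unique(z):
        cell = m[z == level]
        n_l = cell.shape[0]
        # np.cov with rowvar=False divides by n_l - 1, matching (37).
        out += (n_l / n) * np.cov(cell, rowvar=False)
    return out

# Sanity check: with a constant Z there is one cell, so (37) reduces
# to the ordinary sample covariance of the moments.
rng = np.random.default_rng(0)
m = rng.normal(size=(40, 2))
z = np.zeros(40)
print(np.allclose(sigma_hat(m, z), np.cov(m, rowvar=False)))  # -> True
```

With several cells, the function returns the n_ℓ/n-weighted average of the within-cell covariances, discarding between-cell variation in the conditional mean, which is exactly what distinguishes (37) from the unconditional sample covariance.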
In the second case, Z_i contains continuous random variables. One can use a nearest-neighbor matching estimator similar to that used for the standard error of a regression discontinuity estimator in Abadie, Imbens, and Zheng (2014).^16 Let Σ̂_{Z,n} = n^{−1} Σ_{i=1}^n (Z_i − Z̄_n)(Z_i − Z̄_n)′, where Z̄_n = n^{−1} Σ_{i=1}^n Z_i. For each i, define the nearest neighbor to be

As we show in Appendix D.2, sufficient conditions for the consistency of this matching estimator involve the boundedness of {Z_i}_{i=1}^∞ and the Lipschitz continuity of Var(m(W_i, θ) | Z_i = z_i)

15 Theoretically, if the moment inequalities are uncorrelated and k of them are binding, then the probability that r = 1 is asymptotically k·2^{−k}. This is an upper bound for Pr(r = 1, T_n(θ) ∈ [χ²_{1,1−2α}, χ²_{1,1−α}]). In finite samples, the number of near-binding moment inequalities also reduces this probability.
16 This is also the estimator used in ARP.
Algorithm 2: Pseudo-code for the sCC and sRCC tests when rk(B_Z) = k.

1: %Compute the sCC Test
2: T_n(θ), µ̂ ← min_{δ,µ: B_Z µ − C_Z δ ≤ d_Z} n(m_n(θ) − µ)′ Σ̂_n(θ)^{−1} (m_n(θ) − µ)
3: f̂ := min_{h∈H} −(B_Z µ̂ − d_Z)′ h
4: if f̂ > tol then
5: r := 0
6: else
7: for j = 1, ..., k do
8: ζ̂_j ← C_Z, B_Z, d_Z, µ̂ by (34).
9: end for
10: J_0 := {j = 1, ..., k : ζ̂_j = 0}
11: I_{J_0} ← J_0
12: r := k − rk( [ I_{J_0} ; C′_Z ; (B_Z µ̂ − d_Z)′ ] )
13: end if
14: φ^sCC_n(θ, α) := 1{T_n(θ) > max{χ²_{r,1−α}, tol}}
15:
16: %Compute the sRCC Test
17: Implement lines 2-13, and then
18: if r = 1 and T_n(θ) ∈ [χ²_{1,1−2α}, χ²_{1,1−α}] then
19: H(C_Z) ← H using a vertex enumeration algorithm, e.g., con2vert.m in Matlab (ref. Kleder (2020))
20: A_Z, b_Z ← H(C_Z)B_Z, H(C_Z)d_Z
21: Suppose a′_1 µ̂ = b_1 and ‖a_1‖ ≠ 0. %Ignore the subscript Z for notational ease.
22: for j = 2, ..., d_A do
(c) |Corr_F(m(W_i, θ))| > ε, where Corr_F(m(W_i, θ)) is the correlation matrix of the random vector m(W_i, θ) under F.

(d) E_F |m_j(W_i, θ)/σ_{F,j}(θ)|^{2+ε} ≤ M for j = 1, ..., d_m.
Remarks. (1) This set of assumptions is commonly made in the moment inequality lit-
erature (see e.g. Andrews and Guggenberger (2009), AS, or Kaido et al. (2019)). Part (a)
assumes i.i.d. for simplicity, but is not essential for the results. One can use our method
on data with cluster, spatial, or temporal dependence, after changing Σn(θ) to a variance
estimator that appropriately accommodates the dependence. In that case, the validity of
our procedure follows from Theorem 3 in Appendix B. Part (b) is innocuous as it simply
requires the moment functions be nonconstant in Wi. Parts (a), (b), and (d) together imply
asymptotic normality of the sample moments via a Lyapunov central limit theorem.
(2) Part (c) requires uniform invertibility of the correlation matrix, which is imposed be-
cause we use the inverse of Σn(θ) in the test statistic. While this rules out perfectly correlated
moments and near-perfectly correlated moments, perfectly correlated moments can be han-
dled in specification (1) by an appropriate choice of A and b provided the perfect correlation
is known. For example, in Example 1, suppose one reaches the moment inequalities:

E[1{(r_i^1, r_i^2, r_i^3) = (0, 0, 0)} − g_{000}(θ)] ≤ 0
E[−1{(r_i^1, r_i^2, r_i^3) = (0, 0, 0)} + g_{000}(θ)] ≤ 0
E[1{(r_i^1, r_i^2, r_i^3) = (a, b, c)} − g_{abc}(θ)] ≤ 0
...
E[1{(r_i^1, r_i^2, r_i^3) = (b, 0, 0)} − g_{b00}(θ)] ≤ 0
E[1{(r_i^1, r_i^2, r_i^3) = (c, 0, 0)} − g_{c00}(θ)] ≤ 0. (41)
These moment inequalities are collinear both because the first is the negative of the second and because the probabilities of all rank-order lists add up to 1. The invertibility requirement can still be satisfied by defining

m(W_i, θ) = (1{(r_i^1, r_i^2, r_i^3) = (0, 0, 0)}, 1{(r_i^1, r_i^2, r_i^3) = (a, b, c)}, ..., 1{(r_i^1, r_i^2, r_i^3) = (b, 0, 0)})′,

A = the matrix with rows e′_1, −e′_1, e′_2, ..., e′_{d_m}, −1′ (where e_j is the jth standard basis vector and 1 is a vector of ones), and

b = (g_{000}(θ), −g_{000}(θ), g_{abc}(θ), ..., g_{b00}(θ), g_{c00}(θ) − 1)′.

Note that m(W_i, θ) is the core set of moments, one for each possible rank-order list, omitting the last one. This is similar to dealing with perfect multicollinearity in a linear regression with binary variables.
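The multicollinearity analogy can be checked numerically. The sketch below (illustrative Python, with hypothetical category probabilities) simulates indicators of four mutually exclusive rank-order lists: the full set of indicators sums to one for every observation, so its covariance matrix is singular, while dropping the last category, as the core moments do, restores invertibility.

```python
import numpy as np

# Simulated rank-order lists with four (hypothetical) possible lists.
# The four indicators sum to one for every observation, so their
# covariance matrix is singular; dropping the last indicator -- the
# "core" moments in the text -- restores invertibility.
rng = np.random.default_rng(0)
n, probs = 5000, [0.4, 0.3, 0.2, 0.1]
draws = rng.choice(4, size=n, p=probs)
indicators = np.eye(4)[draws]            # (n, 4) one-hot matrix

cov_full = np.cov(indicators, rowvar=False)
cov_core = np.cov(indicators[:, :3], rowvar=False)

print(abs(np.linalg.det(cov_full)) < 1e-10)  # -> True (singular 4x4 matrix)
print(np.linalg.det(cov_core) > 1e-6)        # -> True (invertible 3x3 matrix)
```

The same logic explains the ±g_{000}(θ) pair in (41): an equality restriction is encoded as two opposing inequalities on a single core moment rather than as two perfectly negatively correlated moments.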
Let D_F(θ) denote the diagonal matrix formed by {σ²_{F,j}(θ) : j = 1, ..., d_m}. For J ⊆ {1, ..., d_A}, let I_J denote the rows of the identity matrix corresponding to the indices in J.^18
The following theorem states the asymptotic properties of the RCC test.
Theorem 2. Suppose Assumption 1 holds.

(a) limsup_{n→∞} sup_{F∈F} sup_{θ∈Θ_0(F)} E_F φ^RCC_n(θ, α) ≤ α.

For a sequence {(F_n, θ_n) : F_n ∈ F, θ_n ∈ Θ_0(F_n)}_{n=1}^∞ such that A(θ_n)D_{F_n}(θ_n) → A_∞ for some matrix A_∞, and, for all J ⊆ {1, ..., d_A}, rk(I_J A(θ_n)D_{F_n}(θ_n)) = rk(I_J A_∞) for all n,

(b) if A_∞ ≠ 0 and, for all j ∈ {1, ..., d_A}, √n(a′_j E_{F_n} m_n(θ_n) − b_j) → 0, then

lim_{n→∞} E_{F_n} φ^RCC_n(θ_n, α) = α, and

(c) if instead there is a J ⊆ {1, ..., d_A} such that for all j ∉ J, √n(a′_j E_{F_n} m_n(θ_n) − b_j) → −∞ as n → ∞, then

lim_{n→∞} Pr_{F_n}(φ^RCC_n(θ_n, α) ≠ φ^RCC_{n,J}(θ_n, α)) = 0.
Remarks. (1) Part (a) shows that the RCC test is asymptotically uniformly valid. Part
(b) shows that when all the inequalities bind or are sufficiently close to binding, the RCC
test does not under-reject asymptotically, assuming the rank of combinations of rows of A
does not change in the limit. Part (c) shows an asymptotic IDI property of the RCC test: if
some inequalities are very slack, the test reduces to the one based only on the not-very-slack
inequalities.
(2) If θ and F are fixed and A and b do not depend on n, then the condition in part (c) is satisfied with J equal to the set of all binding inequalities.^19 If, in addition, A_J ≠ 0,
parts (b) and (c) can be combined to show that the RCC test has exact asymptotic size
(and hence is asymptotically non-conservative). By exact asymptotic size, we mean that
there exists a sequence of θn ∈ Θ0(Fn) such that the limiting rejection probability is equal
to α. (In this case, the sequence is just the fixed sequence with (θn, Fn) = (θ, F ) for all
n.) This is compatible with the possibility that other sequences, θn ∈ Θ0(Fn), have limiting
rejection probability strictly less than α. Indeed, whenever some moment inequalities are
local to binding such that their slackness neither converges to zero nor diverges to infinity,
the limiting rejection probability will be less than α.
18 Note that I_J A is an alternate notation for A_J.
19 Technically, since F is the joint distribution of {W_i}_{i=1}^n, we need the marginal distribution of each W_i to be fixed.
(3) Theorem 2 combines with (25) to imply that the CC test is asymptotically uniformly valid and, when the RCC test is asymptotically non-conservative, that the CC test can only be conservative to a limited extent:

α/2 ≤ limsup_{n→∞} sup_{F∈F} sup_{θ∈Θ_0(F)} E_F φ^CC_n(θ, α) ≤ α. (42)
(4) The outline of the proof of Theorem 2(a) is conceptually simple. The almost sure
representation theorem is invoked on the convergence of the moments, and then Theorem 1 is
invoked on the limiting experiment. However, the details are quite complicated. A technical
complication that arises is that the rank of the inequalities can be lower in the limit than
in the finite sample. This is handled by adding additional inequalities so the sequence of
polyhedra defined by the inequalities converges to a limiting polyhedron along a subsequence
(see Lemma 7 in the appendix).
4.3 Finite Sample Validity of the sCC and sRCC Tests
The following result states the finite sample properties of the sRCC test assuming normally distributed moments and a known conditional variance matrix. The result is a corollary of Theorem 1. Let z be a realization of Z, and let Θ_0(F_z) = {θ ∈ Θ : ∃ δ s.t. B_z E_{F_z}[m_n(θ)|z] − C_z δ ≤ d_z}.^20 Let e_j denote the R^{d_A}-vector with jth element one and all other elements zero. For any J ⊆ {1, ..., d_A}, let φ^sRCC_{n,J}(θ, α) denote the sRCC test defined using I_J A_z and I_J b_z in place of A_z and b_z.
Corollary 1. Suppose Σ̂_n(θ) is an invertible matrix such that the conditional distribution of √n(m_n(θ) − E_{F_Z} m_n(θ)) given Z = z is N(0, Σ_n(θ)) and Σ̂_n(θ) = Σ_n(θ) a.s. for all θ ∈ Θ. Then the following hold.

(a) For any θ ∈ Θ_0(F_z), E_{F_z}[φ^sRCC_n(θ, α)|z] ≤ α.

(b) If A_z E_{F_z}[m_n(θ)|z] = b_z and A_z ≠ 0, then E_{F_z}[φ^sRCC_n(θ, α)|z] = α.

(c) If J ⊆ {1, ..., d_A} and {θ_s}_{s=1}^∞ ⊆ Θ is a sequence such that for all j ∉ J, e′_j A_z ≠ 0 and e′_j(A_z E_{F_z} m_n(θ_s) − b_z)/‖e′_j A_z‖ → −∞ as s → ∞, where the dependence of A_z and b_z on s via θ_s is implicit, then

lim_{s→∞} Pr_{F_z}(φ^sRCC_n(θ_s, α) ≠ φ^sRCC_{n,J}(θ_s, α)|z) = 0.
Remarks. (1) Part (a) shows the finite sample validity of the sRCC test under normality. Part (b) shows that the sRCC test is size-exact when there is an F_z under which all the inequalities bind. Part (c) states the IDI property of the sRCC test. Since the sRCC test rejects whenever the sCC test does, the corollary implies the validity of the sCC test: E_{F_z}[φ^sCC_n(θ, α)|z] ≤ α.

20 Technically, z and the objects that are defined given z, including Θ_0(F_z), depend on n as well. We keep this dependence implicit for simplicity.
(2) A result on the asymptotic properties of the sRCC test is available in Appendix D. It
relies on the asymptotic normality of the moments conditional on Z1, ..., Zn and a consistent
estimator for Σn(θ).
(3) The condition in part (c) depends on Az = H(Cz)Bz and bz = H(Cz)dz, which are
the inequalities after eliminating the nuisance parameters. It is unclear whether an alter-
native sufficient condition can be formulated that depends only on the original inequalities.
In a model with a scalar parameter of interest, Rambachan and Roth (2020) use a linear
independence constraint qualification to show that the test in ARP reduces to the one-sided
t-test at an endpoint of the identified set. A similar constraint qualification may be helpful in
formulating a sufficient condition for part (c) that depends only on the original inequalities.
(4) The invertibility requirement on Σn(θ) guides the choice of instrumental functions,
I(Zi), in Examples 3-5. In those examples, the instrumental functions are used to increase
the number of moment inequalities in order to sharpen identification. The instrumental
functions in Andrews and Shi (2013) serve the same purpose. Like in Andrews and Shi
(2013), appropriate functions are indicators of cells defined by Z_i. However, unlike Andrews and Shi (2013), we do not recommend using cells of multiple levels of fineness. For example, when Z_i ∈ {0, 1}², we do not recommend using both 1{Z_i = (0, 1)′}, 1{Z_i = (0, 0)′}, 1{Z_i = (1, 1)′}, 1{Z_i = (1, 0)′} and 1{Z_i ∈ {(0, 1), (0, 0)}}, 1{Z_i ∈ {(1, 0), (1, 1)}}. This is because the resulting moments are linearly dependent, causing Σ̂_n(θ) to be singular. We thus recommend choosing a partition of the space of Z_i and using the indicators of all cells in that partition. For example, when Z_i ∈ {0, 1}², use I(Z_i) = (1{Z_i = (0, 1)′}, 1{Z_i = (0, 0)′}, 1{Z_i = (1, 1)′}, 1{Z_i = (1, 0)′})′.

The need to choose instrumental functions is common in conditional moment inequality models. A complete cost-benefit analysis is beyond the scope of this paper, but we can make some general observations. (a) A finer partition yields sharper identification, meaning that Θ_0(F_z) is smaller. (b) A finer partition also means fewer observations per cell, potentially implying a worse normal approximation. A crude rule of thumb is to ensure that the smallest cell in the partition contains 15 or more observations.
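The linear-dependence problem from mixing partition cells with their unions can be verified directly. In the sketch below (illustrative Python with a small hypothetical sample of Z_i values), each union indicator equals the sum of two singleton indicators, so stacking both sets of instruments adds columns without adding rank.

```python
import numpy as np

# Z_i takes values in {0,1}^2; a small hypothetical sample covering the support.
Z = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]])

def cell(Z, point):
    # Indicator of the singleton cell {Z_i = point}.
    return (Z == np.array(point)).all(axis=1).astype(float)

# Partition instruments: indicators of the four singleton cells.
partition = np.column_stack([cell(Z, p) for p in [(0, 1), (0, 0), (1, 1), (1, 0)]])

# Union-cell indicators duplicate information: each equals the sum of two
# singleton indicators, so the stacked instruments are linearly dependent.
unions = np.column_stack([partition[:, 0] + partition[:, 1],
                          partition[:, 2] + partition[:, 3]])
both = np.column_stack([partition, unions])

print(np.linalg.matrix_rank(partition))  # -> 4
print(np.linalg.matrix_rank(both))       # -> 4 (6 columns, linearly dependent)
```

With six instruments but rank four, the implied moment vector has a singular variance matrix, which is exactly why the text recommends using the indicators of a single partition.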
5 Monte Carlo Simulations
In the previous two sections, we have shown that the CC and RCC tests have a variety
of desirable properties, including convenient implementation features and theoretical results
on size and adaptation to slackness. In this section, we use Monte Carlo simulations to
compare the CC and RCC tests to alternative moment inequality tests in terms of size,
power, and computational cost. We consider two sets of Monte Carlo simulations, one to
evaluate the performance of the CC and RCC tests in a general moment inequality model
without nuisance parameters, and the second to evaluate the performance of the sCC and
sRCC tests in Example 4. In these simulations, no test should be expected to dominate any
other in terms of power. Still, we find that the CC and RCC tests are at least competitive
in terms of size and power and dominate in terms of computational cost.
5.1 Full-Vector Simulations
Our first set of simulations takes the generic moment inequality design from AB. This design
allows a variety of correlation structures across moments and thus can approximate a wide
range of applications.
We briefly describe the Monte Carlo design here and refer readers to Section 6 of AB for
further details. Consider the moment inequality model
E[θ − W_i] ≤ 0, (43)

and the null hypothesis H_0 : θ = 0, where W_i is a k-dimensional random vector. Let the data {W_i}_{i=1}^n be i.i.d. with sample size n. Let W_i ∼ (µ, Ω), where Ω is a correlation matrix
and µ is a mean-vector. Three choices of Ω are considered: ΩNeg, ΩZero, and ΩPos. For ΩZero,
the moments are uncorrelated. For ΩPos, the moments are positively correlated. For ΩNeg,
some pairs of moments are strongly negatively correlated while other pairs of moments are
positively correlated. The exact numerical specifications of these matrices for different k’s
are in Section 4 of AB and Section S7.1 of the Supplemental Material of AB.
We consider separately cases with k ≤ 10 and cases with k ≥ 10. With k ≤ 10, we
compare the CC and the RCC tests to the recommended tests in AB and RSW. More
specifically, we compare to the bootstrap-based AQLR (adjusted quasi-likelihood ratio) test
in AB and two two-step procedures in RSW, one using their T^qlr_n statistic and the other using their T^max_n statistic.^21 With k ≥ 10, we only compare the CC and RCC tests to the RSW
tests as the AB test is no longer computationally feasible. The RSW tests are implemented
using 499 bootstrap draws and with a first-step significance level of 0.005. The AB test is
implemented using 1000 bootstrap draws. These are the recommended values in RSW and
21 We use the AB test for comparison because it is tuning parameter free (in the sense that AB propose and use an optimal choice of the AS tuning parameter), and we use RSW's two-step tests for comparison because they should be insensitive to reasonable choices of their tuning parameters.
AB, respectively.
5.1.1 k ≤ 10
We approximate the size of the tests using the maximum null rejection probability (MNRP) over a set of µ values that satisfy µ ≥ 0 for each combination of Ω and k. These µ values are
taken from AB, whose calculations suggest that these points are capable of approximating
the size of the tests. We also compute a weighted average power (WAP) for easy comparison.
The WAP is the simple average of a set of carefully chosen points in the alternative space.
We take these points also from AB, who design them to reflect cases with various degrees of
violation or slackness for each of the inequalities. These µ values are given in Section 4 of
AB and Section S7.1 of the Supplemental Material of AB. Besides WAP, we also report size-
corrected WAP, which is obtained by adding a (positive or negative) number to the critical
value where the number is set to make the size-corrected MNRP equal to the nominal level.
Table 1 shows the MNRP and WAP results when Wi is normally distributed with known
Ω. In this case, only the RCC test should have exact size. The CC test should be somewhat
under-sized especially with small k. The results are consistent with these theoretical predic-
tions. The MNRP of the RCC test is within simulation error of 5%, and the MNRP of the
CC test is somewhat below the MNRP of the RCC test. The MNRP’s of the AB and the
RSW1 tests appear to be more different from 5% than the RCC test, while the MNRP of
the RSW2 test is close to 5%.
Table 2 shows the results when Wi is normally distributed with estimated Ω. In this
case, none of the tests have exact size. The RCC test still has very good MNRP at k = 2,
but has noticeably larger MNRP (up to 7.4% from 5%) when k = 10 with Ω = ΩNeg. This
may reflect the difficulty in estimating Ω with a small sample size (n = 100). The AB test
and the RSW1 test continue to have good size, while the RSW2 test now exhibits some
over-rejection when k = 10 and Ω = Ω_Zero.
In terms of weighted average power, the RCC test has weakly higher ScWAP than both
RSW tests in all but one case in Table 2 (estimated Ω), and in all but two cases in Table 1
(known Ω). The RCC test has higher ScWAP than the AB test in 4 out of 9 cases in both
Table 1 and Table 2. The ScWAP of all the tests, except RSW2, are quite close to each other,
with differences between them no greater than 6 percentage points. The ScWAP of RSW2
is close to the other tests in all cases except when the moments have negative correlations
(Ω = ΩNeg), when they are much lower, especially for k = 10.
On the computational side, the AB test and the RSW1 test are 200-400 times as costly
as the RCC test, and the RSW2 test is 4-9 times as costly as the RCC test, as shown in the
Time columns in the tables. Also note that the AB test is computed using 1000 bootstrap
Table 1: Finite Sample Maximum Null Rejection Probabilities and Size-Corrected Average Power of Nominal 5% Tests (normal distribution, known Ω, n = 100)

Note: CC, RCC, AB, RSW1, and RSW2 denote the conditional chi-squared test, the refined CC test, the adjusted quasi-likelihood ratio test with bootstrap critical value in AB, the two-step test in RSW based on the QLR statistic, and that based on the Max statistic, respectively. MNRP, WAP, ScWAP, and Time denote maximum null rejection probability, weighted average power, size-corrected WAP, and average computation time in seconds in each Monte Carlo simulation. Cases with different k and knowledge status of Ω may have been assigned to different machines and their computation times are not comparable. But times across tests are comparable. The AB test and the RSW tests use 1000 and 499 bootstrap draws, respectively. The results for the CC, RCC, and RSW2 tests are based on 5000 simulations, while those for the AB and RSW1 tests are based on 2000 simulations for feasibility.

Table 2: Finite Sample Maximum Null Rejection Probabilities and Size-Corrected Average Power of Nominal 5% Tests (normal distribution, estimated Ω, n = 100)

Table 4: Finite Sample Maximum Null Rejection Probabilities and Size-Corrected Average Power of Nominal 5% Tests (mixed normal distribution, known Ω, n = 100)

Table 6: Finite Sample Maximum Null Rejection Probabilities and Size-Corrected Average Power of Nominal 5% Tests (mixed normal distribution, estimated Ω, n = 100)
draws while the RSW tests use 499 bootstrap draws. Increasing the number of bootstrap draws would increase their computational costs proportionally.
The theoretical properties of the CC tests rely in an essential way on the normality or
asymptotic normality of the moments. Thus, we report results when Wi is not normal to
investigate the sensitivity of these simulations to the data distribution. Tables 3 and 5 report
results when Wi has a student t distribution with 3 degrees of freedom, denoted by t(3). This
distribution is chosen to investigate the consequences of thick tails on the tests. Tables 4 and
6 report results when Wi has a mixed normal distribution, taken to be the equal probability
mix of N(−2, 1) and N(2, 4) scaled to have unit variance. This distribution is chosen to
investigate the consequences of a skewed and bimodal distribution on the tests.
Tables 5 and 6 show the results for the tests when Ω is estimated. The size performance
of the RCC test varies somewhat with the data distribution. It has worse MNRPs under
the skewed mixed normal distribution (up to 9.6% for k = 10) than under normality (Table
6), while it has better MNRPs under the t(3) distribution than under normality (Table 5).
It is noteworthy that the bootstrap-based AB, RSW1, and RSW2 tests have lower worst
case MNRP’s across Tables 3-6 (6.9%, 6.4%, 9.3%, respectively, compared to 9.6% for the
RCC test). This may reflect a type of refinement property for the bootstrap, but more
investigation is needed. It also is interesting to note that the over-rejection either disappears
or is greatly reduced when the true Ω is used, as shown in Tables 3 and 4. This seems to
suggest that the non-normality of the sample moments is less of an issue than estimating
the variance matrix in a relatively small sample.
It is worth noting that, in this context, different tests direct power to different alterna-
tives, and there is not a test that is uniformly most powerful. The ScWAP comparison hides
important power variations across different directions. To investigate these power variations,
Figure 2 reports simulated power curves for the tests in the k = 2 case with normally dis-
tributed moments, estimated Ω, and Ω = ΩZero. The power curves are functions of the true
mean vector, µ = (µ1, µ2). When either µ1 or µ2 is negative, the corresponding inequality is
violated, and we expect higher power. As the figure shows, the RCC test has better power
when only one inequality is violated, while the AB and the RSW1 tests have better power
when both inequalities are violated. We expect this pattern to extend to cases with more
inequalities and more general variance-covariance matrices: the AB or RSW1 tests have better
power when all or most of the inequalities are violated, while the RCC test has better power
when few inequalities are violated. If a researcher has a preference for tests that direct power
in a particular direction, they can choose a test based on this pattern. Otherwise, when such
a preference is not present, the RCC test is at least competitive with the other tests in terms
of size and power.
Figure 2: Power Curves for 5 Tests (k = 2, normal distribution, Ω = ΩZero, Estimated Ω, n = 100, α = 5%)

[Figure: four panels plotting power curves for the CC, RCC, AB, RSW1, and RSW2 tests against µ2 ∈ [−0.3, 0.3], for (a) µ1 = −0.3, (b) µ1 = −0.15, (c) µ1 = 0, and (d) µ1 = 0.15.]
Note: CC denotes the conditional chi-squared test, RCC denotes the refined CC test, AB denotes the adjusted quasi-likelihood ratio (AQLR) test with bootstrap critical value in AB, and RSW1 and RSW2 denote the two-step tests in RSW based on the QLR statistic and the Max statistic, respectively. The AB test uses 1000 bootstrap draws and the RSW tests use 499 bootstrap draws. The results for the CC, RCC, and RSW2 tests are based on 5000 simulations, while the results for the AB and RSW1 tests are based on 2000 simulations for computational reasons.
5.1.2 k ≥ 10
One advantage of the CC and RCC tests is that they remain feasible when the number of
inequalities (k) and the sample size (n) are both large. In this subsection, we investigate the
size and computational time of the RCC, CC, RSW1, and RSW2 tests when both k and n are
large. Table 7 reports the results for four pairs of (k, n): (10, 100), (50, 700), (100, 1600), and
(150, 2550), where the pairs are chosen so that k is approximately proportional to n/log(n).
The message from Table 7 is quite encouraging. The MNRPs of all the tests appear to
be stable as we move across columns. The computational time of the RCC and CC tests
increases most slowly with k, while that of RSW2 increases fastest.
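The computational simplicity of the CC test reflects that it requires only one quadratic program and one rank computation per null value. The following sketch is our own Python translation of that recipe (the paper provides Matlab code; `cc_test` and its arguments are our naming), for moments with mean in C = {µ : Aµ ≤ b}:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def cc_test(m_bar, Sigma, A, b, n, alpha=0.05, tol=1e-5):
    """Sketch of the CC test: QLR statistic via quadratic programming, then a
    chi-squared critical value whose degrees of freedom equal the rank of the
    inequalities active at the constrained minimizer."""
    Sig_inv = np.linalg.inv(Sigma)
    obj = lambda mu: n * (m_bar - mu) @ Sig_inv @ (m_bar - mu)
    cons = [{"type": "ineq", "fun": lambda mu: b - A @ mu}]
    res = minimize(obj, x0=np.zeros_like(m_bar), constraints=cons, method="SLSQP")
    T = res.fun                              # quasi-likelihood ratio statistic
    active = (b - A @ res.x) <= tol          # inequalities active at mu_hat
    r = np.linalg.matrix_rank(A[active]) if active.any() else 0
    crit = chi2.ppf(1 - alpha, r) if r > 0 else 0.0
    return T, r, T > crit + tol

# Toy example: two inequalities mu >= 0, one violated in sample.
T, r, reject = cc_test(np.array([-0.2, 0.3]), np.eye(2), -np.eye(2),
                       np.zeros(2), n=100)
print(round(T, 3), r, reject)   # T = 100 * 0.2^2 = 4.0, r = 1, reject at 5%
```

Because the chi-squared quantile is available in closed form, no simulation of the critical value is needed, which is what keeps the cost growing slowly in k.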
None of these tests have been proven to control size asymptotically when k grows with
n at this rate, but these simulations suggest such a result could be formulated, under the
correct assumptions. Intuitively, if the moments are approximately normal, in some sense,
then one can appeal to Theorem 1 as a good approximation. We do not pursue this type
of result here, but note two challenges to keep in mind. On the theoretical side, a theory
of Gaussian approximations for quadratic forms, such as the likelihood ratio statistic, that
covers this high-dimensional case is an open question, to the best of our knowledge. On the
practical side, a consistent covariance matrix estimator can be difficult to find. A potential
way to improve covariance matrix estimation is to assume sparsity or use shrinkage as in
Ledoit and Wolf (2012). It would be interesting to study the theoretical properties of the
CC and RCC tests in settings with many inequalities, but we leave that to future research.
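As a concrete illustration of why shrinkage helps when k is large relative to n, the following sketch applies simple linear shrinkage toward a scaled identity. This is a simpler relative of the nonlinear Ledoit and Wolf (2012) estimator cited above, not that estimator itself; the function name and the weight delta are ours:

```python
import numpy as np

def shrink_cov(X, delta):
    """Linearly shrink the sample covariance toward (tr(S)/k) * I.
    delta in [0, 1] is the shrinkage weight (a tuning choice)."""
    S = np.cov(X, rowvar=False)
    k = S.shape[0]
    mu = np.trace(S) / k
    return (1 - delta) * S + delta * mu * np.eye(k)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))        # n = 50 observations of k = 100 moments
S = np.cov(X, rowvar=False)
Sig = shrink_cov(X, 0.2)
print(np.linalg.matrix_rank(S))           # at most 49: sample covariance is singular
print(np.linalg.eigvalsh(Sig).min() > 0)  # shrunk estimate is positive definite
```

The sample covariance is rank deficient whenever k > n and so cannot be inverted in the quasi-likelihood ratio statistic, while any positive shrinkage weight restores invertibility.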
5.2 Subvector Inference in Interval Regression
To investigate the finite sample performance of the subvector CC and RCC tests, we consider
a special case of Example 4, where Y ∗i = s∗i is the probability of an event of interest. For
example, the event can be death by homicide for a random person in county i, or a product
being purchased by a random consumer in market i. For simplicity, a logit model is
assumed for the probability: s∗i = exp(X′iθ0 + Z′ciδ0 + εi)/(1 + exp(X′iθ0 + Z′ciδ0 + εi)), where εi is the county or market level
unobservable that satisfies E[εi|Zi] = 0. Then (11) holds with

ψ(Y ∗i, Xi, θ) = log(s∗i/(1 − s∗i)) − X′iθ. (44)
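To make the structure of (44) concrete, the following sketch (our own toy implementation with scalar Xi and Zci; all names are ours) simulates the logit model and checks that ψ(Y ∗i, Xi, θ0) = Z′ciδ0 + εi, so that ψ evaluated at θ0 has mean Z′ciδ0:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta0, delta0 = 100_000, 1.0, 0.5
X = rng.standard_normal(n)
Zc = rng.standard_normal(n)
eps = rng.standard_normal(n)                  # satisfies E[eps | Z] = 0
index = X * theta0 + Zc * delta0 + eps
s_star = np.exp(index) / (1 + np.exp(index))  # logit probability

# psi from (44): inverting the logit recovers the index net of X'theta0
psi = np.log(s_star / (1 - s_star)) - X * theta0
assert np.allclose(psi, Zc * delta0 + eps)    # identity, up to rounding
print(abs(np.mean(psi - Zc * delta0)) < 0.02) # sample analogue of the moment
```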
The variable s∗i is unobserved, but we observe sN,i, an empirical estimate of s∗i based
on N independent chances for the event of interest to happen. NsN,i follows a binomial
distribution with parameters (N, s∗i ). For example, N could be the population of a county
while sN,i is the homicide rate of the county. We use the method introduced in Gandhi et al.
Table 7: Finite Sample Maximum Null Rejection Probabilities of Nominal 5% Tests, the Large k and n Cases (Estimated Ω, Ω = ΩZero)

k = 10, n = 100    k = 50, n = 700    k = 100, n = 1600    k = 150, n = 2550

Note: CC denotes the conditional chi-squared test, RCC denotes the refined CC test, and RSW1 and RSW2 denote the two-step tests in RSW based on the QLR statistic and the Max statistic, respectively. MNRP denotes maximum null rejection probability, and Time denotes average computation time in seconds for the test in each Monte Carlo repetition. The RSW tests use 499 critical value simulations. The results for the CC, RCC, and RSW2 tests are based on 5000 simulations, while the results for the RSW1 tests are based on 2000 simulations for computational reasons.
Table 8: Average Value, Length, and Computation Time (in seconds) of Confidence Intervals

Note: The identified set for θ0 is [−1.203, −.757]. The computation times across different (n, dc) cases are not comparable because they may have been performed by different computers on the computer cluster. The computation of different tests within each (n, dc) case is always completed on the same computer. Thus the computation times across tests are comparable.
(2019) to construct ψLi (θ) and ψUi (θ) based on sN,i. By Gandhi et al. (2019), for N ≥ 100,
Thus, when dc = 2 (or 3, 4), there are 4 (or 8, 16) instrumental functions, which give us 8
(or 16, 32) moment inequalities.
We consider 5000 Monte Carlo repetitions. In each repetition, we generate an i.i.d. data
set, {sN,i, Xi, Zi}ni=1, for two sample sizes, n = 500 and n = 1000. For each repetition, we
compute an implied confidence interval for the sRCC test and the sCC test. We also include
the hybrid test of ARP (with their recommended tuning parameter and number of simulation
draws) for comparison. For all tests, the CI endpoints are computed with an accuracy to
the third digit. The details for computing the confidence intervals are given in Supplemental
22These bounds are not necessarily sharp, but that is not important for our purpose, which is to investigate the statistical performance of the sCC and sRCC tests.
Appendix E.2.
Table 8 reports the average confidence interval (CI), average excess length (= length of CI
- length of identified set), as well as average computation time for the CI. As the table shows,
the average computation times of the sCC and the sRCC tests are identical to each other up
to a tenth of a second in all cases. This is mainly because the vertex enumeration required to
compute the refinement is fast for dc = 2, and the refinement is rarely needed
for dc = 3 and dc = 4. The sRCC and sCC tests are faster than the ARP hybrid test in
all cases. The relative computational cost of the ARP hybrid test seems to improve as the
model gets bigger, but it remains more than 14 times as costly as the subvector CC and
RCC tests when dc = 4.
In terms of length, the sRCC and sCC confidence intervals are similar to each other, and
are shorter on average than those of the ARP hybrid test in all cases. As we move from dc = 2
to dc = 4, the model contains more and more non-informative moment inequalities, since
the added control variables do not contribute to the data generating process. All tests are
negatively affected by these non-informative inequalities to various degrees.
Figure 3 reports the rejection rates of the tests for H0 : θ0 = θ plotted against θ values in
[−2.5, 0.5].23 The shaded area indicates the identified set for θ0. As we can see, the rejection
rates for the points in the identified set are less than or equal to 5% in all cases. For dc = 3
and dc = 4, all tests show some under-rejection at the boundary of the identified set, while
the under-rejection of the ARP hybrid test appears to be smaller. It is encouraging to see that
the under-rejection does not translate into poor power. The powers of the sCC and sRCC tests
are nearly identical to each other and are higher than the power of the ARP hybrid test
except in the area of θ immediately next to the identified set, consistent with the excess
length results in Table 8. However, we note that this comparison is specific to this example,
and the power comparison may change with other examples or data generating processes.
6 Conclusion
This paper proposes the refined conditional chi-squared (RCC) test for moment inequality
models. This test compares a quasi-likelihood ratio statistic to a chi-squared critical value,
where the number of degrees of freedom is the rank of the active inequalities. This test has
many desirable properties, including being tuning parameter and simulation free, adaptive
to slackness, easy to code, and invariant to redundant inequalities. We show that, with an
easy refinement, it has exact size in normal models and has uniformly asymptotically exact
23For all three tests, a point is considered rejected if it is outside the confidence interval. We found in our simulations that these rejection rates are slightly lower than those obtained directly point by point.
Figure 3: Rejection Rates of the sCC, sRCC, and ARP Hybrid Tests for dc ∈ {2, 3, 4} and for Sample Size n ∈ {500, 1000} with Nominal Size 5%.

[Figure: six panels plotting rejection rates for the sCC, sRCC, and ARP hybrid tests against θ ∈ [−2.5, 0.5], for (a) dc = 2, n = 500; (b) dc = 2, n = 1000; (c) dc = 3, n = 500; (d) dc = 3, n = 1000; (e) dc = 4, n = 500; (f) dc = 4, n = 1000.]
size in asymptotically normal models. We also propose a version of the test for subvector
inference with conditional moment inequalities in which the nuisance parameters enter
linearly. Simulations show that the RCC and subvector RCC tests have a computational advantage
over alternatives while being competitive in terms of size and power.
References
Andrews, D. and Barwick, P. (2012). Inference for parameters defined by moment inequali-
ties: A recommended moment selection procedure. Econometrica, 80:2805–2862.
Andrews, D. and Guggenberger, P. (2009). Validity of subsampling and “plug-in asymptotic”
inference for parameters defined by moment inequalities. Econometric Theory, 25:669–709.
Andrews, D. and Soares, G. (2010). Inference for parameters defined by moment inequalities
using generalized moment selection. Econometrica, 78:119–157.
Andrews, D. W. and Shi, X. (2013). Inference based on conditional moment inequalities.
Econometrica, 81:609–666.
Andrews, I., Roth, J., and Pakes, A. (2019). Inference for linear conditional moment in-
equalities.
Baccara, M., Imrohoroglu, A., Wilson, A. J., and Yariv, L. (2012). A field study on matching
with network externalities. American Economic Review, 102:1773–1804.
Bachem, A. and Kern, W. (1992). Linear Programming Duality: An Introduction to Oriented
Matroids. Springer-Verlag Berlin, Heidelberg.
Bajari, P., Benkard, C. L., and Levin, J. (2007). Estimating dynamic models of imperfect
competition. Econometrica, 75:1331–1370.
Bartholomew, D. J. (1961). A test of homogeneity of means under restricted alternatives.
Journal of the Royal Statistical Society. Series B (Methodological), 23:239–281.
Beresteanu, A., Molchanov, I., and Molinari, F. (2011). Sharp identification regions in models
with convex moment predictions. Econometrica, 79:1785–1821.
Blundell, R., Gosling, A., Ichimura, H., and Meghir, C. (2007). Changes in the distribution
of male and female wages accounting for employment composition using bounds. Econo-
metrica, 75:323–363.
Bugni, F., Canay, I., and Shi, X. (2017). Inference for functions of partially identified
parameters in moment inequality models. Quantitative Economics, 8:1–38.
Bugni, F. A. (2010). Bootstrap inference in partially identified models defined by moment
inequalities: Coverage of the identified set. Econometrica, 78:735–753.
Canay, I. A. (2010). EL inference for partially identified models: Large deviations optimality
and bootstrap validity. Journal of Econometrics, 156:408–425.
Canay, I. A. and Shaikh, A. (2017). Practical and theoretical advances for inference in par-
tially identified models. In B. Honore, A. Pakes, M. Piazzesi, and L. Samuelson (Eds)
Advances in Economics and Econometrics: Volume 2: Eleventh World Congress, (Econo-
metric Society Monographs, pp. 271-306). Cambridge University Press.
Chen, X., Christensen, T. M., and Tamer, E. (2018). Monte Carlo confidence sets for identified
sets. Econometrica, 86:1965–2018.
Chernozhukov, V., Hong, H., and Tamer, E. (2007). Estimation and confidence regions for
parameter sets in econometric models. Econometrica, 75:1243–1284.
Chetty, R. (2012). Bounds on elasticities with optimization frictions: A synthesis of micro
and macro evidence on labor supply. Econometrica, 80:969–1018.
Ciliberto, F. and Tamer, E. (2009). Market structure and multiple equilibria in airline
markets. Econometrica, 77:1791–1828.
Eizenberg, A. (2014). Upstream innovation and product variety in the U.S. home PC market.
The Review of Economic Studies, 81:1003–1045.
Fack, G., Grenet, J., and He, Y. (2019). Beyond truth-telling: Preference estimation with
centralized school choice and college admissions. American Economic Review, 109:1486–
1529.
Fourier, J. B. J. (1826). Solution d’une question particuliere du calcul des inegalites. Nouveau
Bulletin des sciences par la Societe philomathique de Paris, p. 99, pages 317–319.
Gale, D. (1960). The Theory of Linear Economic Models. McGraw-Hill Book Company, 1
edition.
Gandhi, A. K., Lu, Z., and Shi, X. (2019). Estimating demand for differentiated products
with zeroes in market share data.
Guggenberger, P., Hahn, J., and Kim, K. (2008). Specification testing under moment in-
equalities. Economics Letters, 99:375–378.
He, Y. (2017). Gaming the Boston school choice mechanism in Beijing. Unpublished
manuscript, Department of Economics, Rice University.
Ho, K. and Rosen, A. (2017). Partial identification in applied research: Benefits and chal-
lenges. In B. Honore, A. Pakes, M. Piazzesi, and L. Samuelson (Eds) Advances in Eco-
nomics and Econometrics: Volume 2: Eleventh World Congress, (Econometric Society
Monographs, pp. 307-359). Cambridge University Press.
Holmes, T. J. (2011). The diffusion of Wal-Mart and economies of density. Econometrica,
79:253–302.
Huber, M. and Mellace, G. (2015). Testing instrument validity for LATE identification based
on inequality moment constraints. The Review of Economics and Statistics, 97:398–411.
Iaryczower, M., Shi, X., and Shum, M. (2018). Can words get in the way? the effect of
deliberation in collective decision-making. Journal of Political Economy, 126:688–734.
Kaido, H., Molinari, F., and Stoye, J. (2019). Confidence intervals for projections of partially
identified parameters. Econometrica, 87:1397–1432.
Kudo, A. (1963). A multivariate analogue of the one-sided test. Biometrika, 50:403–418.
Ledoit, O. and Wolf, M. (2012). Nonlinear shrinkage estimation of large-dimensional covari-
ance matrices. The Annals of Statistics, 40:1024–1060.
Magnolfi, L. and Roncoroni, C. (2016). Estimation of discrete games with weak assump-
tions on information. Unpublished manuscript, Department of Economics, University of
Wisconsin at Madison.
Manski, C. F. and Tamer, E. (2002). Inference on regressions with interval data on a regressor
or outcome. Econometrica, 70:519–546.
Mohamad, D. A., van Zwet, E. W., Cator, E. A., and Goeman, J. J. (2020). Adaptive critical
value for constrained likelihood ratio testing. Biometrika, 107:677–688.
Molinari, F. (2020). Econometrics with partial identification. In S. Durlauf, L. Hansen,
J. Heckman, and R. Matzkin (Eds) Handbook of Econometrics: Volume 7A, 1st Edition.
North Holland.
Morales, E., Sheu, G., and Zahler, A. (2019). Extended gravity. The Review of Economic
Studies, 86:2668–2712.
Nevo, A. and Rosen, A. (2012). Identification with imperfect instruments. The Review of
Economics and Statistics, 94:659–671.
Pakes, A., Porter, J., Ho, K., and Ishii, J. (2015). Moment inequalities and their applications.
Econometrica, 83:315–334.
Rambachan, A. and Roth, J. (2020). An honest approach to parallel trends. Unpublished
manuscript, Department of Economics, Harvard University.
Rogers, A. J. (1986). Modified lagrange multiplier tests for problems with one-sided alter-
natives. Journal of Econometrics, 31:341–361.
Romano, J., Shaikh, A., and Wolf, M. (2014). A practical two-step method for testing
moment inequalities. Econometrica, 82:1979–2002.
Romano, J. P. and Shaikh, A. M. (2008). Inference for identifiable parameters in partially
identified econometric models. Journal of Statistical Planning and Inference, 138:2786–
2807. Special Issue in Honor of Theodore Wilbur Anderson, Jr. on the Occasion of his
90th Birthday.
Romano, J. P. and Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling
and the bootstrap. The Annals of Statistics, 40:2798–2822.
Rosen, A. (2008). Confidence sets for partially identified parameters that satisfy a finite
number of moment inequalities. Journal of Econometrics, 146:107–117.
Sheng, S. (2016). A structural econometric analysis of network formation games. Unpublished
manuscript, Department of Economics, University of California Los Angeles.
Sierksma, G. and Zwols, Y. (2015). Linear and Integer Optimization: Theory and Practice.
CRC Press, Taylor & Francis Group, 3 edition.
Stoye, J. (2009). More on confidence intervals for partially identified parameters. Econometrica,
77:1299–1315.
Sullivan, C. J. (2017). The ice cream split: Empirically distinguishing price and product space
collusion. Unpublished manuscript, Department of Economics, University of Wisconsin at
Madison.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria.
The Review of Economic Studies, 70:147–165.
Uhlig, H. (2005). What are the effects of monetary policy on output? results from an
agnostic identification procedure. Journal of Monetary Economics, 52:381–419.
Wolak, F. (1987). An exact test for multiple inequality and equality constraints in the linear
regression model. Journal of the American Statistical Association, 82:782–793.
Wollman, T. G. (2018). Trucks without bailouts: Equilibrium product characteristics for
commercial vehicles. American Economic Review, 108:1364–1406.
Supplemental Appendix for “Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality
Models”
Gregory Cox and Xiaoxia Shi
This supplemental appendix contains proofs and other supporting materials for “Simple
Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality
Models” (henceforth referred to as the main text) by Gregory Cox and Xiaoxia Shi. The
following sections are included:
Section A contains the proof of Theorem 1 in the main text.
Section B contains the proof of Theorem 2 in the main text. This section also includes
Theorem 3, a general theorem for uniform asymptotic properties of the CC and RCC
tests, as well as the proof of Theorem 3. Theorem 3 is used to prove Theorem 2.
Section C contains supporting materials for Section 3.2 in the main text. This section
includes lemmas that reduce the calculation of the sCC test to a rank calculation
problem. It also includes an algorithm to carry out the rank calculation in the case
not covered in the main text.
Section D proves the asymptotic validity of the Subvector Tests by verifying the con-
ditional asymptotic normality of the sample moments and the consistency of the two
conditional variance matrix estimators proposed in Section 3.2 in the main text.
Section E provides details for the identified set calculation and the confidence interval
calculation in Section 5.2 in the main text.
A Proof of Theorem 1
For this proof, we assume Σn(θ) = nIdm. If this is not the case, then the following proof can be
applied after premultiplying mn(θ) by n1/2Σn(θ)−1/2 and postmultiplying A by n−1/2Σn(θ)1/2.
Fix θ and let X = mn(θ) ∼ N(µ, Idm), where µ = EF mn(θ). Let C = {µ ∈ Rdm | Aµ ≤ b}.
The fact that θ ∈ Θ0(F) implies µ ∈ C. These simplifications imply that Tn(θ) = ‖X − µ̂‖2
and

τj = ‖a1‖(bj − a′jµ̂)/(‖a1‖‖aj‖ − a′1aj) if ‖a1‖‖aj‖ ≠ a′1aj, and τj = ∞ otherwise. (47)
The definitions of µ̂, J, r, τ, and β are unchanged. µ̂ is the projection of X onto C. We also
denote it by PCX. We also denote J by J(X), r by r(X), τ by τ(X), and β by β(X).
A.1 Auxiliary Lemmas
The proof of Theorem 1 relies on four lemmas.
The first lemma partitions Rdm according to which inequalities are active. We define
some notation for the partition. For any J ⊆ {1, ..., dA}, let Jc = {1, ..., dA}/J, and let
CJ = {x ∈ C : ∀j ∈ J, a′jx = bj, and ∀j ∈ Jc, a′jx < bj}. Then the CJ form a partition of C.
Also let VJ = {∑j∈J vjaj : vj ∈ R, vj ≥ 0}, and let KJ = CJ + VJ.24 The following lemma
shows that the KJ form a partition that characterizes which inequalities are active.
Lemma 1. (a) If X ∈ KJ, then X − PCX ∈ VJ and PCX ∈ CJ.

(b) The set of all KJ for J ⊆ {1, ..., dA} is a partition of Rdm.

(c) For every J ⊆ {1, ..., dA}, X ∈ KJ iff J = J(X).
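To illustrate the lemma, consider the orthant C = {x : x ≥ 0}, written as Ax ≤ b with A = −Idm and b = 0, where the projection is available in closed form. The following Python sketch (our toy example, not from the paper) verifies Lemma 1(a) numerically:

```python
import numpy as np

# C = {x >= 0} as A x <= b with A = -I, b = 0; projection clamps negatives.
X = np.array([-1.5, 0.7, -0.2, 2.0])
PX = np.maximum(X, 0.0)               # P_C X
J = np.where(PX == 0.0)[0]            # active set J(X): a_j' P_C X = b_j
print(J)                              # [0 2]

# Lemma 1(a): X - P_C X lies in V_J = cone{ -e_j : j in J }
resid = X - PX                        # equals min(X, 0)
v = -resid                            # candidate cone weights, one per j
assert np.all(v >= 0)                 # nonnegative weights
assert np.all(v[PX > 0] == 0)         # weights supported on J only
# and P_C X lies in C_J: active rows hold with equality, the rest strictly
assert np.all(PX[J] == 0) and np.all(PX[np.setdiff1d(np.arange(4), J)] > 0)
```

For this polyhedron, K_J is the set of points whose negative coordinates are exactly those in J, which makes the partition in part (b) easy to visualize.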
The next lemma considers the event r = 0 and partitions that event according to which
face of C is closest to the realization of X. Let J0 = {j ∈ {1, ..., dA} | aj = 0} and let
J00 = {j ∈ {1, ..., dA} | aj = 0 and bj = 0}. Also let

J1 = {J ⊆ {1, ..., dA} | rk(AJ) = 1, J ∩ J0 = J00, and if j ∈ J, ℓ ∈ Jc s.t. ‖aj‖ > 0, ‖aℓ‖ > 0, then aj/‖aj‖ ≠ aℓ/‖aℓ‖ or bj/‖aj‖ ≠ bℓ/‖aℓ‖}. (48)

Further subdivide

J1os = {J ∈ J1 | if j, ℓ ∈ J s.t. ‖aj‖ > 0, ‖aℓ‖ > 0, then aj/‖aj‖ = aℓ/‖aℓ‖} (49)

J1ts = {J ∈ J1 | ∃j, ℓ ∈ J s.t. ‖aj‖ > 0, ‖aℓ‖ > 0, aj/‖aj‖ = −aℓ/‖aℓ‖ and bj/‖aj‖ = −bℓ/‖aℓ‖}.
The next lemma provides a partition of C0 := ∪J⊆{1,...,dA}: rk(AJ)=0 CJ. (Note that for these
sets, CJ = KJ.) Let J≠0 = {j = 1, ..., dA : ‖aj‖ ≠ 0}. For each J ∈ J1os, let

C∆J = {x ∈ C0 | argminj∈J≠0 ‖aj‖−1(bj − a′jx) = J ∩ J≠0}. (50)

24When J = ∅, then VJ = {0dm}.

The set C∆J is the set of points in C that are closer to CJ than to any other CJ′ with J′ ∈ J1.
It is helpful to picture CJ for J ∈ J1 as the faces of a polyhedron, C, and the C∆J as a partition
of C into triangularly shaped sets. Also let

C| = C0/(∪J∈J1os C∆J). (51)
Lemma 2. (a) C0 = CJ00.

(b) The sets C| and C∆J for J ∈ J1os form a partition of C0.

(c) If A ≠ 0dA×dm, then C| has Lebesgue measure zero.

(d) ∪J⊆{1,...,dA}: rk(AJ)=1 KJ = ∪J∈J1os∪J1ts KJ.
The next lemma bounds the probabilities of translations of sets in the multivariate normal
distribution. Let V denote an arbitrary cone in Rr for a positive integer r.25 Let V∗ denote
the polar cone. That is, V∗ = {γ ∈ Rr | 〈y, γ〉 ≤ 0 for all y ∈ V}. For any γ ∈ V∗, let
Y ∼ N(γ, Ir). The following lemma provides a property of probabilities of cones under a
translation.

Lemma 3. For every γ ∈ V∗, Prγ(‖Y‖2 > χ2r,1−α | Y ∈ V) ≤ α, with equality if γ = 0.
Lemma 3 states that the probability that a random vector, Y, belongs to the tail of its
distribution, conditional on belonging to the cone, V, is less than or equal to α, where the
tail is the set of points outside a sphere of radius √(χ2r,1−α). The key assumption is that the mean of
Y must belong to the polar cone, V∗, which translates the distribution away from the cone,
V. When γ = 0, this lemma holds with equality because unconditionally ‖Y‖2 ∼ χ2r, the tail
of which has mass exactly α, and because ‖Y‖2 has exactly the same distribution whether
or not we condition on Y ∈ V. Lemma 3 follows from Lemma 1 in Mohamad et al. (2020),
and thus the proof is omitted.
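A quick Monte Carlo check of Lemma 3, taking V to be the positive orthant in R2 (so that the polar cone V∗ is the negative orthant); this toy choice of V and γ is ours:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
alpha = 0.05
crit = chi2.ppf(1 - alpha, 2)         # chi-squared(2) critical value

def cond_tail_prob(gamma, n_draws=400_000):
    """Pr(||Y||^2 > crit | Y in V) for Y ~ N(gamma, I_2), V = positive orthant."""
    Y = rng.standard_normal((n_draws, 2)) + gamma
    in_V = np.all(Y >= 0, axis=1)
    return np.mean(np.sum(Y[in_V] ** 2, axis=1) > crit)

p_zero = cond_tail_prob(np.zeros(2))            # gamma = 0: equality case
p_neg = cond_tail_prob(np.array([-0.5, -0.5]))  # gamma in V*: below alpha
print(round(p_zero, 3), p_neg < alpha)
```

The first probability is approximately α, as the equality case predicts, while shifting the mean into the polar cone pushes the conditional tail probability below α.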
The following lemma is the key to validity of the refinement to the CC test. It is a bound
on translations of sets in the univariate normal distribution.
Lemma 4. For every µ ≤ 0, for every τ ≥ 0, and for every α ∈ [0, 1],

Prµ(Z > z1−β/2 | Z > −τ) ≤ α,

where Z ∼ N(µ, 1) and β = 2αΦ(τ), with equality if µ = 0.
25A cone is a set, V , such that for all v ∈ V and for all λ ≥ 0, λv ∈ V .
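The equality case of Lemma 4 can be verified directly: when µ = 0 and β = 2αΦ(τ), Pr(Z > z1−β/2) = β/2 = αΦ(τ) and Pr(Z > −τ) = Φ(τ), so the conditional probability is exactly α. A numerical check (a Python sketch of ours):

```python
from scipy.stats import norm

alpha = 0.05
for tau in [0.0, 0.5, 1.0, 3.0]:
    beta = 2 * alpha * norm.cdf(tau)     # beta = 2 * alpha * Phi(tau)
    z = norm.ppf(1 - beta / 2)           # z_{1 - beta/2}; note z >= -tau here
    cond = norm.sf(z) / norm.cdf(tau)    # Pr(Z > z | Z > -tau) at mu = 0
    assert abs(cond - alpha) < 1e-9      # equals alpha for every tau
print("equality case of Lemma 4 verified")
```

Here {Z > z1−β/2} ⊆ {Z > −τ} because β ≤ 2α, so the conditional probability is the ratio of the two tail probabilities.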
A.2 Proof of Theorem 1
First, we show part (a). Notice that

Pr(‖X − PCX‖2 > χ2rk(AJ(X)),1−β(X))
= ∑J⊆{1,...,dA} Pr(X ∈ KJ and ‖X − PCX‖2 > χ2rk(AJ),1−β(X))
= ∑J⊆{1,...,dA}|rk(AJ)≥2 Pr(X ∈ KJ and ‖X − PCX‖2 > χ2rk(AJ),1−α) (52)
+ ∑J∈J1ts Pr(X ∈ KJ and ‖X − PCX‖2 > χ21,1−α) (53)
+ ∑J∈J1os Pr(X ∈ KJ and ‖X − PCX‖2 > χ21,1−β(X)) (54)
+ ∑J⊆{1,...,dA}|rk(AJ)=0 Pr(X ∈ KJ and ‖X − PCX‖2 > χ20,1−α), (55)

where the first equality follows from Lemma 1(b,c), and the second equality uses Lemma
2(d) and the fact that β(X) = α whenever rk(AJ(X)) ≠ 1 or J ∈ J1ts. The latter fact follows
because for J ∈ J1ts with X ∈ KJ, there exist j, ℓ ∈ J such that ‖aℓ‖−1aℓ = −‖aj‖−1aj
and ‖aℓ‖−1bℓ = −‖aj‖−1bj, which implies that bℓ − a′ℓPCX = bj − a′jPCX = 0 (and therefore
τ(X) = 0).
For each J , we consider the span of VJ as a subspace of Rdm . Let PJ denote the projection
onto span(VJ), and MJ denote the projection onto its orthogonal complement. We note that,
given J , there exists a κJ ∈ span(VJ) such that for every z ∈ CJ , PJz = κJ . This follows
because for two z1, z2 ∈ CJ , and for any v ∈ span(VJ), 〈z1 − z2, v〉 = 0, which implies
z1 − z2 ⊥ span(VJ), so that PJ(z1 − z2) = 0dm . Thus, for any X ∈ KJ , we can write
PJX = PJ(X − PCX) + PJPCX = X − PCX + κJ , where the second equality follows by
Lemma 1(a) and the above discussion. We also write MJX = X − PJX = PCX − κJ .
First, let’s consider the terms in (55). For J such that rk(AJ) = 0, we have span(VJ) =
{0dm}. Thus, PJX = κJ = 0dm. This implies that ‖X − PCX‖ = 0. Therefore,

Pr(X ∈ KJ and ‖X − PCX‖2 > χ20,1−α) = 0. (56)
For J such that rk(AJ) > 0, we define a linear isometry from span(VJ) to Rrk(AJ ). Let
BJ be a dm × rk(AJ) matrix whose columns form a basis for span(VJ). Then PJX =
BJ(B′JBJ)−1B′JX. The projection matrix BJ(B′JBJ)−1B′J is idempotent with rank rk(AJ),
and thus there exists a dm×rk(AJ) matrix with orthonormal columns, QJ , such that QJQ′J =
BJ(B′JBJ)−1B′J . The linear isometry from span(VJ) to Rrk(AJ ) is QJ(X) = Q′JX. This is an
isometry because for any v1, v2 ∈ span(VJ),

‖v1 − v2‖2 = (v1 − v2)′(v1 − v2)
= (v1 − v2)′(PJ(v1 − v2))
= (v1 − v2)′QJQ′J(v1 − v2)
= ‖QJ(v1) − QJ(v2)‖2, (57)

where the second equality holds because v1, v2 ∈ span(VJ). Now let Q′JVJ = {Q′Jv : v ∈ VJ}.
Then PJX − κJ ∈ VJ if and only if Q′J(PJX − κJ) ∈ Q′JVJ because this isometry is bijective.
Next, we consider the terms in (52) and (53). Notice that

Pr(X ∈ KJ and ‖X − PCX‖2 > χ2rk(AJ),1−α)
= Pr(MJX + κJ ∈ CJ, PJX − κJ ∈ VJ, and ‖PJX − κJ‖2 > χ2rk(AJ),1−α)
(2) Let x ∈ Υ. Consider the set argminj∈J≠0 ‖aj‖−1(bj − a′jx). We first show that the
argmin is equal to J ∩ J≠0. If ℓ ∈ J≠0/J, an algebraic manipulation similar to the above shows

27bℓ cannot be negative because, by assumption, θ ∈ Θ0(F), so µ ∈ C, and therefore C is non-empty.
28We note here that τ(x) is defined for an arbitrary active inequality j ∈ J ∩ J≠0. One can verify that the definition of τ(x) does not depend on which j ∈ J ∩ J≠0 is selected.
which uses the same decomposition of X ∈ KL as in (58), and the fact that MLX only
depends on x1 (and that β(X) and β̄(X) only depend on X through PCX = MLX + κL).
Fix x1. We show that the inner integral goes to zero as s → ∞.
Fix an arbitrary subsequence in s. We show that there exists a further subsequence such
that the inner integral goes to zero. Since β(MLX + κL) and β̄(MLX + κL) do not depend
on x2 and both lie in [α, 2α] for all s, there exists a further subsequence along which both
converge. Denote the limits by β∞ and β̄∞. Also note that PLX − κL = QPLx2 + PLµ − κL.
Take a further subsequence such that PLµ − κL diverges or converges and such that QPL
converges to QPL,∞ (since QPL has unit length, it must converge along a subsequence). We
consider two cases.
(i) If PLµ − κL diverges, then for every x2, ‖QPLx2 + PLµ − κL‖2 ≥ (‖PLµ − κL‖ − ‖QPLx2‖)2 → ∞,
so gs(x1, x2) = 0 eventually as s → ∞ along this subsequence. Therefore, by the dominated
convergence theorem, the inner integral in (87) goes to zero.
(ii) If PLµ − κL converges to some κ∞, then fix x2 such that ‖QPL,∞x2 + κ∞‖2 ≠ χ21,1−β∞ and
‖QPL,∞x2 + κ∞‖2 ≠ χ21,1−β̄∞. Note that the set of such x2 is a set of probability one with respect
to x2 ∼ N(0, 1). We show that gs(x1, x2) = 0 eventually. Consider ‖aℓ‖−1a′ℓ(µ − PCX) for
ℓ ∉ J. If there exists an ℓ ∉ J and a subsequence of s such that ‖aℓ‖−1a′ℓ(µ − PCX) → ±∞,
then
‖X − PCX‖ = ‖[QPL, QML]′X + µ − PCX‖ (89)
≥ ±a′ℓ([QPL, QML]′X + µ − PCX)/‖aℓ‖ (90)
= ±a′ℓ[QPL, QML]′X/‖aℓ‖ ± a′ℓ(µ − PCX)/‖aℓ‖ → +∞, (91)
where the inequality follows by Cauchy-Schwarz and the convergence follows from the fact
that ‖aℓ‖−1a′ℓ[QPL, QML]′X is bounded. This shows that ‖PLX − κL‖2 = ‖X − PCX‖2 >
χ21,1−α ≥ χ21,1−β(MLX+κL) eventually, and therefore gs(x1, x2) = 0 eventually along this
subsequence. Otherwise, suppose ‖aℓ‖−1a′ℓ(µ − PCX) is bounded along a subsequence for all
ℓ ∉ J. We show that β∞ = β̄∞. Let j ∈ L be such that aj ≠ 0. Then note that for each ℓ ∉ J,
τℓ(X) = ‖aj‖(bℓ − a′ℓPCX)/(‖aj‖‖aℓ‖ − a′jaℓ) = [(bℓ − a′ℓPCX)/‖aℓ‖]/[1 − a′jaℓ/(‖aj‖‖aℓ‖)] ≥ (1/2)(bℓ − a′ℓPCX)/‖aℓ‖ → ∞, (92)
where the convergence follows because ‖aℓ‖−1(bℓ − a′ℓµ) → ∞ and ‖aℓ‖−1(a′ℓµ − a′ℓPCX) is
bounded (and if the denominator is zero, then τℓ(X) = ∞). Therefore,
τ̄(X) = infℓ≠j τℓ(X) = min(infℓ∈J, ℓ≠j τℓ(X), infℓ∉J τℓ(X)) = min(τ(X), infℓ∉J τℓ(X)). (93)
If τ(X) → ∞, then τ̄(X) → ∞ too, and if τ(X) converges to a finite value, τ̄(X) converges
to the same value. This shows that β∞ = β̄∞. Therefore, gs(x1, x2) = 0 eventually along this
subsequence. Since every subsequence has a further subsequence such that gs(x1, x2) = 0
eventually, it follows that gs(x1, x2) = 0 eventually along the original sequence. Therefore,
by the dominated convergence theorem, the inner integral in (87) goes to zero.
Since the inner integral in (87) converges to zero in either case (i) or (ii) for every fixed x1,
by the dominated convergence theorem, the outer integral converges to zero too along this
subsequence. Since every subsequence has a further subsequence such that (87) converges
to zero, this shows (81).
A.3 Proofs of the Auxiliary Lemmas
Proof of Lemma 1. (a) By assumption, X ∈ KJ = CJ + VJ . So, we write X = X1 + X2,
where X1 ∈ CJ and X2 ∈ VJ . Then, PCX1 = X1 because X1 ∈ C already. We show that
PCX = X1. By a property of projection onto convex sets, it is necessary and sufficient that
for all y ∈ C, we have 〈X − X1, y − X1〉 ≤ 0.32 This follows because X2 = ∑j∈J vjaj with
vj ≥ 0, so

〈X2, y − X1〉 = ∑j∈J vj(〈aj, y〉 − 〈aj, X1〉) ≤ 0, (94)
where the inequality uses the fact that y ∈ C, so a′jy ≤ bj and X1 ∈ CJ , so a′jX1 = bj.
Combining these, we get that PCX = X1 ∈ CJ and X − PCX = X −X1 = X2 ∈ VJ .
(b) We first show that every X belongs to some KJ . For every X, PCX ∈ C, so there
exists a J such that PCX ∈ CJ .
By the inner-product property of projection, we know that for all y ∈ C, 〈y − PCX, X − PCX〉 ≤ 0.32
Using this fact, let z ⊥ span(VJ). Then, there exists an ε > 0 such that PCX + εz
and PCX − εz both belong to C.33 Then, 〈εz, X − PCX〉 ≤ 0 and 〈−εz, X − PCX〉 ≤ 0.

32See Section 3.12 in Luenberger (1969). Hereafter, we call this property of projection onto a convex set the “inner-product property.”
These two inequalities imply that 〈z,X − PCX〉 = 0. Thus, X − PCX is orthogonal to all
vectors, z, which are orthogonal to span(VJ). This implies that X − PCX ∈ span(VJ).
If X − PCX ∉ VJ, then by the separating hyperplane theorem,34 there exists a direction
c ∈ span(VJ) such that 〈c, X − PCX〉 > 0 and 〈c, aj〉 < 0 for all j ∈ J. We consider PCX + εc.
We show that for ε sufficiently small, (1) PCX + εc ∈ C, and (2) 〈X − PCX, εc〉 > 0.
(1) For j ∈ J, 〈PCX + εc, aj〉 = bj + ε〈c, aj〉 < bj, where the equality follows because
PCX ∈ CJ and the inequality follows from the definition of c. For j ∈ Jc, 〈PCX +
εc, aj〉 = 〈PCX, aj〉 + ε〈c, aj〉, which is less than bj for ε sufficiently small because
〈PCX, aj〉 < bj.
(2) 〈X − PCX, εc〉 = ε〈X − PCX, c〉 > 0 by the definition of c.
This contradicts the inner-product property of projection onto a convex set, and therefore
X − PCX ∈ VJ , and X ∈ KJ .
We next show that no X belongs to two distinct KJ . If X ∈ KJ and KJ ′ , then, by part
(a), PCX ∈ CJ and PCX ∈ CJ ′ . But this is a contradiction because the projection onto a
convex set is unique, and the CJ form a partition of C.
(c) If X ∈ KJ , then PCX ∈ CJ , so all the inequalities in J are active. If X /∈ KJ , then
X is in a different KJ ′ , for some J ′ 6= J , by part (b). Thus, J 6= J(X) = J ′.
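The decomposition in Lemma 1(a) — P_C X ∈ C_J, X − P_C X ∈ V_J, and the inner-product property — can be illustrated numerically. The following sketch (with hypothetical A, b, and test point; this is not the authors' Matlab code) projects a point onto a polyhedron via a small quadratic program and checks the inner-product inequality:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative polyhedron C = {x : Ax <= b} in R^2 (hypothetical values)
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([0.0, 0.0])
X = np.array([2.0, 3.0])

# P_C X = argmin_{y : Ay <= b} ||X - y||^2, solved as a small QP
res = minimize(lambda y: np.sum((X - y) ** 2), x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda y: b - A @ y}])
PCX = res.x

# Inner-product property: <X - P_C X, y - P_C X> <= 0 for every y in C
rng = np.random.default_rng(0)
for _ in range(100):
    y = -rng.random(2)  # random points of C (here C is the negative orthant)
    assert (X - PCX) @ (y - PCX) <= 1e-8

# Both inequalities are active at P_C X, and the residual X - P_C X = 2*a_1 + 3*a_2
# lies in the cone V_J spanned by the active rows of A.
```

Here J = {1, 2}: the projection lands on the vertex C_J = {0}, and the residual has the nonnegative-combination representation used in part (a).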
Proof of Lemma 2. (a) Note that J_{00} satisfies rk(A_{J_{00}}) = 0. Thus, it is sufficient to show that C_J = ∅ for all J ⊆ J_0 that are not J_{00}. If J ≠ J_{00}, then either (i) there exists j ∈ J_{00}\J or (ii) there exists j ∈ J\J_{00}. In the first case, any x ∈ C_J would have to satisfy 0′x < 0, a contradiction. In the second case, any x ∈ C_J would have to satisfy 0′x = b_j, where b_j ≠ 0, another contradiction.
(b) We first show that the C^∆_J are disjoint for different J ∈ J^{os}_1.

This implies that J_1 = J_2. Part (b) then follows from the definition of C^∆.

[33] This uses the slackness of the inequalities in the definition of C_J. [34] See Section 11 of Rockafellar (1970) or Section 5.12 in Luenberger (1969).
(c) For any x ∈ C^∆, let J_{≠0}(x) = argmin_{j∈J_{≠0}} ‖a_j‖^{−1}(b_j − a_j′x). We show below that

∃ j, ℓ ∈ J_{≠0}(x) s.t. ‖a_j‖^{−1}a_j ≠ ‖a_ℓ‖^{−1}a_ℓ. (97)

That implies that for any x ∈ C^∆, there exist j, ℓ ∈ J_{≠0}(x) such that ‖a_j‖^{−1}a_j ≠ ‖a_ℓ‖^{−1}a_ℓ and ‖a_j‖^{−1}(b_j − a_j′x) = ‖a_ℓ‖^{−1}(b_ℓ − a_ℓ′x). Or, equivalently,

C^∆ ⊆ ∪_{j,ℓ∈J_{≠0}: ‖a_j‖a_ℓ ≠ ‖a_ℓ‖a_j} {x ∈ R^{d_m} | ‖a_j‖b_ℓ − ‖a_ℓ‖b_j = (‖a_j‖a_ℓ − ‖a_ℓ‖a_j)′x}. (98)

Since the right-hand side is a finite union of measure-zero affine subspaces of R^{d_m}, it must be that C^∆ has Lebesgue measure zero, establishing part (c).

Now we show (97). Let J(x) = J_{00} ∪ J_{≠0}(x). We note that J_{≠0}(x) is not empty because A ≠ 0_{d_A×d_m}. This implies that rk(A_{J(x)}) ≥ 1. Then there are two possibilities: rk(A_{J(x)}) ≥ 2 and rk(A_{J(x)}) = 1. In the first case, (97) holds trivially.

In the latter case, we first show that J(x) ∈ J_1. Suppose there exist j ∈ J(x) and ℓ ∈ {1, ..., d_A}\J(x) such that ‖a_j‖ > 0, ‖a_ℓ‖ > 0, a_j/‖a_j‖ = a_ℓ/‖a_ℓ‖, and b_j/‖a_j‖ = b_ℓ/‖a_ℓ‖. This implies

‖a_ℓ‖^{−1}(b_ℓ − a_ℓ′x) = ‖a_j‖^{−1}(b_j − a_j′x), (99)

so ℓ should also belong to J(x). Since such a j and ℓ cannot exist, it must be the case that J(x) ∈ J_1. The fact that x ∈ C^∆ means that J(x) ∉ J^{os}_1. Thus, it must be that J(x) ∈ J^{ts}_1, which also implies (97). Therefore (97) holds in all cases. This proves part (c).
(d) First note that for every J ∈ J^{os}_1 ∪ J^{ts}_1 we have rk(A_J) = 1. Thus, it is sufficient to show that for every J ⊆ {1, ..., d_A} with rk(A_J) = 1 and J ∉ J^{os}_1 ∪ J^{ts}_1, we have K_J = ∅.

Note that if J ∩ J_0 ≠ J_{00}, then either (i) there exists j ∈ J_{00}\(J_0 ∩ J) or (ii) there exists j ∈ (J ∩ J_0)\J_{00}. In the first case, any x ∈ C_J would have to satisfy 0′x < 0, a contradiction. In the second case, any x ∈ C_J would have to satisfy 0′x = b_j, where b_j ≠ 0, another contradiction. This implies that C_J, and therefore K_J, is empty.

We next note that if j ∈ J while ℓ ∈ {1, ..., d_A}\J with ‖a_j‖ > 0, ‖a_ℓ‖ > 0, a_j/‖a_j‖ = a_ℓ/‖a_ℓ‖, and b_j/‖a_j‖ = b_ℓ/‖a_ℓ‖, then any x ∈ C_J should satisfy

‖a_ℓ‖^{−1}(b_ℓ − a_ℓ′x) = ‖a_j‖^{−1}(b_j − a_j′x) = 0, (100)

so ℓ should also belong to J. This contradiction implies that C_J, and therefore K_J, must be empty.

This implies that the only nonempty K_J with rk(A_J) = 1 must belong to J_1. If we suppose that J ∉ J^{os}_1, then there must exist j, ℓ ∈ J s.t. ‖a_j‖ > 0, ‖a_ℓ‖ > 0, and a_j/‖a_j‖ ≠ a_ℓ/‖a_ℓ‖.

However, since rk(A_J) = 1, a_ℓ and a_j must be collinear. This implies that a_j/‖a_j‖ = −a_ℓ/‖a_ℓ‖. Then, any x ∈ C_J must satisfy

0 = ‖a_ℓ‖^{−1}(b_ℓ − a_ℓ′x) = ‖a_j‖^{−1}(b_j − a_j′x). (101)

This implies ‖a_ℓ‖^{−1}b_ℓ = −‖a_j‖^{−1}b_j, which implies that J ∈ J^{ts}_1.

Therefore, the only J ⊆ {1, ..., d_A} with rk(A_J) = 1 and K_J ≠ ∅ belong to J^{os}_1 ∪ J^{ts}_1.
Proof of Lemma 4. For every λ ≥ 0, let

f(λ) = ∫_{−τ}^{∞} (α − 1{Z > z_{1−β/2}}) e^{−(Z+λ)²/2} dZ. (102)

Note that α ≤ 1 implies that β ≤ 2Φ(τ), which in turn implies that z_{1−β/2} ≥ −τ. We show that f(λ) ≥ 0 for all λ ≥ 0. This is sufficient because

α Pr_μ(Z ≥ −τ) − Pr_μ(Z ≥ z_{1−β/2}) = ∫_{−τ}^{∞} (α − 1{Z > z_{1−β/2}}) (1/√(2π)) e^{−(Z−μ)²/2} dZ = f(−μ)/√(2π) ≥ 0 (103)

for all μ ≤ 0.
Let f′(λ) denote the derivative of f. We show that (1) f(0) ≥ 0 and (2) for all λ ≥ 0, f′(λ) ≥ −(z_{1−β/2} + λ) f(λ). Together, these two properties imply that f(λ) ≥ 0 because, if not, then there exists a λ̄ > 0 such that f(λ̄) < 0. Then, by the mean value theorem, there exists a λ ∈ (0, λ̄) such that f(λ) < 0 and f′(λ) < 0, which contradicts property (2).

Property (1) holds because

f(0)/√(2π) = ∫_{−τ}^{∞} (α − 1{Z > z_{1−β/2}}) (1/√(2π)) e^{−Z²/2} dZ = αΦ(τ) − (1 − Φ(z_{1−β/2})) = αΦ(τ) − β/2 = 0. (104)

This also shows that equality holds in (103) when μ = 0.
To show that property (2) holds, we evaluate

f′(λ) = d/dλ ∫_{−τ}^{∞} (α − 1{Z > z_{1−β/2}}) e^{−(Z+λ)²/2} dZ
= −∫_{−τ}^{∞} (Z + λ)(α − 1{Z > z_{1−β/2}}) e^{−(Z+λ)²/2} dZ
= −∫_{−τ}^{z_{1−β/2}} α(Z + λ) e^{−(Z+λ)²/2} dZ + ∫_{z_{1−β/2}}^{∞} (1 − α)(Z + λ) e^{−(Z+λ)²/2} dZ
≥ −∫_{−τ}^{z_{1−β/2}} α(z_{1−β/2} + λ) e^{−(Z+λ)²/2} dZ + ∫_{z_{1−β/2}}^{∞} (1 − α)(z_{1−β/2} + λ) e^{−(Z+λ)²/2} dZ
= −(z_{1−β/2} + λ) ∫_{−τ}^{∞} (α − 1{Z > z_{1−β/2}}) e^{−(Z+λ)²/2} dZ
= −(z_{1−β/2} + λ) f(λ), (105)

where the second equality follows by dominated convergence and the inequality follows from bounding Z + λ by z_{1−β/2} + λ separately on the events {Z ≤ z_{1−β/2}} and {Z > z_{1−β/2}}.
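The two properties of f can also be checked by direct numerical integration. A sketch (with hypothetical values of α and τ; β = 2αΦ(τ) as in the text):

```python
import numpy as np
from scipy.stats import norm

alpha, tau = 0.05, 1.0            # hypothetical values with alpha <= 1
beta = 2 * alpha * norm.cdf(tau)  # beta = 2*alpha*Phi(tau)
z = norm.ppf(1 - beta / 2)        # z_{1-beta/2} >= -tau

# Riemann-sum approximation of f(lam) on [-tau, 12]; the tail beyond 12
# contributes negligibly for the lambda values used below
Zg = np.linspace(-tau, 12.0, 400001)
h = Zg[1] - Zg[0]

def f(lam):
    integrand = (alpha - (Zg > z)) * np.exp(-0.5 * (Zg + lam) ** 2)
    return np.sum(integrand) * h

# Property (1): f(0) = sqrt(2*pi) * (alpha*Phi(tau) - beta/2) = 0
assert abs(f(0.0)) < 1e-3
# The conclusion of the lemma: f(lam) >= 0 on a grid of lam >= 0
assert all(f(lam) >= -1e-3 for lam in np.linspace(0.0, 5.0, 21))
```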
B Theorem 3 and the Proof of Theorem 2
In this section we prove Theorem 3, a general theorem for uniform asymptotic properties of
the CC and RCC tests. Theorem 3 is used to prove Theorem 2.
B.1 Theorem 3: A General Asymptotic Theorem
In this section, we sometimes make explicit the dependence of A and b on θ, denoting them by A(θ) and b(θ). The rows of A(θ) are denoted by a_j(θ), and submatrices composed of the rows of A(θ) are denoted by A_J(θ).
Assumption 2. The given sequence {(F_n, θ_n) : F_n ∈ F, θ_n ∈ Θ_0(F_n)}_{n=1}^∞ satisfies: for every subsequence, n_m, there exist a further subsequence, n_q, and a sequence of positive definite d_m × d_m matrices, D_q, such that:

(a) Under the sequence {F_{n_q}}_{q=1}^∞,

√(n_q) D_q^{−1/2}(m_{n_q}(θ_{n_q}) − E_{F_{n_q}} m_{n_q}(θ_{n_q})) →_d N(0, Ω), (106)

for some positive definite correlation matrix, Ω, and

‖D_q^{−1/2} Σ_{n_q}(θ_{n_q}) D_q^{−1/2} − Ω‖ →_p 0. (107)

(b) Λ_q A(θ_{n_q}) D_q → A_0 for some d_A × d_m matrix A_0, and for every J ⊆ {1, ..., d_A}, rk(I_J A(θ_{n_q}) D_q) = rk(I_J A_0), where Λ_q is the diagonal d_A × d_A matrix whose jth diagonal entry is one if e_j′A(θ_{n_q}) = 0 and ‖e_j′A(θ_{n_q}) D_q‖^{−1} otherwise.
Remark. The matrix D_q typically is the diagonal matrix of variances of the elements of √(n_q) m_{n_q}(θ_{n_q}). In part (a), we allow each diagonal element to go to zero (or infinity) at different rates, to incorporate the cases where different moments are on different scales or where different moments involve time series processes that are integrated at different orders. Andrews and Guggenberger (2009), Andrews and Soares (2010), and Andrews et al. (2020) also use a diagonal normalizing matrix for this purpose. Moreover, the matrix D_q can be non-diagonal, which is useful when the asymptotic variance matrix of √(n_q)(m_{n_q}(θ_{n_q}) − E_{F_{n_q}} m_{n_q}(θ_{n_q})) is singular but a certain rotation of the vector with proper scaling has a nonsingular asymptotic variance matrix.

Part (b) is not required to show the uniform asymptotic validity of the RCC test. It is only used to show asymptotic size exactness and the asymptotic IDI property. The existence of A_0 follows by the choice of the subsequence, while the rank condition is used to verify Lemma 6, below.
The following theorem is a general asymptotic theorem used to show the uniform asymp-
totic properties of the RCC test.
Theorem 3. (a) Suppose Assumption 2(a) holds for all sequences {(F_n, θ_n) : F_n ∈ F, θ_n ∈ Θ_0(F_n)}_{n=1}^∞. Then,

limsup_{n→∞} sup_{F∈F} sup_{θ∈Θ_0(F)} E_F(φ^{RCC}_n(θ, α)) ≤ α.

Next consider a sequence {(F_n, θ_n) : F_n ∈ F, θ_n ∈ Θ_0(F_n)}_{n=1}^∞ satisfying Assumption 2(a,b).

(b) If, along any further subsequence, for all j = 1, ..., d_A, √(n_q) e_j′Λ_q(A(θ_{n_q}) E_{F_{n_q}} m_{n_q}(θ_{n_q}) − b(θ_{n_q})) → 0, and if A_0 ≠ 0_{d_A×d_m}, then

lim_{n→∞} E_{F_n} φ^{RCC}_n(θ_n, α) = α.

(c) If, for J ⊆ {1, ..., d_A}, along any further subsequence, √(n_q) e_j′Λ_q(A(θ_{n_q}) E_{F_{n_q}} m_{n_q}(θ_{n_q}) − b(θ_{n_q})) → −∞ as q → ∞ for all j ∉ J, then

lim_{n→∞} Pr_{F_n}(φ^{RCC}_n(θ_n, α) ≠ φ^{RCC}_{n,J}(θ_n, α)) = 0.
Remarks. (1) Notice that no assumptions are placed on A(θ) for Theorem 3(a). It can be low-rank, or any submatrix of A(θ) can be local to singular as θ varies. This is achieved by an extra step in the proof that adds inequalities that are redundant in the finite sample but are relevant in the limit (see Lemma 7 below).

(2) If θ_n and F_n are such that E_{F_n} m_n(θ_n) does not depend on n (for example, if {W_i}_{i=1}^n is stationary under F_n with a fixed marginal distribution and θ ∈ Θ_0(F_n) is fixed), and if A(θ_n) and b(θ_n) are fixed, then the condition in part (c) is automatically satisfied with J equal to the set of all binding inequalities. If, in addition, A_J(θ_n) ≠ 0, parts (b) and (c) can be combined to show that the RCC test has exact asymptotic size.
B.2 Auxiliary Lemmas for Theorem 3
The proof of Theorem 3 uses four important lemmas. Lemma 5 establishes a condition under which the projection onto a sequence of polyhedra converges when the coefficient matrix defining the polyhedra converges. The condition is verified in a special context in Lemma 6, which is used to prove part (b) of Theorem 3. The conditions for part (a) are not strong enough for us to apply Lemma 5 because we do not restrict the rank of A(θ). Nonetheless, Lemma 7 shows that inequalities that are redundant in the finite sample but relevant in the limit can be added to guarantee the condition of Lemma 5, which helps us prove part (a) of Theorem 3. Lemma 8 shows that the additional inequalities from Lemma 7 do not change the definition of β.
First we define some notation. For any d_A × d_m real-valued matrix A and vector h ∈ R^{d_A}_{+,∞} := [0, ∞]^{d_A}, let poly(A, h) = {μ ∈ R^{d_m} : Aμ ≤ h} denote the polyhedron defined by inequalities with coefficients given by A and constants given by h. Also define

μ*(x; A, h) = argmin_{μ∈poly(A,h)} ‖x − μ‖². (108)

The lemma considers a sequence of d_A × d_m real-valued matrices {A_n}_{n=1}^∞ and a sequence of d_A × 1 vectors h_n ∈ R^{d_A}_+ := [0, ∞)^{d_A} such that, as n → ∞, A_n → A_0 and h_n → h_0 for a d_A × d_m real-valued matrix A_0 and a vector h_0 ∈ R^{d_A}_{+,∞}. Also, let x_n ∈ R^{d_m} be a sequence of vectors such that x_n → x_0 ∈ R^{d_m} as n → ∞. We say that a sequence of sets, poly(A_n, h_n), Kuratowski converges to a limit set, poly(A_0, h_0), denoted by

poly(A_n, h_n) →^K poly(A_0, h_0), (109)

if (i) for every x_0 ∈ poly(A_0, h_0) there exists a sequence x_n ∈ poly(A_n, h_n) such that x_n → x_0, and (ii) for every subsequence n_q and for every sequence x_{n_q} ∈ poly(A_{n_q}, h_{n_q}) that converges to a point x_0, we have x_0 ∈ poly(A_0, h_0). (One can check that this definition of Kuratowski convergence is equivalent to other definitions given in, for example, Aubin and Frankowska (1990).)

Lemma 5. If poly(A_n, h_n) →^K poly(A_0, h_0), then μ*(x_n; A_n, h_n) → μ*(x_0; A_0, h_0).

We denote submatrices of A_n and A_0 formed by the rows with indices in J ⊆ {1, ..., d_A} by A_{J,n} and A_{J,0}. Important for the following lemma is the fact that every element of h_n is nonnegative for all n.
Lemma 6. If, for all J ⊆ {1, ..., d_A}, rk(A_{J,n}) = rk(A_{J,0}) for all n, then poly(A_n, h_n) →^K poly(A_0, h_0).

For any d_A × d_m matrix, A, and for any vector, g, let J(x; A, g) = {j ∈ {1, ..., d_A} : a_j′μ*(x; A, g) = g_j}. This generalizes the previous notation for active inequalities to make explicit the dependence on A and g. Also let [A; B] denote the vertical concatenation of two matrices, A and B.
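The role of the rank condition in Lemma 6 can be illustrated with a one-row example in which A_n → A_0 and h_n → h_0 while rk(A_{J,n}) = rk(A_{J,0}) for every subset J of rows; the projections μ*(x; A_n, h_n) then converge to μ*(x; A_0, h_0), as Lemma 5 requires. A sketch with hypothetical numbers:

```python
import numpy as np
from scipy.optimize import minimize

def mu_star(x, A, h):
    # mu*(x; A, h) = argmin_{mu in poly(A, h)} ||x - mu||^2
    res = minimize(lambda m: np.sum((x - m) ** 2), x0=np.zeros(len(x)),
                   method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda m: h - A @ m}])
    return res.x

x0 = np.array([2.0, 0.0])
A0, h0 = np.array([[1.0, 0.0]]), np.array([1.0])
limit = mu_star(x0, A0, h0)          # projection onto {m : m_1 <= 1} is (1, 0)

# A_n -> A_0, h_n -> h_0, and the single row keeps rank 1 along the sequence
approx = [mu_star(x0, np.array([[1.0, 1.0 / n]]), np.array([1.0 + 1.0 / n]))
          for n in (1, 10, 100, 1000)]

# the projection errors shrink toward zero as n grows
errors = [np.linalg.norm(m - limit) for m in approx]
```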
Lemma 7. Let A_n be a sequence of d_A × d_m matrices such that each row is either zero or belongs to the unit circle. Let g_n be a sequence of nonnegative d_A-vectors. Then, there exist a subsequence, n_q, a sequence of d_B × d_m matrices, B_q, and a sequence of nonnegative d_B-vectors, h_q, such that the following hold.

(a) A_{n_q} → A_0, B_q → B_0, g_{n_q} → g_0, and h_q → h_0 (some of the elements of g_0 and h_0 may be +∞, in which case the convergence/divergence occurs elementwise).
Next, we show that Assumption 1, combined with the additional assumptions in Theorem 2(b), implies Assumption 2(b). First note that each element of Λ_q is either one or ‖e_j′A(θ_{n_q}) D_q‖^{−1}. By the common additional condition for Theorem 2(b,c), ‖e_j′A(θ_{n_q}) D_q‖^{−1} → ‖e_j′A_∞‖^{−1}. Note that e_j′A(θ_{n_q}) D_q cannot go to zero if e_j′A(θ_{n_q}) ≠ 0 because that would violate the common additional condition for Theorem 2(b,c) for J = {j}. Therefore, there exists a further subsequence along which Λ_q → Λ_∞ for a positive definite diagonal matrix Λ_∞. Therefore, Λ_q A(θ_{n_q}) D_q → A_0 = Λ_∞ A_∞. Also note that for each J ⊆ {1, ..., d_A}, rk(A_J(θ_{n_q}) D_q) = rk(I_J A_∞) = rk(I_J A_0), where the first equality follows from the common additional condition for Theorem 2(b,c) and the second equality follows because each row of A_0 is a positive scalar multiple of the corresponding row of A_∞. This verifies Assumption 2(b).

We also note that along every further subsequence, each diagonal element of Λ_q converges to a positive value. This implies that, for part (b), we have for every j = 1, ..., d_A,

√(n_q) e_j′Λ_q(A(θ_{n_q}) E_{F_{n_q}} m_{n_q}(θ_{n_q}) − b(θ_{n_q})) → 0. (118)

Also, for part (c), we have for every j ∉ J,

√(n_q) e_j′Λ_q(A(θ_{n_q}) E_{F_{n_q}} m_{n_q}(θ_{n_q}) − b(θ_{n_q})) → −∞. (119)

Also, for part (b), A_0 ≠ 0 is implied by A_∞ ≠ 0 because Λ_∞ is positive definite.

Therefore, Theorem 2 follows from Theorem 3.
Proof of Theorem 3

We first prove part (a). Let {(θ_n, F_n)}_{n=1}^∞ be an arbitrary sequence satisfying F_n ∈ F and θ_n ∈ Θ_0(F_n) for all n. Let n_m be an arbitrary subsequence of n. It is sufficient to show that there exists a further subsequence, n_q, such that as q → ∞,

liminf_{q→∞} Pr_{F_{n_q}}(T_{n_q}(θ_{n_q}) ≤ χ²_{r,1−β}) ≥ 1 − α. (120)

Fix an arbitrary subsequence, n_m. By Assumption 2(a), there exist a further subsequence, n_q, a sequence of positive definite matrices, D_q, and a positive definite correlation matrix, Ω_0, such that (for notational simplicity, we denote all further subsequences by n_q)

√(n_q) D_q^{−1/2}(m_{n_q}(θ_{n_q}) − E_{F_{n_q}} m_{n_q}(θ_{n_q})) →_d Y ∼ N(0, Ω_0), and (121)

D_q^{−1/2} Σ_{n_q}(θ_{n_q}) D_q^{−1/2} →_p Ω_0. (122)

We introduce some simplified notation. Let Ω_q = D_q^{−1/2} Σ_{n_q}(θ_{n_q}) D_q^{−1/2}, X = Ω_0^{−1/2} Y ∼ N(0, I), Y_q = √(n_q) D_q^{−1/2}(m_{n_q}(θ_{n_q}) − E_{F_{n_q}} m_{n_q}(θ_{n_q})), and X_q = Ω_q^{−1/2} Y_q. Equations (121) and (122) imply that

X_q →_d X ∼ N(0, I), and (123)

Ω_q →_p Ω_0. (124)

The remainder of the proof proceeds in four steps. (A) In the first step, the problem defined in (17) is transformed to include additional inequalities. (B) In the second step, notation is defined for partitioning R^{d_m} according to Lemma 1, for both finite q and the limit. (C) In the third step, the almost sure representation theorem is invoked on the convergence in (123) and (124). (D) In the final step, we show that (almost surely) the event T_{n_q}(θ_{n_q}) ≤ χ²_{r,1−β} eventually implies a limiting event based on X and Ω_0. This limiting event has probability at least 1 − α by Theorem 1.

(A) Consider the sequence of matrices A(θ_{n_q}) D_q^{1/2}. For each q, let Λ_q denote a d_A × d_A diagonal matrix with positive entries on the diagonal such that each row of Λ_q A(θ_{n_q}) D_q^{1/2} is either zero or belongs to the unit circle. Such a Λ_q always exists by taking the diagonal element to be the inverse of the magnitude of the corresponding row of A(θ_{n_q}) D_q^{1/2}, if it is nonzero, and one otherwise. Let g_q = √(n_q) Λ_q(b(θ_{n_q}) − A(θ_{n_q}) E_{F_{n_q}} m_{n_q}(θ_{n_q})). With this
notation, we can write

T_{n_q}(θ_{n_q}) = inf_{y : Λ_q A(θ_{n_q}) D_q^{1/2} y ≤ g_q} (Y_q − y)′Ω_q^{−1}(Y_q − y), (125)

which adds and subtracts E_{F_{n_q}} m_{n_q}(θ_{n_q}) in the objective and applies the change of variables y = √(n_q) D_q^{−1/2}(μ − E_{F_{n_q}} m_{n_q}(θ_{n_q})).

We can apply Lemma 7 to Λ_q A(θ_{n_q}) D_q^{1/2} and g_q to get a further subsequence, n_q, a sequence of matrices, B_q, a sequence of vectors, h_q, matrices A_0 and B_0, and vectors g_0 and h_0, satisfying Lemma 7(a-d). Let

Ā_q = [Λ_q A(θ_{n_q}) D_q^{1/2} ; B_q] and h̄_q = [g_q ; h_q], (126)

and similarly for Ā_0 and h̄_0. Let d̄_A = d_A + d_B. We have that

T_{n_q}(θ_{n_q}) = inf_{y : Ā_q y ≤ h̄_q} (Y_q − y)′Ω_q^{−1}(Y_q − y) (127)
= inf_{t : Ā_q Ω_q^{1/2} t ≤ h̄_q} (X_q − t)′(X_q − t), (128)

where the first equality follows from Lemma 7(b) and the second follows from the change of variables t = Ω_q^{−1/2} y.
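The change of variables between (127) and (128) can be verified numerically: the two quadratic programs attain the same value. A sketch with hypothetical Ā_q, h̄_q, Ω_q, and Y_q (illustrative numbers only):

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import minimize

Abar = np.array([[1.0, 0.0],
                 [0.5, 0.5]])            # stacked constraint matrix (illustrative)
hbar = np.array([0.2, 0.1])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])           # positive definite correlation matrix
Y = np.array([1.5, -0.5])
Om_half = np.real(sqrtm(Omega))          # Omega^{1/2}
X = np.linalg.solve(Om_half, Y)          # X = Omega^{-1/2} Y

def qp(obj, A, h):
    res = minimize(obj, x0=np.zeros(2), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda v: h - A @ v}])
    return res.fun

# (127): inf over {y : Abar y <= hbar} of (Y - y)' Omega^{-1} (Y - y)
T1 = qp(lambda y: (Y - y) @ np.linalg.solve(Omega, Y - y), Abar, hbar)
# (128): inf over {t : Abar Omega^{1/2} t <= hbar} of (X - t)'(X - t)
T2 = qp(lambda t: np.sum((X - t) ** 2), Abar @ Om_half, hbar)
# t = Omega^{-1/2} y maps one feasible set onto the other, so T1 = T2
```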
Equation (128) has changed the problem by adding additional inequalities. We verify that the rank of the active inequalities is unchanged. For any positive definite matrix, Ω, let J_q(x, Ω) be the set of indices for the active inequalities in the problem:

inf_{y : Λ_q A(θ_{n_q}) D_q^{1/2} y ≤ g_q} (x − y)′Ω^{−1}(x − y). (129)

Recall that J is the set of active inequalities for the problem defined in (17), which is equal to J_q(Y_q, Ω_q) by a change of variables. Similarly, let J̄_q(x, Ω) be the set of active inequalities in the problem:

inf_{t : Ā_q Ω^{1/2} t ≤ h̄_q} (x − t)′(x − t). (130)

Also let t*_q(x, Ω) denote the unique minimizer. We have that for any y ∈ R^{d_m} and for any
where the second equality follows by (a) and the definition of κ^q_J(Ω). Then, we can also write M^q_J(Ω)x = x − P^q_J(Ω)x = t*_q(x, Ω) − κ^q_J(Ω).
Let r̄_q(x, Ω) = rk(Ā_{J̄_q(x,Ω),q}). When r̄_q(x, Ω) = 1, we can define

τ^q_j(x, Ω) = ‖a^q_ĵ(Ω)‖(h̄_{j,q} − a^q_j(Ω)′t*_q(x, Ω)) / (‖a^q_j(Ω)‖‖a^q_ĵ(Ω)‖ − a^q_j(Ω)′a^q_ĵ(Ω)) if ‖a^q_j(Ω)‖‖a^q_ĵ(Ω)‖ ≠ a^q_j(Ω)′a^q_ĵ(Ω), and τ^q_j(x, Ω) = ∞ otherwise, (134)

where ĵ ∈ J̄_q(x, Ω) is such that a^q_ĵ(Ω) ≠ 0. We also let τ^q(x, Ω) = inf_{j=1,...,d̄_A} τ^q_j(x, Ω), and β_q(x, Ω) = 2αΦ(τ^q(x, Ω)). When r̄_q(x, Ω) ≠ 1, let τ^q_j(x, Ω) = 0, so that β_q(x, Ω) = α. Note that β = β_q(X_q, Ω_q), where the addition of extra inequalities via Lemma 7 has no effect on β or τ because of Lemma 8, whose condition is satisfied by Lemma 7(b).

We define similar notation for the limiting objects. Let J_∞ = {ℓ ∈ {1, ..., d̄_A} : h̄_{ℓ,0} < ∞}. These are the indices for the inequalities that are "close-to-binding." For any positive definite matrix, Ω, let Ā_∞(Ω) denote the matrix formed by the rows of Ā_0 Ω^{1/2} associated with the indices in J_∞. For notational simplicity, we refer to the rows of Ā_∞(Ω) using ℓ ∈ J_∞ even
(C) Next, we invoke the almost sure representation theorem on the convergence in (123) and (124) (see van der Vaart and Wellner (1996), Theorem 1.10.3). Then, we can treat the convergence in (123) and (124) as holding almost surely. (This can be formalized by defining random variables, X̃_q, X̃, and Ω̃_q, satisfying X̃_q =_d X_q, X̃ =_d X, Ω̃_q =_d Ω_q, X̃_q →_{a.s.} X̃, and Ω̃_q →_{a.s.} Ω_0.) For the rest of the proof of part (a), consider Ā_∞(Ω), P^∞_J(Ω), κ^∞_J(Ω), and the objects defined in (135), and let the objects without the argument (Ω) denote the objects evaluated at Ω_0. For example, Ā_∞ = Ā_∞(Ω_0).

We now construct an event, Ξ ⊆ R^{d_m}, such that Pr(X ∈ Ξ) = 1. For every L ⊆ J_∞, let

V^∞_{L+} = {x ∈ V^∞_L | ∀L′ ⊆ L, if r^q_{L′} < r^∞_L then M^N_{L′}x ≠ 0}. (139)

(Recall that r^q_{L′} does not depend on q due to the construction of the subsequence n_q.) For each L ⊆ J_∞ such that r^∞_L > 0, let

Ξ_L = {x ∈ K^∞_L : P^∞_L x − κ^∞_L ∈ V^∞_{L+} and (P^∞_L x − κ^∞_L)′(P^∞_L x − κ^∞_L) ≠ χ²_{r^∞_L, 1−β^∞(x)}}. (140)
Since r^∞_L > 0, P^∞_L X ∼ N(0, P^∞_L), which is absolutely continuous on span(V^∞_L), and therefore the probability that P^∞_L X − κ^∞_L lies in any one of the finitely many subspaces, null(M^N_{L′}) = {x ∈ R^{d_m} : M^N_{L′}x = 0}, each with dimension r^q_{L′} < r^∞_L, is zero. Also, (P^∞_L X − κ^∞_L)′(P^∞_L X − κ^∞_L) is absolutely continuous because it can be written as the sum of rk(A^∞_L) squared normal random variables. Also, χ²_{r^∞_L, 1−β^∞(X)} depends on X only through M^∞_L X, which is independent of P^∞_L X. Therefore, for each fixed M^∞_L X, the conditional probability that (P^∞_L X − κ^∞_L)′(P^∞_L X − κ^∞_L) = χ²_{r^∞_L, 1−β^∞(M^∞_L X + κ^∞_L)} is zero. This implies that the unconditional probability is also zero. Therefore,

Pr(P^∞_L X − κ^∞_L ∈ V^∞_L \ V^∞_{L+} or (P^∞_L X − κ^∞_L)′(P^∞_L X − κ^∞_L) = χ²_{r^∞_L, 1−β^∞(X)}) = 0. (141)

For L ⊆ J_∞ such that rk(A^∞_L) = 0, let Ξ_L = K^∞_L. Then, let Ξ = ∪_{L⊆J_∞} Ξ_L. Therefore, by property (b∞) and equation (141), Pr(X ∈ Ξ) = 1.
(D) We consider the set of all sequences such that x_q → x_∞ ∈ Ξ and Ω_q → Ω_0. By the definition of Ξ and the almost sure convergence of (X_q, Ω_q), these sequences occur with probability one. Fix such a sequence for the remainder of the proof of part (a). For this step, consider Ā_q(Ω), P^q_J(Ω), and the objects defined in (132), and let the objects without the argument (Ω) denote the objects evaluated at Ω_q.
Now note that μ*(x_n, A_n, h_n) = argmin_{z : A_n z ≤ h_n} ‖x_n − z‖². This sequence of minimizers is necessarily bounded, because otherwise (162) cannot hold. Thus for any subsequence n_m there is a further subsequence n_q such that μ*(x_{n_q}, A_{n_q}, h_{n_q}) → z_∞ for some z_∞ ∈ R^{d_m}. Since A_{n_q} μ*(x_{n_q}, A_{n_q}, h_{n_q}) ≤ h_{n_q}, we have A_0 z_∞ ≤ h_0. Thus,
Let A_{J*_0,0} denote the submatrix of A_0 formed by the rows selected by J*_0, and let A_{J*_0,n}, h_{J*_0,0}, and h_{J*_0,n} be defined analogously. Now let D be a (d_m − |J*_0|) × d_m matrix, the rows of which form an orthonormal basis for the orthogonal complement of the space spanned by {a_{j,0} : j ∈ J*_0}. Then the matrix [A_{J*_0,0} ; D] is invertible, which implies that the matrix [A_{J*_0,n} ; D] is invertible for large enough n. Let h^∧_{J*_0,n} = min(h_{J*_0,n}, h_{J*_0,0}), where the minimum is taken element by element. Let

z†_n = [A_{J*_0,n} ; D]^{−1} [h^∧_{J*_0,n} ; D z_0]. (169)

It is easy to verify that

z†_n → [A_{J*_0,0} ; D]^{−1} [h_{J*_0,0} ; D z_0] = z_0, and (170)

A_{J*_0,n} z†_n = h^∧_{J*_0,n} ≤ h_{J*_0,n}. (171)
If a_{j,n}′z†_n ≤ h_{j,n} for all j ∈ J^o_0 for large enough n, then (160) holds with z*_n = z†_n and we are done. Otherwise, let

λ_n = min{1, min_{j∈J^o_0 : h_{j,0}>0} h_{j,n}/(a_{j,n}′z†_n)} if {j ∈ J^o_0 : h_{j,0} > 0} ≠ ∅, and λ_n = 1 otherwise. (172)

This is well-defined for large enough n since a_{j,n}′z†_n → a_{j,0}′z_0 = h_{j,0} and thus a_{j,n}′z†_n ≠ 0 for large enough n. Also, by definition, λ_n ≤ 1, and

λ_n → min_{j∈J^o_0 : h_{j,0}>0} h_{j,0}/(a_{j,0}′z_0) = 1. (173)

Now let

z*_n = λ_n z†_n. (174)

Then for any j ∈ J^o_0 such that h_{j,0} > 0, we have

a_{j,n}′z*_n ≤ h_{j,n} (175)

by the definition of λ_n. For any j ∈ J^o_0 such that h_{j,0} = 0, we have

a_{j,n}′z*_n = λ_n Σ_{j*∈J*_0} w_{j,j*,n} a_{j*,n}′z†_n = λ_n Σ_{j*∈J*_0} w_{j,j*,n} min(h_{j*,n}, h_{j*,0}) = 0 ≤ h_{j,n}, (176)

where the first equality follows by the definition of the weights, w_{j,j*,n}, the second equality follows from the definition of z†_n, and the third equality follows because, if w_{j,j*,n} ≠ 0, then w_{j,j*} ≠ 0, and therefore 0 ≤ min(h_{j*,n}, h_{j*,0}) ≤ h_{j*,0} ≤ h_{j,0} = 0 by property (ii) of the partition.

Equations (170), (173), and (174) together imply that z*_n → z_0. This also implies that, for all j ∉ J_0, a_{j,n}′z*_n − h_{j,n} → a_{j,0}′z_0 − h_{j,0} < 0, and thus, for large enough n,

A_{{1,...,d_A}\J_0,n} z*_n < h_{{1,...,d_A}\J_0,n}. (177)

This, combined with equations (171), λ_n ≤ 1, and (174)-(176), implies that A_n z*_n ≤ h_n. Therefore, z*_n satisfies the requirement and the lemma is proved.
Proof of Lemma 7. The proof of Lemma 7 makes use of three additional lemmas, which are stated and proved at the end of this subsection. We use b_{j,q} to denote the transpose of the jth row of B_q, and similarly for a_{j,n_q}, a_{j,0}, and b_{j,0}. An equivalent way to state condition (d) is:

(i) for any further subsequence, n_q, and for every sequence x_q ∈ poly(A_{n_q}, g_{n_q}) ∩ poly(B_q, h_q) such that x_q → x_0, we have x_0 ∈ poly(A_0, g_0) ∩ poly(B_0, h_0), and

(ii) for every x_0 ∈ poly(A_0, g_0) ∩ poly(B_0, h_0), there exists x_q ∈ poly(A_{n_q}, g_{n_q}) ∩ poly(B_q, h_q) such that x_q → x_0.

Before proving the lemma, we note that for any subsequence, n_q, such that A_{n_q} → A_0 and g_{n_q} → g_0, and for any B_q → B_0 and h_q → h_0, condition (d)(i) is satisfied. Specifically, let x_q denote a sequence that belongs to poly(A_{n_q}, g_{n_q}) ∩ poly(B_q, h_q) for all q and such that x_q → x_0. Then

a_{j,0}′x_0 = lim_{q→∞} a_{j,n_q}′x_q ≤ lim_{q→∞} g_{j,n_q} = g_{j,0}. (178)

Also, by the convergence of h_q, we have that

b_{j,0}′x_0 = lim_{q→∞} b_{j,q}′x_q ≤ lim_{q→∞} h_{j,q} = h_{j,0}. (179)

Therefore, x_0 ∈ poly(A_0, g_0) ∩ poly(B_0, h_0).

We also note that for any q, B_q, and h_q satisfying (b), condition (c) must also be satisfied. If not, then there exist a q, an x ∈ poly(A_{n_q}, g_{n_q}), and a j′ ∈ J(x; B_q, h_q) such that b_{j′,q} cannot be written as a linear combination of a_{j,n_q} for j ∈ J(x; A_{n_q}, g_{n_q}). This implies that there exists a v such that b_{j′,q}′v > 0 and v ⊥ a_{j,n_q} for all j ∈ J(x; A_{n_q}, g_{n_q}). But then, x + εv ∈ poly(A_{n_q}, g_{n_q}) for sufficiently small ε > 0, while at the same time b_{j′,q}′(x + εv) > h_{j′,q}. This contradicts the fact that poly(A_{n_q}, g_{n_q}) ⊆ poly(B_q, h_q). Therefore, (c) holds.
We now prove the lemma by finding a subsequence, n_q, and sequences {B_q} and {h_q} that satisfy conditions (a), (b), and (d)(ii). We first consider A_n and g_n. By the compactness of the unit circle, let n_q be a subsequence so that A_{n_q} converges to some A_0. Also suppose g_{n_q} converges along the subsequence to some vector g_0 ∈ R^{d_A}_{+,∞}.

Let J^+_A denote the subset of {1, ..., d_A} for which g_{j,0} > 0, and let J^0_A denote the subset for which g_{j,0} = 0. Consider A_{J^0_A,0}, which defines a cone in R^{d_m}: poly(A_{J^0_A,0}, 0) = {x ∈ R^{d_m} : A_{J^0_A,0} x ≤ 0}. Let S denote the smallest linear subspace of R^{d_m} that contains this cone. Let the dimension of S be d_S. Let J^S_A be the subset of J^0_A such that a_{j,0} ⊥ S for all j ∈ J^S_A. Let J^N_A = {1, ..., d_A}\J^S_A.

Next, we define sequences B_q and h_q that satisfy conditions (a), (b), and (d)(ii) by induction on the dimension of S. If d_S = 0, then no B_q or h_q is required. Condition (a) is satisfied by the above choice of the subsequence. Condition (b) is satisfied because poly(B_q, h_q) = R^{d_m} for all q. Condition (d)(ii) is satisfied because poly(A_0, g_0) = {0}, and then we can take x_q = 0 for all q, which belongs to poly(A_{n_q}, g_{n_q}) and converges to x_0 = 0 ∈ poly(A_0, g_0).

If d_S > 0, then suppose the conclusion of Lemma 7 holds for all values of the dimension of S less than d_S.

Let C_q = poly(A_{J^S_A,n_q}, g_{J^S_A,n_q}). Let C^S_q be the projection of C_q onto S. That is, C^S_q = {P_S x : x ∈ C_q}, where P_S denotes the projection onto S and M_S = I − P_S. The fact that C_q is a polyhedral set (defined by finitely many affine inequalities) implies, by Theorem 19.3 in Rockafellar (1970), that C^S_q is also a polyhedral set. Therefore, there exist a d_{B¹} × d_m matrix of unit vectors in S, B¹_q, and a vector h¹_q such that C^S_q = {y ∈ S : B¹_q y ≤ h¹_q}. We note that C^S_q contains zero, so h¹_q ≥ 0. In the special case of d_S = d_m, C^S_q = C_q, and we let B¹_q be the matrix composed of all the nonzero rows of A_{n_q} and let h¹_q be the corresponding elements of g_{n_q}.

Let n_q be a further subsequence so that B¹_q → B¹_0 and h¹_q → h¹_0, where some of the elements of h¹_0 may be +∞, in which case the convergence holds elementwise. We note that this construction satisfies conditions (a) and (b) because poly(A_{n_q}, g_{n_q}) ⊆ C_q ⊆ poly(B¹_q, h¹_q) for all q, where the second inclusion holds because B¹_q x = B¹_q M_S x + B¹_q P_S x = B¹_q P_S x ≤ h¹_q for all x ∈ C_q, because the rows of B¹_q belong to S and P_S x ∈ C^S_q.

Let J^+_B denote the set of j ∈ {1, ..., d_{B¹}} for which h¹_{j,0} > 0, and let J^0_B denote the set for which h¹_{j,0} = 0, where h¹_{j,0} is the jth element of h¹_0. Consider B¹_{J^0_B,0} and A_{J^0_A,0}, which together define a cone in S: {x ∈ S : B¹_{J^0_B,0} x ≤ 0 and A_{J^0_A,0} x ≤ 0}. As before, let S† denote the smallest linear subspace of S that contains this cone. Let J^{S†}_B denote the set of all j ∈ J^0_B for which b¹_{j,0} ⊥ S†. Also let J^{S†}_A denote the set of all j ∈ J^0_A for which a_{j,0} ⊥ S†. Let the dimension of S† be d_{S†}.
If d_{S†} < d_S, then the result follows by the induction assumption. In particular, if we let

Ā_q = [A_{n_q} ; B¹_q] and ḡ_q = [g_{n_q} ; h¹_q],

then the subspace, S̄, defined to be the smallest linear subspace containing poly(Ā_0, ḡ_0), is equal to S†. Therefore, there exist a further subsequence, n_q, another matrix of inequalities, B²_q, and a vector, h²_q, such that: (a) B²_q → B²_0 and h²_q → h²_0; (b) poly(Ā_q, ḡ_q) ⊆ poly(B²_q, h²_q) for all q along the subsequence; and (d)(ii) poly(Ā_q, ḡ_q) ∩ poly(B²_q, h²_q) → poly(Ā_0, ḡ_0) ∩ poly(B²_0, h²_0) pointwise. It is easy to see that these conditions imply conditions (a), (b), and (d)(ii) for the original A_n and g_n along this subsequence, with

B_q = [B¹_q ; B²_q] and h_q = [h¹_q ; h²_q],

using the fact that poly(Ā_q, ḡ_q) = poly(A_{n_q}, g_{n_q}) ∩ poly(B¹_q, h¹_q).
Therefore, we only need to show condition (d)(ii) in the case that d_{S†} = d_S. In this case, S = S†, and so J^{S†}_B = ∅ and J^{S†}_A = J^S_A. Fix x_0 ∈ poly(A_0, g_0) ∩ poly(B¹_0, h¹_0). We show that for every ε > 0 there exists a Q such that for all q ≥ Q there exists a y_q ∈ poly(A_{n_q}, g_{n_q}) ∩ poly(B¹_q, h¹_q) such that ‖y_q − x_0‖ ≤ 2ε. If true, then this can be used to construct a sequence satisfying y_q → x_0, establishing condition (d)(ii).

Fix ε > 0. By Lemma 9, there exists a point, x, in S that satisfies b¹_{j,0}′x < h¹_{j,0} for all j ∈ {1, ..., d_{B¹}} and a_{j,0}′x < g_{j,0} for all j ∈ J^N_A. There exists a λ ∈ (0, 1) small enough that x† = λx + (1 − λ)x_0 ∈ B(x_0, ε), where B(x_0, ε) denotes the closed ball of radius ε around x_0. Note that x† satisfies a_{j,0}′x† < g_{j,0} for all j ∈ J^N_A and b¹_{j,0}′x† < h¹_{j,0} for all j ∈ {1, ..., d_{B¹}}. Therefore, there exist a δ ∈ (0, ε) and a Q such that for all q ≥ Q and for all x ∈ B(x†, δ), b¹_{j,q}′x < h¹_{j,q} for all j ∈ {1, ..., d_{B¹}} and a_{j,n_q}′x < g_{j,n_q} for all j ∈ J^N_A. Notice that, for all q ≥ Q, x† ∈ C^S_q = {y ∈ S : B¹_q y ≤ h¹_q}, which means that there exists a y_q ∈ C_q such that x† = P_S y_q. By Lemma 10 applied to K = {x†} (where the condition is satisfied because, by Lemma 11, S = {x ∈ R^{d_m} : A_{J^S_A,0} x ≤ 0}), there exists a larger Q such that for all q ≥ Q, y_q ∈ B(x†, δ). Therefore, ‖y_q − x_0‖ ≤ 2ε.
Proof of Lemma 8. Fix x ∈ R^{d_m}. The fact that poly(A, g) ⊆ poly(B, h) implies that μ*(x; A, g) = μ*(x; [A; B], [g; h]). Denote the common value by μ*.

If there does not exist a j ∈ J(x; A, g) such that a_j ≠ 0, then x = μ* and a_j′x < g_j for all j ∉ J(x; A, g). Suppose, to reach a contradiction, that there does exist a j ∈ J(x; [A; B], [g; h]) such that b_{j−d_A} ≠ 0. Then, there would exist a point, y, very close to x (say, y = x + εb_{j−d_A} for some small ε > 0) such that y ∉ poly(B, h) but y ∈ poly(A, g). This contradicts the assumption that poly(A, g) ⊆ poly(B, h). Therefore, there does not exist a j ∈ J(x; [A; B], [g; h]) such that b_{j−d_A} ≠ 0. This implies that, in this case, τ_j(x; A, g) = 0 for all j ∈ {1, ..., d_A} and τ_j(x; [A; B], [g; h]) = 0 for all j ∈ {1, ..., d_A + d_B}. Therefore, τ(x; A, g) = τ(x; [A; B], [g; h]).

Suppose there does exist a j ∈ J(x; A, g) such that a_j ≠ 0. Then, the same j can be used to define τ_j(x; [A; B], [g; h]) because J(x; A, g) ⊆ J(x; [A; B], [g; h]).

We show that for every j = 1, ..., d_B, τ_{j+d_A}(x; [A; B], [g; h]) ≥ τ(x; A, g). The result holds trivially if ‖b_j‖‖a_j‖ = b_j′a_j, because then τ_{j+d_A}(x; [A; B], [g; h]) = ∞. Suppose, to reach a contradiction, that τ_{j+d_A}(x; [A; B], [g; h]) < τ(x; A, g). Let τ* = τ_{j+d_A}(x; [A; B], [g; h]), and consider two cases.

(i) If τ* = 0, then for some ε > 0, the point t* = μ* + ε(I_{d_m} − a_j a_j′‖a_j‖^{−2}) b_j belongs to poly(A, g) but not poly(B, h). To see that t* ∈ poly(A, g), note that for all ℓ ∈ J(x; A, g), the fact that τ(x; A, g) > 0 implies that a_ℓ is collinear with a_j. Then, a_ℓ′t* = a_ℓ′μ* = g_ℓ. For all ℓ ∉ J(x; A, g), a_ℓ′μ* < g_ℓ, so ε can be chosen small enough that a_ℓ′t* < g_ℓ for all ℓ ∉ J(x; A, g). To see that t* ∉ poly(B, h), note that

(by the assumption that τ* < τ(x; A, g) ≤ τ_ℓ(x; A, g)), and the second inequality follows because b_j′μ* ≤ h_j and ‖a_ℓ‖‖b_j‖ ≥ a_ℓ′b_j. This shows that t* is in the interior of poly(A, g). We also show that t* is on the boundary of poly(B, h) by calculating that b_j′t* = h_j. By a similar calculation to the one above, we see that

(b_j′t* − h_j)(‖b_j‖‖a_j‖ − b_j′a_j) = ‖a_j‖(h_j − b_j′μ*)(‖b_j‖ − b_j′a_j/‖a_j‖) + (b_j′μ* − h_j)(‖b_j‖‖a_j‖ − b_j′a_j) = 0. (184)

This implies that there exists a point, y, very close to t* (say, y = t* + εb_j for some small ε > 0) such that y ∉ poly(B, h) but y ∈ poly(A, g). This contradicts the assumption that poly(A, g) ⊆ poly(B, h). Therefore, τ_{j+d_A}(x; [A; B], [g; h]) ≥ τ(x; A, g) for all j = 1, ..., d_B.
Lemma 9. Let A be a d_A × d_m matrix. Let g be a nonnegative d_A-vector. Let J^+ denote the subset of {1, ..., d_A} such that g_j > 0, and let J^0 denote the subset of {1, ..., d_A} such that g_j = 0. Let S denote the smallest linear subspace containing poly(A_{J^0}, 0) = {x ∈ R^{d_m} : A_{J^0} x ≤ 0}. Let J^S be the subset of J^0 such that a_j ⊥ S for all j ∈ J^S. Let J^N = {1, ..., d_A}\J^S. Then there exists an x ∈ S such that a_j′x < g_j for all j ∈ J^N.
Proof of Lemma 9. First, let M > max_{j∈J^+} ‖a_j‖, and let ε ∈ (0, min_{j∈J^+} g_j/M). Then, for all x ∈ B(0, ε), a_j′x < g_j for all j ∈ J^+, where B(x, ε) denotes the closed ball of radius ε around x. Also, for every j ∈ J^N ∩ J^0, {x ∈ S : a_j′x = 0} defines a subspace of S. We note that for all j ∈ J^N ∩ J^0, {x ∈ S : a_j′x = 0} is a proper subset of S, because otherwise j would belong to J^S. By the definition of S, S ∩ poly(A_{J^N∩J^0}, 0) is not contained within any of these subspaces. In particular, for each j ∈ J^N ∩ J^0, we can find an x_j and a neighborhood, N_j (relatively open in S), that belong to S ∩ poly(A_{J^N∩J^0}, 0)\{x ∈ S : a_j′x = 0}. Indeed, we can consider j ∈ J^N ∩ J^0 sequentially and define each neighborhood to be a subset of the previous one. Therefore, the final x_j must belong to S ∩ poly(A_{J^N∩J^0}, 0) and satisfy a_ℓ′x_j < 0 for all ℓ ∈ J^N ∩ J^0. Take x = λx_j, where λ > 0 is small enough that x ∈ B(0, ε). Then, x satisfies a_j′x < g_j for all j ∈ J^N.
Lemma 10. Let An → A0 and gn → 0, where gn ≥ 0 for all n. Suppose S = {x ∈ Rdm : A0x ≤ 0} is a linear subspace of Rdm. Let S⊥ denote the orthogonal complement of S in Rdm. Let PSx denote the projection of x ∈ Rdm onto S and let MSx denote x − PSx. Then, for every compact K ⊆ S and for every ε > 0, we have

{x ∈ poly(An, gn) : PSx ∈ K, ‖MSx‖ ≥ ε} = ∅ (185)

eventually as n → ∞.
Proof of Lemma 10. Suppose that the conclusion of the lemma is not true. Then there exists a sequence xn ∈ poly(An, gn) and a subsequence nm such that PSxnm ∈ K and ‖MSxnm‖ ≥ ε for all m ≥ 1. Define the unit vector x⊥nm = MSxnm/‖MSxnm‖. Then, by the compactness of K and the unit circle, there exists a further subsequence nq such that PSxnq → xS and x⊥nq → x⊥ for some xS ∈ S and x⊥ ∈ S⊥ as q → ∞.

Because x⊥ ∈ S⊥ and x⊥ ≠ 0, we know that x⊥ ∉ S = {x ∈ Rdm : A0x ≤ 0}, and therefore there exists a j such that

a′j,0x⊥ > 0. (186)

Also, since xS ∈ S, a′j,0xS ≤ 0. Since S is a linear subspace, we have a′j,0(−xS) ≤ 0 as well. This shows that a′j,0xS = 0 (and more generally, S = {x ∈ Rdm : A0x = 0}).

By (186), o(1) + a′j,0x⊥ > 0 eventually. This, combined with ‖MSxnq‖ ≥ ε, implies that

a′j,nqxnq − gj,nq > 0 (188)

eventually. This contradicts the definition of the sequence xn, which requires that xn ∈ poly(An, gn) for all n.
Lemma 11. Let A be a matrix. Let S be the smallest linear subspace containing C = poly(A, 0). Let J = {j : aj ⊥ S}. Then, S = poly(AJ, 0).

Proof of Lemma 11. First, notice that if x ∈ S, then x ⊥ aj for all j ∈ J, and therefore AJx = 0, so x ∈ poly(AJ, 0).

To go the other way, let x ∈ poly(AJ, 0). Lemma 9 implies that there exists an x̄ ∈ S such that a′jx̄ < 0 for all j ∈ Jᶜ, where Jᶜ = {1, ..., dA}\J. Consider y = x + Mx̄ for M large. We note that AJy = AJx + MAJx̄ ≤ 0 since x ∈ poly(AJ, 0) and x̄ ∈ S ⊆ poly(AJ, 0). We also note that for every j ∈ Jᶜ, a′jy = a′jx + Ma′jx̄ → −∞ as M diverges. Thus, there exists an M large enough that y ∈ poly(A, 0). This implies that y ∈ S because poly(A, 0) ⊆ S. This also implies that x = y − Mx̄ ∈ S because S is a linear subspace.
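For intuition, the set J = {j : aj ⊥ S} in Lemma 11 is exactly the set of "implicit equality" rows of A, i.e., those rows with a′jx = 0 everywhere on the cone poly(A, 0), and each can be detected with one linear program. The following is a minimal illustrative sketch in Python (not part of the paper's Matlab code; the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def implicit_equalities(A, tol=1e-9):
    """Find J = {j : a_j'x = 0 for all x in poly(A, 0) = {x : Ax <= 0}}.

    By Lemma 11, the smallest linear subspace S containing the cone
    poly(A, 0) is then poly(A_J, 0), i.e., the null space of A_J.
    """
    n_rows, dim = A.shape
    J = []
    for j in range(n_rows):
        # Can a_j'x be strictly negative inside the cone?  Minimize a_j'x
        # over the cone, boxed to [-1, 1]^dim so the LP stays bounded.
        res = linprog(c=A[j], A_ub=A, b_ub=np.zeros(n_rows),
                      bounds=[(-1, 1)] * dim)
        if res.fun > -tol:  # minimum is zero: a_j is an implicit equality
            J.append(j)
    return J

# Example: the cone {x : x1 <= 0, -x1 <= 0, x2 <= 0} = {x1 = 0, x2 <= 0}
# has span equal to the x2-axis, so the rows +/- e1 are implicit equalities.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
print(implicit_equalities(A))  # [0, 1]
```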
C Supporting Materials for Section 3.2
C.1 Lemmas 12-13 and Their Proofs
Lemma 12. Let B and C be conformable matrices and d be a conformable vector. There exists a matrix A = A(B,C) and a vector b = b(C, d) such that

{δ : Cδ ≥ Bµ − d} ≠ ∅ ⇔ Aµ ≤ b.

Furthermore, A(B,C) = H(C)B and b(C, d) = H(C)d, where H(C) is the matrix with rows formed by the vertices of the polyhedron {h ∈ Rk : h ≥ 0, C′h = 0, 1′h = 1}.
Proof of Lemma 12. By Theorem 2.7 in Gale (1960), {δ : Cδ ≥ Bµ − d} ≠ ∅ is equivalent to

h′(Bµ − d) ≤ 0 for all h ≥ 0 such that C′h = 0. (189)

The equivalence is not affected by adding the scale normalization 1′h = 1 to (189). Thus, {δ : Cδ ≥ Bµ − d} ≠ ∅ is equivalent to

h′(Bµ − d) ≤ 0 for all h ∈ H := {h ≥ 0 : C′h = 0, 1′h = 1}. (190)

Since the rows of H(C) are vertices, and thus elements of H, we have

{δ : Cδ ≥ Bµ − d} ≠ ∅ ⇒ H(C)(Bµ − d) ≤ 0.

Then by the definition of A and b, we have

{δ : Cδ ≥ Bµ − d} ≠ ∅ ⇒ Aµ ≤ b.

Conversely, since the rows of H(C) are the vertices of H, for any h ∈ H there exists a vector c ≥ 0 such that H(C)′c = h. Thus, if Aµ ≤ b, we must have h′(Bµ − d) = c′H(C)(Bµ − d) = c′(Aµ − b) ≤ 0. That this holds for all h ∈ H implies that {δ : Cδ ≥ Bµ − d} ≠ ∅. Thus the lemma is proved.
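The equivalence in Lemma 12 can be spot-checked numerically: primal feasibility of {δ : Cδ ≥ Bµ − d} against the dual condition that h′(Bµ − d) ≤ 0 for all h ∈ H. A minimal sketch in Python using two linear programs (the function names are ours, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def primal_feasible(C, v):
    """Is {delta : C delta >= v} nonempty?  Here v plays the role of B mu - d."""
    res = linprog(c=np.zeros(C.shape[1]), A_ub=-C, b_ub=-v,
                  bounds=[(None, None)] * C.shape[1])
    return res.status == 0  # status 0 = optimum found, hence feasible

def dual_condition(C, v, tol=1e-9):
    """Check h'v <= 0 for all h in H = {h >= 0 : C'h = 0, 1'h = 1},
    by maximizing h'v over H; vacuously true if H is empty."""
    k = C.shape[0]
    res = linprog(c=-v,  # maximize h'v
                  A_eq=np.vstack([C.T, np.ones((1, k))]),
                  b_eq=np.append(np.zeros(C.shape[1]), 1.0),
                  bounds=[(0, None)] * k)
    if res.status == 2:  # H is empty
        return True
    return -res.fun <= tol

# Example: delta >= 1 and -delta >= 0 is infeasible, and the dual
# certificate h = (1/2, 1/2) gives h'v = 1/2 > 0, so both checks agree.
C = np.array([[1.0], [-1.0]])
assert not primal_feasible(C, np.array([1.0, 0.0]))
assert not dual_condition(C, np.array([1.0, 0.0]))
assert primal_feasible(C, np.array([-1.0, 0.0]))
assert dual_condition(C, np.array([-1.0, 0.0]))
```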
Lemma 13. r = rk(B′ZH0).

Proof of Lemma 13. Denote BZ, CZ, and dZ by B, C, and d. Let h′1, . . . , h′m1 be all the rows of H(C) orthogonal to Bµ − d. Then by definition, AJ = [B′h1, . . . , B′hm1]′, and thus

where B = [B′1, B′2]′ is partitioned conformably with h. This implies that

rk(B′H0) = rk(Γ′B1 + B2).
To end this section, we provide pseudo-code to calculate rk(B′H0) in Algorithm 3. This can be plugged into Algorithm 2 in Section 3.2, replacing line 12, to compute the sCC and sRCC tests.
Algorithm 3: Pseudo-code for calculating rk(B′H0) when rk(B) < k.
1: % H0 = {h ∈ Rk : G′h = 0}, where G = (g1, . . . , gk)′ and g′j is the jth row of G.
2:
3: if rk(G) = k then
4:   rk(B′H0) := 0
5: else
6:   if rk(g1, . . . , grk(G)) < rk(G) then
7:     Rearrange the rows of G so that rk(g1, . . . , grk(G)) = rk(G) holds. Rearrange the elements of h and the rows of B accordingly.
8:   end if
9:   G1 := (g1, . . . , grk(G))′
10:  G2 := (grk(G)+1, . . . , gk)′
11:  Γ := −(G1G′1)−1G1G′2
12:  rk(B′H0) := rk((Γ′, Ik−rk(G))B)
13: end if
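For readers who prefer executable code, Algorithm 3 can be sketched in Python/numpy as follows (the paper ships Matlab code; this translation and the helper name `rank_B_H0` are ours):

```python
import numpy as np

def rank_B_H0(B, G, tol=1e-10):
    """Rank of B' restricted to H0 = {h in R^k : G'h = 0}, per Algorithm 3.

    B is k x m; G is k x p with rows g_j'.  A basis of H0 is the columns of
    [Gamma; I_{k-r}] once the first r = rk(G) rows of G are independent.
    """
    k = G.shape[0]
    r = np.linalg.matrix_rank(G, tol=tol)
    if r == k:
        return 0  # H0 = {0}
    # Rearrange rows of G (and of B) so the first r rows of G are independent
    # (a greedy sweep, standing in for line 7 of Algorithm 3).
    idx, rest = [], []
    for j in range(k):
        if len(idx) < r and np.linalg.matrix_rank(G[idx + [j]], tol=tol) == len(idx) + 1:
            idx.append(j)
        else:
            rest.append(j)
    order = idx + rest
    G, B = G[order], B[order]
    G1, G2 = G[:r], G[r:]
    Gamma = -np.linalg.solve(G1 @ G1.T, G1 @ G2.T)   # r x (k-r)
    T = np.hstack([Gamma.T, np.eye(k - r)])          # (k-r) x k, rows span H0
    return np.linalg.matrix_rank(T @ B, tol=tol)
```

For example, with G = [[1,0],[0,1],[0,0]] (so H0 is the span of e3) and B = I3, the function returns 1; with G of full rank k it returns 0.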
D Asymptotic Validity of the Subvector Tests
D.1 General Conditions for Asymptotic Validity
We fix a realization of {Zi}ni=1 and denote it by z. Let Fz be a collection of distributions Fz.
The following high-level assumption is sufficient for the uniform asymptotic validity of the
sCC and the sRCC tests. This assumption is the conditional version of Assumption 2.
Assumption 3. The given sequence {(Fz,n, θn) : Fz,n ∈ Fz, θn ∈ Θ0(Fz,n)}∞n=1 satisfies: for every subsequence nm, there exist a further subsequence nq and a sequence of positive definite dm × dm matrices Dq such that:

(a) Under the sequence {Fz,nq}∞q=1,

√nq D−1/2q (mnq(θnq) − EFz,nq mnq(θnq)) →d N(0, Ω), (194)

for a positive definite correlation matrix Ω, and

‖D−1/2q Σnq(θnq)D−1/2q − Ω‖ →p 0. (195)

(b) Let A(θ) and b(θ) be defined as in Lemma 12. ΛqA(θnq)Dq → A0 for some dA × dm matrix A0, and for every J ⊆ {1, ..., dA}, rk(IJA(θnq)Dq) = rk(IJA0), where Λq is the diagonal dA × dA matrix whose jth diagonal entry is one if e′jA(θnq) = 0 and ‖e′jA(θnq)Dq‖−1 otherwise.
The following corollary of Theorem 3 shows the asymptotic properties of the sRCC test.
Corollary 2. (a) Suppose Assumption 3(a) holds for all sequences {(Fz,n, θn) : Fz,n ∈ Fz, θn ∈ Θ0(Fz,n)}∞n=1. Then,

lim supn→∞ supFz∈Fz supθ∈Θ0(Fz) EFz(φsRCCn(θ, α)|z) ≤ α.

Next consider a sequence {(Fz,n, θn) : Fz,n ∈ Fz, θn ∈ Θ0(Fz,n)}∞n=1 satisfying Assumption 3(a,b).

(b) If, along any further subsequence, for all j = 1, ..., dA, √nq e′jΛq(A(θnq)EFz,nq mnq(θnq) − b(θnq)) → 0, and if A0 ≠ 0dA×dm, then

limn→∞ EFz,nφsRCCn(θn, α) = α.

(c) If, for J ⊆ {1, ..., dA}, along any further subsequence, √nq e′jΛq(A(θnq)EFz,nq mnq(θnq) − b(θnq)) → −∞ as q → ∞ for all j ∉ J, then

limn→∞ PrFz,n(φRCCn(θn, α) ≠ φRCCn,J(θn, α)) = 0.
Remark. Corollary 2 follows from Theorem 3 because Θ0(Fz,n) has the equivalent representation

(a) Let {(Fz,n, θn) : Fz,n ∈ Fz, θn ∈ Θ0(Fz,n)} be an arbitrary sequence. Let F|zi,n denote the conditional distribution of Wi given Zi = zi implied by Fz,n. Let σ²j|z,n(θ) and D|z,n(θ) be defined just like σ²j|z(θ) and D|z(θ) except with F|zi replaced by F|zi,n. Let Dn = D|z,n(θn). Then Dn is positive definite for every n by Assumption 4(a).
Let Ωn = D−1/2n (n−1∑ni=1 VarF|zi,n(m(Wi, θn)|zi)) D−1/2n. Algebra shows that the square of the (j, ℓ)th element of Ωn is bounded by

2n−1∑ni=1 EF|zi,n[(mj(Wi, θn)/σj|z,n(θn))⁴ | zi] + 2n−1∑ni=1 EF|zi,n[(mℓ(Wi, θn)/σℓ|z,n(θn))⁴ | zi], (198)

which is bounded by 4M0 by Assumption 4(a). Thus vec(Ωn) ∈ [0, 4M0]d²m, which is a compact set. This implies that a subsequence nq can be found for any subsequence of n such that Ωnq → Ω∞. Furthermore, Assumption 4(c) implies that Ω∞ is positive definite.
It remains to verify the Lindeberg condition for the Lindeberg-Feller central limit theorem (CLT) along the subsequence nq. Let a be an arbitrary real vector on the unit sphere in Rdm,

where the first inequality holds because 1(x > ε) ≤ x/ε for any x ≥ 0, the second inequality holds because E[(X − E(X))⁴] ≤ 16E[X⁴], the third inequality holds by the Cauchy-Schwarz inequality and ‖a‖ = 1, the equality holds by Assumption 4(b), and the convergence holds because s²q → a′Ω∞a by the definition of the subsequence nq. Therefore, the Lindeberg condition holds and the CLT applies, proving part (a).
(b) Note that Σn(θ) is the weighted average of the standard sample variance estimators within subsamples sharing the same zi value. Thus, by a standard argument, we have

EFz,n[Σn(θn)|z] = ∑ℓ∈Z (nℓ/n) VarF|ℓ,n(m(Wi, θn)|ℓ) = n−1∑ni=1 VarF|zi,n(m(Wi, θn)|zi), (202)

where the second equality holds by rearranging terms. Thus,

EFz,n[D−1/2n Σn(θn)D−1/2n | z] = Ωn. (203)
Also by a standard calculation, the (j, j′) element of D−1/2n Σn(θn)D−1/2n has conditional variance given z equal to

n−2∑ni=1 VarF|zi,n(mj(Wi, θn)mj′(Wi, θn)/(σj|z,n(θn)σj′|z,n(θn)) | zi) + n−2∑ni=1 (ω²j|zi,n(θn)ω²j′|zi,n(θn) + ωjj′|zi,n(θn)²)/(nzi − 1), (204)

where ω²j|zi,n(θ) = VarF|zi,n(mj(Wi, θ)/σj|z,n(θ) | zi) and ωjj′|zi,n(θ) = CovF|zi,n(mj(Wi, θ)/σj|z,n(θ), mj′(Wi, θ)/σj′|z,n(θ) | zi). By standard algebraic manipulation, we have

VarF|zi,n(mj(Wi, θn)mj′(Wi, θn)/(σj|z,n(θn)σj′|z,n(θn)) | zi) ≤ (1/2)(Mji + Mj′i), and

ω²j|zi,n(θn)ω²j′|zi,n(θn) + ωjj′|zi,n(θn)² ≤ Mji + Mj′i, (205)

where Mji = EF|zi,n[(mj(Wi, θn)/σj|z,n(θn))⁴ | zi]. Therefore, by Assumption 4 and the additional assumption that nzi ≥ 2 for all i, the expression in (204) is bounded by n−1(M0 + 2M0), which converges to zero as n → ∞. This proves part (b).
(c) First, we prove that

n−1∑ni=1 ‖zi − zℓZ(i)‖² → 0. (206)

To begin, define z̃i = Σ−1/2Z,n zi. By Assumption 5(b), Σ−1/2Z,n → Σ−1/2Z as n → ∞ and this limit is finite. Thus, Σ−1/2Z,n is uniformly bounded over all large enough n. This and Assumption 5(a) together imply that the elements of the array {z̃1, . . . , z̃n}n≥1 are chosen from a bounded set. Then Lemma 1 of Abadie and Imbens (2008) applies directly and implies that

n−1∑ni=1 ‖z̃i − z̃ℓZ(i)‖² → 0. (207)
Consider the derivation

n−1∑ni=1 ‖zi − zℓZ(i)‖² = n−1∑ni=1 (z̃i − z̃ℓZ(i))′ΣZ,n(z̃i − z̃ℓZ(i)) ≤ n−1∑ni=1 ‖z̃i − z̃ℓZ(i)‖² eigmax(ΣZ,n) → 0, (208)

where eigmax(·) stands for the maximum eigenvalue and the convergence holds by (207) and Assumption 5(b). This proves (206).
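The convergence in (206)–(208) reflects a general property of nearest-neighbor matching on a bounded array (Abadie and Imbens, 2008): the average squared distance to the matched neighbor vanishes as n grows. A small illustrative simulation, with our own variable names and a brute-force matching (not the paper's implementation):

```python
import numpy as np

def mean_sq_nn_dist(z):
    """Average of ||z_i - z_{l_Z(i)}||^2, with l_Z(i) the nearest neighbor of i."""
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point cannot be matched to itself
    return d2.min(axis=1).mean()

rng = np.random.default_rng(0)
small = mean_sq_nn_dist(rng.uniform(size=(100, 2)))
large = mean_sq_nn_dist(rng.uniform(size=(1000, 2)))
# With z_i drawn from a bounded set, the average shrinks as n grows
# (at rate n^{-2/d_z} here), consistent with (206).
assert large < small
```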
Next, consider an arbitrary unit vector a in Rdm and let

s²n,i(θ) = a′D−1/2n (m(Wi, θ) − m(WℓZ(i), θ))(m(Wi, θ) − m(WℓZ(i), θ))′D−1/2n a. (209)

Then a′D−1/2n Σn(θn)D−1/2n a = (2n)−1∑ni=1 s²n,i(θn). Since a is arbitrary, it suffices to show that for any subsequence of n there exists a further subsequence nq such that

(2nq)−1∑nqi=1 s²nq,i(θnq) →p a′Ω∞a (210)

as q → ∞.
Let mn,i(θ) be defined as in the proof of part (a). Then

Clearly, all the summands on the right-hand side have conditional expectation zero. Now we show that the conditional variance (which is then the conditional second moment) of each of them converges to zero.
For the first summand on the right-hand side of (218), we have

EFz,n[(n−1∑ni=1 (ε²i(θn) − σ²i(θn)))² | z] = n−2∑ni=1 VarF|zi,n(ε²i(θn)|zi)
≤ n−2∑ni=1 EF|zi,n[ε⁴i(θn)|zi]
≤ 16n−2∑ni=1 EF|zi,n[(a′D−1/2n m(Wi, θn))⁴|zi]
≤ 16n−2∑ni=1 EF|zi,n[‖D−1/2n m(Wi, θn)‖⁴|zi] → 0, (219)
where the convergence holds by Assumption 4(b). For the second summand on the right-hand side of (218), we have

EFz,n[(n−1∑ni=1 (ε²ℓZ(i)(θn) − σ²ℓZ(i)(θn)))² | z]
= n−2∑ni=1 EFz,n[(ε²ℓZ(i)(θn) − σ²ℓZ(i)(θn))² | z] + 2n−2∑ni=1∑nj=i+1 EFz,n[(ε²ℓZ(i)(θn) − σ²ℓZ(i)(θn))(ε²ℓZ(j)(θn) − σ²ℓZ(j)(θn)) | z]
≤ (L + 2L²)n−2∑ni=1 EF|zi,n[(ε²i(θn) − σ²i(θn))² | zi] → 0, (220)

where L is the maximum number of times a j is ℓZ(i) for some i. This number is bounded by 3dz − 1, which does not depend on n (see, e.g., Zeger and Gersho (1994)). The convergence holds by (219).
For the third summand in (218), we have

EFz,n[(n−1∑ni=1 a′∆niεi(θn))² | z] = n−2∑ni=1 (a′∆ni)²EF|zi,n[ε²i(θn)|zi]
≤ MgB n−2∑ni=1 EF|zi,n[ε²i(θn)|zi]
≤ MgB n−2∑ni=1 (1 + EF|zi,n[ε⁴i(θn)|zi]) → 0, (221)

where B is the maximum distance between two points in the sequence {zi}ni=1, which is bounded by Assumption 5(a), the first inequality holds by Assumption 5(c), the second inequality holds by x² ≤ (max(1, |x|))² ≤ max{1, x⁴} ≤ 1 + x⁴, and the convergence holds by (219).
For the fourth summand in (218), we have

EFz,n[(n−1∑ni=1 a′∆niεℓZ(i)(θn))² | z] = n−2∑ni=1 (a′∆ni)²EF|zℓZ(i),n[ε²ℓZ(i)(θn) | zℓZ(i)]
≤ MgB n−2∑ni=1 EF|zℓZ(i),n[ε²ℓZ(i)(θn) | zℓZ(i)]
≤ MgB n−2∑ni=1 (1 + EF|zℓZ(i),n[ε⁴ℓZ(i)(θn) | zℓZ(i)])
≤ MgLB n−2∑ni=1 (1 + EF|zi,n[ε⁴i(θn) | zi]) → 0, (222)

where L is the number discussed below (220).
For the fifth summand on the right-hand side of (218), we have

EFz,n[(n−1∑ni=1 εi(θn)εℓZ(i)(θn))² | z]
= n−2∑ni=1 EFz,n[ε²i(θn)ε²ℓZ(i)(θn) | z] + 2n−2∑ni=1∑nj=i+1 EFz,n[εi(θn)εℓZ(i)(θn)εj(θn)εℓZ(j)(θn) | z]
≤ n−2∑ni=1 EFz,n[ε²i(θn)ε²ℓZ(i)(θn) | z] + Ln−2∑ni=1 EFz,n[ε²i(θn)ε²ℓZ(i)(θn) | z]
≤ (1 + L)(2n²)−1∑ni=1 EFz,n[ε⁴i(θn) + ε⁴ℓZ(i)(θn) | z] → 0, (223)

where the first inequality holds because EFz,n[εi(θn)εℓZ(i)(θn)εj(θn)εℓZ(j)(θn)|z] is nonzero only when j = ℓZ(i) and ℓZ(j) = i, and this occurs at most L times for each i, the second inequality holds by 2xy ≤ x² + y², and the convergence holds by (219) and the last two lines of (222).
Combining (218)-(223), we have that (217) holds, which then proves part (c).
E Numerical Details for Section 5.2
E.1 Calculation of the Identified Set
Let YU denote log(sN,i + 2/N)− log(1− sN,i + s) and let YL denote log(sN,i + s)− log(1−sN,i + 2/N). For every θ0 in the identified set, there exists a δ = (δ1, δ2)′ ∈ R2 such that for