Incorrect Asymptotic Size of Subsampling Procedures Based on Post-Consistent Model Selection Estimators

Donald W. K. Andrews¹
Cowles Foundation for Research in Economics, Yale University

Patrik Guggenberger
Department of Economics, UCLA

June 2005; Revised: December 2008

¹ Corresponding author: Donald W. K. Andrews, Department of Economics, Yale University, 30 Hillhouse Ave, Rm. 17, Box 208281, New Haven, CT 06520-8281. Telephone: (203) 432-3698. Fax: (203) 432-6167. Email: [email protected].


Abstract

Subsampling and the m out of n bootstrap have been suggested in the literature as methods for carrying out inference based on post-model-selection estimators and shrinkage estimators. In this paper we consider a subsampling confidence interval (CI) that is based on an estimator that can be viewed either as a post-model-selection estimator that employs a consistent model selection procedure or as a super-efficient estimator. We show that the subsampling CI (of nominal level $1-\alpha$ for any $\alpha \in (0,1)$) has asymptotic confidence size (defined to be the limit of finite-sample size) equal to zero in a very simple regular model. The same result holds for the m out of n bootstrap provided $m^2/n \to 0$ and the observations are i.i.d. Similar zero-asymptotic-confidence-size results hold in more complicated models that are covered by the general results given in the paper and for super-efficient and shrinkage estimators that are not post-model-selection estimators. Based on these results, subsampling and the m out of n bootstrap are not recommended for obtaining inference based on post-consistent-model-selection or shrinkage estimators.

Keywords: Asymptotic size, confidence set, finite-sample size, m out of n bootstrap, model selection, shrinkage estimator, subsample, subsampling.

JEL Classification Numbers: C12, C15.


1 Introduction

Over the years, Peter Robinson has made path-breaking contributions to time series analysis. Particularly noteworthy is his contribution to a class of non-regular time series models, viz., those with long memory. In this paper, we consider inference in the presence of a different non-regular feature, viz., inference based on a test statistic that has a discontinuity in its asymptotic distribution as a function of some parameter. In particular, we consider subsampling inference based on post-model-selection test statistics and test statistics based on shrinkage estimators. Post-model-selection tests and confidence intervals (CIs) are widely used in practice, with both time series and cross-section observations. Shrinkage estimators are important because many new developments in nonparametric statistics rely on shrinkage methods. Subsampling a super-efficient estimator has been suggested in Politis, Romano, and Wolf (1999) (hereafter PRW) and Lehmann and Romano (2005). Using the m out of n bootstrap for a post-model-selection estimator has been suggested by Shao (1994, 1996).

Subsampling is a very general method for carrying out inference in econometric and statistical models; see Politis and Romano (1994). Also see Shao and Wu (1989), Wu (1990), Sherman and Carlstein (1996), and PRW.¹ Minimal conditions are needed for subsampling tests and confidence intervals (CIs) to have desirable asymptotic properties, such as asymptotically correct rejection rates and coverage probabilities under standard asymptotics based on a fixed true probability distribution for the observations; see PRW.

Recent papers by Andrews and Guggenberger (2005a, 2009a, 2010) and Guggenberger (2008, Supplement), however, show that subsampling methods often do not yield correct asymptotic size (defined to be the limit of finite-sample size) when a test statistic has a discontinuity in its asymptotic distribution.² The results in those papers rely on an assumption, viz., Assumption B, that is violated in the cases of post-model-selection inference based on "consistent" model selection procedures and inference based on shrinkage estimators; see Section 3.3.³ In this paper we provide results based on a weaker condition than Assumption B, viz., Assumption B1, and show that this assumption holds in the cases above.

¹ Shao and Wu (1989) and Wu (1990) refer to subsampling as the delete-d jackknife.

² The finite-sample (or exact) size of a test is defined to be the maximum rejection probability of the test under distributions in the null hypothesis. A test is said to have level $\alpha$ if its finite-sample size is $\alpha$ or less. The asymptotic size of a test is defined to be the limit of the finite-sample size of the test. The finite-sample (or exact) size of a confidence interval (or confidence set) is defined to be the minimum coverage probability of the confidence interval under distributions in the model. Analogously, a confidence interval is said to have level $1-\alpha$ if its finite-sample size is $1-\alpha$ or greater. The asymptotic size of a confidence interval is defined to be the limit of the finite-sample size of the confidence interval.

³ A "consistent" model selection procedure, like BIC, selects the most parsimonious correct model with probability that goes to one as $n \to \infty$ under a fixed probability distribution for the data. A "conservative" model selection procedure, like AIC, selects a correct model, but not necessarily the most parsimonious correct model, with probability that goes to one as $n \to \infty$ under a fixed probability distribution for the data.


The results of this paper establish that subsampling CIs based on a post-consistent-model-selection or shrinkage estimator have asymptotic size equal to zero in a very simple model. The general results provided in the paper also apply to more complicated models.

The reason why the subsampling CI has asymptotic size equal to zero in the present context is explained as follows. Consider a shrinkage estimator $\hat\theta_n$ of a parameter $\theta$ that equals zero when some preliminary estimator $\hat\theta_{P,n}$ satisfies $|\hat\theta_{P,n}| \le n^{-1/4}$ and otherwise equals $\hat\theta_{P,n}$, where $n$ is the sample size. A nominal $1-\alpha$ subsampling CI for $\theta$ is obtained by inverting subsampling tests based on the test statistic $T_n(\theta_0) = |n^{1/2}(\hat\theta_n - \theta_0)|$ for testing $H_0: \theta = \theta_0$ for each $\theta_0 \in R$.⁴ This CI is of the usual form $\hat\theta_n \pm c_{n,b}(1-\alpha)/n^{1/2}$, where $c_{n,b}(1-\alpha)$ is the subsampling critical value based on subsamples of size $b$, where $b \to \infty$ and $b/n \to 0$ as $n \to \infty$. The subsampling critical value, $c_{n,b}(1-\alpha)$, is the $1-\alpha$ sample quantile of the subsample statistics of size $b$. The shrinkage estimator $\hat\theta_n$ can be viewed as a post-consistent-model-selection estimator where model 0 takes $\theta = 0$ and model 1 takes $\theta \in R$.
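As a numerical illustration (ours, not part of the paper's formal development), the estimator and the CI construction just described can be sketched in Python. The function names are our own, the preliminary estimator is taken to be the sample mean, and random subsamples stand in for the full set of $q_n$ subsamples:

```python
import numpy as np

rng = np.random.default_rng(0)

def shrink(theta_p, m):
    """Shrinkage estimator: 0 if the preliminary estimate is below the
    threshold m^(-1/4), and otherwise the preliminary estimate itself."""
    return 0.0 if abs(theta_p) <= m ** (-0.25) else theta_p

def subsampling_ci(x, b, alpha=0.05, n_sub=500):
    """Nominal 1-alpha CI: thetahat_n +/- c_{n,b}(1-alpha)/n^(1/2), where the
    critical value is the 1-alpha sample quantile of the subsample statistics
    |b^(1/2)(thetahat_b - thetahat_n)| (random subsamples approximate all q_n)."""
    n = len(x)
    theta_n = shrink(x.mean(), n)
    stats = np.empty(n_sub)
    for j in range(n_sub):
        xb = rng.choice(x, size=b, replace=False)
        stats[j] = abs(np.sqrt(b) * (shrink(xb.mean(), b) - theta_n))
    c = np.quantile(stats, 1 - alpha)
    return theta_n - c / np.sqrt(n), theta_n + c / np.sqrt(n)

lo, hi = subsampling_ci(rng.normal(0.0, 1.0, size=1000), b=50)
```

The threshold $m^{-1/4}$ and the centering at $\hat\theta_n$ follow the description above; everything else is a simplification for illustration.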

Suppose the true parameter is $\theta_{n,h} = h/n^{1/2}$ for some constant $h \ne 0$ for $n \ge 1$. We consider the asymptotic behavior of $T_n(\theta_{n,h})$ and $c_{n,b}(1-\alpha)$ under $\{\theta_{n,h} : n \ge 1\}$. As is typical, suppose the preliminary estimator $\hat\theta_{P,n}$ satisfies $n^{1/2}(\hat\theta_{P,n} - \theta_{n,h}) = O_p(1)$ and $b^{1/2}(\hat\theta_{P,b} - \theta_{n,h}) = O_p(1)$ under $\{\theta_{n,h} : n \ge 1\}$. Under $\{\theta_{n,h} : n \ge 1\}$, $\hat\theta_n = 0$ with probability that goes to one (wp→1), because $|\hat\theta_{P,n}| = |\hat\theta_{P,n} - \theta_{n,h}| + O_p(n^{-1/2}) = O_p(n^{-1/2}) < n^{-1/4}$ wp→1. In consequence, $T_n(\theta_{n,h}) = |n^{1/2}\theta_{n,h}| = |h|$ wp→1.

On the other hand, it can be shown that the asymptotic distribution of the subsample statistics of size $b$ is the same as the asymptotic distribution of the full-sample statistic based on a full sample of size $b$, i.e., $T_b(\theta_{n,h})$. Hence, the probability limit of the subsampling critical value is the $1-\alpha$ quantile of the asymptotic distribution of $T_b(\theta_{n,h})$ under $\{\theta_{n,h} : n \ge 1\}$. By definition, $T_b(\theta_{n,h}) = |b^{1/2}(\hat\theta_b - \theta_{n,h})|$ and $\hat\theta_b = 0$ if $|\hat\theta_{P,b}| \le b^{-1/4}$. The latter occurs wp→1 because $|\hat\theta_{P,b}| = |\hat\theta_{P,b} - \theta_{n,h}| + O_p(n^{-1/2}) = O_p(b^{-1/2}) < b^{-1/4}$ wp→1. Hence, $\hat\theta_b = 0$ wp→1 and $T_b(\theta_{n,h}) = |b^{1/2}\theta_{n,h}| + o_p(1) = o_p(1)$ since $b/n \to 0$. In turn, this implies that the subsampling critical value converges in probability to 0. Since the test statistic $T_n(\theta_{n,h})$ converges in probability to $|h| > 0$ and the subsampling critical value converges in probability to 0, the subsampling test of $H_0: \theta = \theta_{n,h}$ rejects wp→1 and the CI obtained by inverting the subsampling tests fails to include the true value $\theta_{n,h}$ wp→1. This implies that the finite-sample confidence size of the subsampling CI goes to zero as $n \to \infty$.

⁴ Here we consider a non-studentized test statistic because subsampling tests do not require studentization, given that changes in the scale of the test statistic cancel with the corresponding changes in the scale of the subsample statistics. For example, if the test statistic $T_n(\theta_0)$ is multiplied by $\tau > 0$, then by their definition the subsample statistics also are multiplied by $\tau$, and the subsampling critical value is multiplied by $\tau$, which leaves the test unchanged. This does not mean that a subsampling test is the same whether or not one studentizes the test statistic. It means that studentization is not necessary for subsampling tests to work properly when the purpose of studentization is to account for an unknown scale $\tau$. In other contexts, such as with unit roots, studentization may be necessary, but in those cases studentization is doing more than just accounting for unknown scale. Analogous results to those discussed here also hold for studentized statistics, as is shown in the general results below.

In short, the argument above is: $T_n(\theta_{n,h}) \to_p |h|$ under $\theta_{n,h} = (h + o(1))/n^{1/2}$ for any $h \in R$ implies that $T_b(\theta_{b,h}) \to_p |h|$ under $\theta_{b,h} = (h + o(1))/b^{1/2}$ for all $h \in R$, and so $T_b(\theta_{n,h}) \to_p 0$ under $\theta_{n,h} = (h + o(1))/n^{1/2} = (0 + o(1))/b^{1/2}$ since $b/n \to 0$. So, the subsample statistics are smaller than the full-sample statistic in large samples and the subsampling test rejects wp→1.
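This argument can be checked numerically. The following simulation is our own illustration (not the paper's): it uses the $a = 0$ estimator with the sample mean as preliminary estimator and threshold $m^{-1/4}$, and shows the full-sample statistic settling at $|h|$ while the subsampling critical value (computed under Assumption Sub1, defined in Section 2.2 below) collapses toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n, b, h, alpha = 10_000, 100, 2.0, 0.05
theta_true = h / np.sqrt(n)            # local parameter theta_{n,h} = h / n^(1/2)

def theta_hat(x):
    """Post-model-selection estimator with a = 0: the sample mean if it clears
    the threshold m^(-1/4), and 0 otherwise."""
    m = len(x)
    xbar = x.mean()
    return xbar if abs(xbar) > m ** (-0.25) else 0.0

x = rng.normal(theta_true, 1.0, size=n)
T_n = abs(np.sqrt(n) * (theta_hat(x) - theta_true))    # full-sample statistic

# Subsample statistics centered at thetahat_n (random subsamples approximate all q_n)
stats = [abs(np.sqrt(b) * (theta_hat(rng.choice(x, b, replace=False)) - theta_hat(x)))
         for _ in range(500)]
c_nb = np.quantile(stats, 1 - alpha)
# T_n sits near |h| = 2 while c_nb sits near 0, so the subsampling test rejects
```

With these sample sizes, $\hat\theta_n = 0$ essentially always, so $T_n = |h|$ exactly, while almost all subsample statistics equal zero.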

The paper also gives results for CIs that are based on fixed critical values (FCV) rather than subsampling critical values. The asymptotic results given here for subsampling tests also apply to m out of n bootstrap tests applied to i.i.d. observations provided $m^2/n \to 0$. The reason is that subsampling based on subsamples of size $m$ can be viewed as bootstrapping without replacement, which is not too different from bootstrapping with replacement when $m^2/n$ is small.⁵

Related results in the literature include the following. Samworth (2003) provides simulation results and heuristics indicating that the m out of n bootstrap does not provide a good approximation to the distribution of a shrinkage estimator.⁶ Beran (1982) shows that the (standard) bootstrap is inconsistent for the distribution of a shrinkage estimator. Kabaila (1995) shows that an FCV CI based on a super-efficient estimator has asymptotic confidence level equal to zero; see Leeb and Pötscher (2005) for related results. Results of Leeb and Pötscher (2006) show that no uniformly consistent estimator of the distribution of a super-efficient estimator exists. The results given in this paper are not a special case of their result, because a uniformly consistent estimator of the null distribution of a test statistic is not necessary to obtain a test of level $\alpha$. For example, Andrews and Guggenberger (2009b) provide an example which illustrates this in the context of inference based on moment inequalities.

Subsequent to the present paper, Pötscher (2007) has provided some results concerning confidence sets based on sparse estimators. Some, but not all, of the subsampling results of the present paper also can be established via results in Pötscher (2007). In particular, Pötscher's (2007) results do not provide an expression for the asymptotic coverage probability of a subsampling CI as a function of the localization parameter $h$, which is the main result of this paper; see Theorem 2 below. In addition, Pötscher's (2007) results do not apply to (i) the specific example considered below with $a > 0$, because the estimator is not a sparse estimator, and (ii) models in which the subsampling critical value $c_{n,b}(1-\alpha)$ is not stochastically bounded uniformly in the parameter $\theta$, which needs to be established for Pötscher's (2007) results to apply. For an example in which the latter condition fails, see Andrews and Guggenberger (2005a). This example concerns subsampling in a linear instrumental variables regression model with possibly weak instruments.

⁵ In an i.i.d. scenario, the distribution of a subsample of size $b$ is the same as the conditional distribution of a nonparametric bootstrap sample of size $m = b$ conditional on there being no duplicates of observations in the bootstrap sample. If $m^2/n \to 0$, then the probability of no duplicates goes to one as $n \to \infty$; see PRW, p. 48. In consequence, m out of n bootstrap tests and subsampling tests have the same first-order asymptotic properties.

⁶ The simulation results in Samworth (2003) are for the case where the constant $a$, defined in (2.4) below, equals 0.5. For smaller values of $a$, such as $a = 0$, the results are exacerbated.
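The no-duplicates claim underlying the m out of n bootstrap comparison can be verified directly: the exact probability that an m out of n bootstrap sample contains no duplicates is $n(n-1)\cdots(n-m+1)/n^m$, which tends to one along any sequence with $m^2/n \to 0$. A quick numeric check (our own illustration):

```python
def p_no_duplicates(m, n):
    """Exact probability that an m out of n bootstrap sample (drawn with
    replacement) contains no duplicate observations: prod_{i<m} (n - i)/n."""
    p = 1.0
    for i in range(m):
        p *= (n - i) / n
    return p

# Along m = n^0.4 (so m^2/n = n^(-0.2) -> 0), the probability tends to 1
probs = [p_no_duplicates(int(n ** 0.4), n) for n in (10**4, 10**6, 10**8)]
```

The exponent 0.4 is an arbitrary choice satisfying $m^2/n \to 0$; any slower-growing $m$ works as well.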

Other papers that consider uniformity properties of subsampling methods include (i) Andrews and Guggenberger (2005a, 2009a,b, 2010), who provide explicit expressions for asymptotic size, improvements to subsampling based on hybrid and size-correction methods, and applications to a variety of different models, (ii) Mikusheva (2007), who shows that equal-tailed two-sided subsampling CIs do not have correct asymptotic size in an autoregressive model with a root that may be near unity, and (iii) Romano and Shaikh (2008), who provide high-level conditions under which subsampling CIs have correct asymptotic size and apply them to parameters defined by moment inequalities.

The remainder of the paper is organized as follows. Section 2 defines the class of FCV and subsampling CIs that are considered in the paper and introduces the post-consistent model selection/shrinkage estimator example. Section 3 states the general assumptions and verifies them in the post-consistent model selection/shrinkage estimator example. Section 4 states the general asymptotic results and shows that they imply that the post-consistent model selection/shrinkage estimator CI has asymptotic size equal to zero. Section 5 provides proofs of the general results.

2 Confidence Interval Set-up

2.1 Test Statistics

We are interested in confidence intervals (CIs) (or confidence regions) for a parameter $\theta \in R^d$ in the presence of nuisance parameters. We construct such intervals by inverting a test statistic $T_n(\theta_0)$ for testing $H_0: \theta = \theta_0$. The test statistic $T_n(\theta_0)$ may be an LR, LM, Wald, t, or some other statistic. A test based on $T_n(\theta_0)$ rejects the null hypothesis when $T_n(\theta_0)$ exceeds some critical value.

When $T_n(\theta_0)$ is a t statistic, it is defined as follows. Let $\hat\theta_n$ be an estimator of a scalar parameter $\theta$ based on a sample of size $n$. Let $\hat\sigma_n$ ($\in R$) be an estimator of the scale of $\hat\theta_n$. For alternatives of the sort (i) $H_1: \theta > \theta_0$, (ii) $H_1: \theta < \theta_0$, and (iii) $H_1: \theta \ne \theta_0$, respectively, the t statistic is defined as follows:

Assumption t1. (i) $T_n(\theta_0) = T_n^*(\theta_0)$; or (ii) $T_n(\theta_0) = -T_n^*(\theta_0)$; or (iii) $T_n(\theta_0) = |T_n^*(\theta_0)|$, where $T_n^*(\theta_0) = \tau_n(\hat\theta_n - \theta_0)/\hat\sigma_n$ and $\tau_n$ is some known normalization constant. In many cases, $\tau_n = n^{1/2}$.

A common case considered in the subsampling literature is when $T_n(\theta_0)$ is a non-studentized t statistic; see PRW. In this case, Assumption t1 and the following assumption hold.

Assumption t2. $\hat\sigma_n = 1$.

We employ either a fixed critical value (FCV), $c_{Fix}(1-\alpha)$, or a subsampling critical value, $c_{n,b}(1-\alpha)$, defined below. Let $\Theta$ ($\subseteq R^d$) denote the parameter space for $\theta$. The CI for $\theta$ contains all points $\theta_0 \in \Theta$ for which the test of $H_0: \theta = \theta_0$ fails to reject the null hypothesis:

$CI_n = \{\theta_0 \in \Theta : T_n(\theta_0) \le c_{1-\alpha}\}$,   (2.1)

where $c_{1-\alpha}$ equals $c_{Fix}(1-\alpha)$ or $c_{n,b}(1-\alpha)$ (and $c_{1-\alpha}$ may depend on $\theta_0$). For example, suppose $T_n(\theta_0)$ is an (i) upper one-sided, (ii) lower one-sided, or (iii) symmetric two-sided t test of nominal level $\alpha$ (i.e., Assumption t1(i), (ii), or (iii) holds) and $c_{1-\alpha}$ does not depend on $\theta_0$. Then, the corresponding CI of nominal level $1-\alpha$ is defined by

$CI_n = [\hat\theta_n - \tau_n^{-1}\hat\sigma_n c_{1-\alpha}, \infty)$,
$CI_n = (-\infty, \hat\theta_n + \tau_n^{-1}\hat\sigma_n c_{1-\alpha}]$, or
$CI_n = [\hat\theta_n - \tau_n^{-1}\hat\sigma_n c_{1-\alpha}, \hat\theta_n + \tau_n^{-1}\hat\sigma_n c_{1-\alpha}]$,   (2.2)

respectively.

We now introduce a running example that is used for illustrative purposes.

Post-Consistent Model Selection Example. We consider a subsampling CI that is based on an estimator that can be viewed either as a post-model-selection estimator based on a consistent model selection procedure or as a super-efficient estimator.

The model is

$X_i = \theta + U_i$, where $U_i \sim$ i.i.d. $N(0, 1)$ for $i = 1, \ldots, n$.   (2.3)

For the model selection problem, model 1 takes $\theta = 0$ and model 2 takes $\theta \in R$. Model selection is carried out using a likelihood ratio test that selects model 1 if $n^{1/2}|\bar X_n| \le \kappa_n$ and model 2 otherwise, where $\kappa_n > 0$ is a critical value. If $\kappa_n \to \infty$ and $\kappa_n/n^{1/2} \to 0$ as $n \to \infty$, the model selection procedure is consistent. (That is, when $\theta_0 = 0$, model 1 is chosen with probability that goes to one as $n \to \infty$, and when $\theta_0 \ne 0$, model 2 is chosen with probability that goes to one as $n \to \infty$, where $\theta_0$ is fixed and does not depend on $n$.) For the results that follow we only use the condition $\kappa_n \to \infty$. When $\kappa_n = \sqrt{\log(n)}$, this model selection procedure is BIC. The AIC criterion is not covered by the results given below because it corresponds to $\kappa_n = \sqrt{2} \not\to \infty$. (The asymptotic size of subsampling CIs based on post-conservative model selection procedures, such as AIC, is determined in Andrews and Guggenberger (2009a). It is far from the nominal level, but does not equal zero, at least under some restrictions on a correlation matrix that arises.) The post-model-selection estimator of $\theta_0$ equals zero if model 1 is selected and $\bar X_n$ if model 2 is selected. This estimator is a super-efficient estimator whenever $\kappa_n \to \infty$ and $\kappa_n/n^{1/2} \to 0$. It corresponds to Hodges' super-efficient estimator when $\kappa_n = n^{1/4}$.

The post-model-selection/super-efficient estimator, $\hat\theta_n$, of $\theta$ and the test statistic, $T_n(\theta_0)$, are defined by

$\hat\theta_n = \bar X_n$ if $n^{1/2}|\bar X_n| > \kappa_n$; $\hat\theta_n = a\bar X_n$ if $n^{1/2}|\bar X_n| \le \kappa_n$, where $\bar X_n = n^{-1}\sum_{i=1}^n X_i$;
$T_n(\theta_0) = |n^{1/2}(\hat\theta_n - \theta_0)|$;   (2.4)

$\kappa_n > 0$; and $0 \le a < 1$. A post-model-selection estimator is obtained by taking $a = 0$. Hodges' super-efficient estimator is obtained by taking $\kappa_n = n^{1/4}$. For a super-efficient estimator, the constant $a$ is a tuning parameter that determines the magnitude of shrinkage. The test statistic is a two-sided non-studentized t statistic, so that Assumptions t1(iii) and t2 hold with $\tau_n = n^{1/2}$.

The CI for $\theta$ is defined in (2.1) with $T_n(\theta_0)$ defined in (2.4) and $c_{1-\alpha}$ defined below. In the case where $c_{1-\alpha}$ does not depend on $\theta_0$, the CI is given by the third equation in (2.2).

2.2 Critical Values and Asymptotic Size

We consider FCV and subsampling critical values. The results below allow $c_{Fix}(1-\alpha)$ to be any constant. Often, however, one takes

$c_{Fix}(1-\alpha) = c_\infty(1-\alpha)$,   (2.5)

where $c_\infty(1-\alpha)$ denotes the $1-\alpha$ quantile of $J_\infty$ and $J_\infty$ is the asymptotic null distribution of $T_n(\theta_0)$ when the true parameter is fixed and is not a point of discontinuity of the asymptotic distribution of $T_n(\theta_0)$; see Section 3. In the post-consistent model selection example this corresponds to the limit distribution of the test statistic when the true $\theta_0$ is fixed and different from zero. For studentized tests when Assumption t1(i), (ii), or (iii) holds, $c_\infty(1-\alpha)$ typically equals $z_{1-\alpha}$, $z_{1-\alpha}$, or $z_{1-\alpha/2}$, respectively, where $z_{1-\alpha}$ denotes the $1-\alpha$ quantile of the standard normal distribution. If $T_n(\theta_0)$ is an LR, LM, or Wald statistic, then $c_\infty(1-\alpha)$ typically equals the $1-\alpha$ quantile of a $\chi^2_d$ distribution, denoted $\chi^2_d(1-\alpha)$.

To define subsampling critical values, let $\{b_n : n \ge 1\}$ be a sequence of subsample sizes. For brevity, we often write $b_n$ as $b$. Let $\{\hat T_{n,b,j} : j = 1, \ldots, q_n\}$ be subsample statistics, defined below, that are based primarily on subsamples of size $b$ rather than the full sample. With i.i.d. observations, there are $q_n = n!/((n-b)!\,b!)$ different subsamples of size $b$ and $\hat T_{n,b,j}$ is determined primarily by the observations in the $j$th such subsample. With time series observations, say $\{X_1, \ldots, X_n\}$, there are $q_n = n - b + 1$ subsamples of $b$ consecutive observations, e.g., $Y_j = \{X_j, \ldots, X_{j+b-1}\}$, and $\hat T_{n,b,j}$ is determined primarily by the observations in the $j$th subsample $Y_j$.

Let $L_{n,b}(x)$ and $c_{n,b}(1-\alpha)$ denote the empirical distribution function and the $1-\alpha$ sample quantile, respectively, of the subsample statistics $\{\hat T_{n,b,j} : j = 1, \ldots, q_n\}$. They are defined by

$L_{n,b}(x) = q_n^{-1} \sum_{j=1}^{q_n} 1(\hat T_{n,b,j} \le x)$ for $x \in R$ and
$c_{n,b}(1-\alpha) = \inf\{x \in R : L_{n,b}(x) \ge 1-\alpha\}$,   (2.6)

where $1(\cdot)$ denotes the indicator function, and they may depend on $\theta_0$.

The subsample statistics $\{\hat T_{n,b,j} : j = 1, \ldots, q_n\}$ are defined as follows. Let $\{T_{n,b,j}(\theta_0) : j = 1, \ldots, q_n\}$ be subsample statistics that are defined just as $T_n(\theta_0)$ is defined, but based on subsamples of size $b$ rather than the full sample. For example, suppose Assumption t1 holds. Let $(\hat\theta_{n,b,j}, \hat\sigma_{n,b,j})$ denote the estimators $(\hat\theta_b, \hat\sigma_b)$ applied to the $j$th subsample. In this case,

(i) $T_{n,b,j}(\theta_0) = \tau_b(\hat\theta_{n,b,j} - \theta_0)/\hat\sigma_{n,b,j}$; or
(ii) $T_{n,b,j}(\theta_0) = -\tau_b(\hat\theta_{n,b,j} - \theta_0)/\hat\sigma_{n,b,j}$; or
(iii) $T_{n,b,j}(\theta_0) = |\tau_b(\hat\theta_{n,b,j} - \theta_0)/\hat\sigma_{n,b,j}|$.   (2.7)
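For small $n$, the definitions in (2.6)-(2.7) can be implemented literally by enumerating all $q_n = \binom{n}{b}$ i.i.d. subsamples. The sketch below is our own illustration (with $\hat\sigma_{n,b,j} = 1$, $\tau_b = b^{1/2}$, and the statistics evaluated at $\theta_0$, i.e., the Assumption Sub2 variant defined below):

```python
import itertools
import math
import numpy as np

def c_nb(x, b, alpha, theta0):
    """Subsampling critical value per (2.6): c_{n,b}(1-alpha) =
    inf{t : L_{n,b}(t) >= 1-alpha}, computed over the statistics
    T_{n,b,j}(theta0) = |b^(1/2)(mean_j - theta0)| for all q_n = C(n,b)
    subsamples (exhaustive enumeration; feasible only for small n)."""
    stats = np.sort([abs(math.sqrt(b) * (np.mean(s) - theta0))
                     for s in itertools.combinations(x, b)])
    q_n = len(stats)
    # smallest order statistic at which the empirical df L_{n,b} reaches 1-alpha
    k = math.ceil((1 - alpha) * q_n) - 1
    return float(stats[k])

x = np.random.default_rng(2).normal(size=10)
c = c_nb(x, b=5, alpha=0.1, theta0=0.0)
```

Since $L_{n,b}$ is a step function that jumps only at the subsample statistic values, the infimum in (2.6) is attained at an order statistic, which is what the index computation exploits.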

Below we use the empirical distribution of $\{T_{n,b,j}(\theta_0) : j = 1, \ldots, q_n\}$ defined by

$U_{n,b}(x, \theta_0) = q_n^{-1} \sum_{j=1}^{q_n} 1(T_{n,b,j}(\theta_0) \le x)$.   (2.8)

In most cases, subsampling critical values are based on a simple adjustment to the statistics $\{T_{n,b,j}(\theta_0) : j = 1, \ldots, q_n\}$, where the adjustment is designed to yield subsample statistics that behave similarly under the null and the alternative hypotheses. In particular, $\{\hat T_{n,b,j} : j = 1, \ldots, q_n\}$ often are defined to satisfy the following condition.

Assumption Sub1. $\hat T_{n,b,j} = T_{n,b,j}(\hat\theta_n)$ for all $j \le q_n$, where $\hat\theta_n$ is an estimator of $\theta$.

In some cases, the subsample statistics are defined to satisfy:

Assumption Sub2. $\hat T_{n,b,j} = T_{n,b,j}(\theta_0)$ for all $j \le q_n$.

Note that $c_{n,b}(1-\alpha)$ depends on the hypothesized parameter value $\theta_0$ under Assumption Sub2, but not under Assumption Sub1. (Of course, the distribution of $c_{n,b}(1-\alpha)$ may depend on the true parameter under Assumption Sub1 or Sub2.)

The distribution of the data is determined by a parameter $\gamma$ of which $\theta$ is a subvector. Let $\Gamma$ denote the parameter space for $\gamma$. The coverage probability of the CI defined in (2.1) when $\gamma$ is the true parameter vector is

$P_\gamma(\theta \in CI_n) = P_\gamma(T_n(\theta) \le c_{1-\alpha}) = 1 - RP_n(\gamma)$,   (2.9)

where $RP_n(\gamma) = P_\gamma(T_n(\theta) > c_{1-\alpha})$. The exact (i.e., finite-sample) and asymptotic confidence sizes of $CI_n$ are

$ExCS_n = \inf_{\gamma \in \Gamma}(1 - RP_n(\gamma))$ and $AsyCS = \liminf_{n \to \infty} ExCS_n$,   (2.10)

respectively.
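The role of the infimum over $\gamma$ in (2.10) can be illustrated by simulation (our own sketch, not a result of the paper). Using the $a = 0$ estimator of (2.4) with $\kappa_n = n^{1/4}$ and a fixed normal critical value, coverage is computed under local parameters $\theta = h/n^{1/2}$: it is high at $h = 0$ but collapses at moderate $h$, so the infimum, and hence $ExCS_n$, is far below $1-\alpha$:

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage(h, n, z=1.96, reps=2000):
    """Monte Carlo coverage of the FCV interval thetahat_n +/- z / n^(1/2),
    where thetahat_n = Xbar_n if n^(1/2)|Xbar_n| > kappa_n = n^(1/4) and 0
    otherwise, when the true parameter is theta = h / n^(1/2)."""
    theta = h / np.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = rng.normal(theta, 1.0 / np.sqrt(n))   # distribution of Xbar_n
        th = xbar if abs(xbar) > n ** (-0.25) else 0.0
        hits += abs(th - theta) <= z / np.sqrt(n)
    return hits / reps

cov = [coverage(h, n=10**6) for h in (0.0, 1.0, 3.0)]   # collapses at h = 3
```

At $n = 10^6$, $\hat\theta_n = 0$ for every draw at these $h$ values, so the interval covers exactly when $|h| \le z$; grid points with $|h| > z$ drive the infimum toward zero.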

Post-Consistent Model Selection Example (cont.). The subsampling critical values in this example are given by $c_{n,b}(1-\alpha)$ obtained from the subsample statistics $\{T_{n,b,j}(\hat\theta_n) : j = 1, \ldots, q_n\}$ defined in equation (iii) of (2.7) with $\hat\sigma_{n,b,j} = 1$. Note that Assumption Sub1 holds. (The results given below also hold if Assumption Sub2 holds.)


3 Assumptions

3.1 Motivational Example

In this section, we introduce the general assumptions under which our results hold. These assumptions allow for test statistics whose asymptotic distributions exhibit a type of discontinuity. The running example, which is a very simple post-consistent model selection example, is not sufficiently complex to illustrate the complexities that arise in many examples. In consequence, to illustrate the types of statistics that we want to cover, the type of discontinuity of interest, and the complexities that often arise, we start this section by describing a more complex example. After doing this, we introduce the general assumptions.

The example is a simple version of the example of inference in the linear instrumental variables model when instruments are potentially weak, discussed in Andrews and Guggenberger (2005a) (AG hereafter). The model is given by a structural equation and a reduced-form equation

$y_1 = y_2\theta + u$, $y_2 = z\pi + v$,   (3.1)

where $y_1, y_2, z \in R^n$ and $\theta, \pi \in R$ are unknown parameters. Assume $\{(u_i, v_i, z_i) : i \le n\}$ are i.i.d. with distribution $F$, where a subscript $i$ denotes the $i$th component of a vector. The goal is to test $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$. The test is based on the t statistic $T_n(\theta_0) = |n^{1/2}(\hat\theta_n - \theta_0)/\hat\sigma_n|$, where $\hat\theta_n = (y_2' P_z y_2)^{-1} y_2' P_z y_1$, $\hat\sigma_n = \hat\sigma_u (n^{-1} y_2' P_z y_2)^{-1/2}$, $\hat\sigma_u^2 = (n-1)^{-1}(y_1 - y_2\hat\theta_n)'(y_1 - y_2\hat\theta_n)$, and $P_z = zz'/z'z$. Define nuisance parameters $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ by

$\gamma_1 = |(E_F z_i^2)^{1/2}\pi/\sigma_v|$, $\gamma_2 = \rho$, and $\gamma_3 = (F, \pi)$, where
$\sigma_v^2 = E_F v_i^2$, $\sigma_u^2 = E_F u_i^2$, and $\rho = Corr_F(u_i, v_i)$.   (3.2)

The parameter spaces for $\gamma_1$ and $\gamma_2$ are $\Gamma_1 = \{x \in R : x \ge 0\}$ and $\Gamma_2 = [-1, 1]$. The details for the restrictions on the parameter space $\Gamma_3 = \Gamma_3(\gamma_1, \gamma_2)$ for $\gamma_3$ are given in AG and are such that the following CLT holds under sequences $\gamma = \gamma_n$ for which $\gamma_2 = \gamma_{2,n} \to h_2$:

$\left(\begin{array}{c} (n^{-1}z'z)^{-1/2} n^{-1/2} z'u/\sigma_u \\ (n^{-1}z'z)^{-1/2} n^{-1/2} z'v/\sigma_v \end{array}\right) \to_d \left(\begin{array}{c} \psi_{u,h_2} \\ \psi_{v,h_2} \end{array}\right) \sim N\left(0, \left(\begin{array}{cc} 1 & h_2 \\ h_2 & 1 \end{array}\right)\right)$.   (3.3)

In this example, the asymptotic distribution of the statistic $T_n(\theta_0)$ has a discontinuity at $\gamma_1 = 0$. Under different sequences $\gamma_1 = \gamma_{1,n}$ such that $\gamma_{1,n} \to 0$, the limit distribution of $T_n(\theta_0)$ may be different. More precisely, denote by $\gamma_{n,h}$ a sequence of nuisance parameters $\gamma = \gamma_n$ such that $n^{1/2}\gamma_1 \to h_1$ and $\gamma_2 \to h_2$, and let $h = (h_1, h_2)$. It is shown below that under $\gamma_{n,h}$, the limit distribution of $T_n(\theta_0)$ depends on $h_1$ and $h_2$ and only on $h_1$ and $h_2$. As long as $h_1$ is finite, the sequence $\gamma_1$ converges to zero, yet the limit distribution of $T_n(\theta_0)$ does not only depend on the limit point 0 of $\gamma_1$, but depends on how precisely $\gamma_1$ converges to zero, indexed by the convergence speed $n^{1/2}$ and the localization parameter $h_1$. In contrast, the limit distribution of $T_n(\theta_0)$ only depends on the limit point $h_2$ of $\gamma_2$ but not on how $\gamma_2$ converges to $h_2$. In that sense, the limit distribution is discontinuous in $\gamma_1$ at 0, but continuous on $\Gamma_2$ in $\gamma_2$. The parameter $\gamma_3$ does not influence the limit distribution of $T_n(\theta_0)$ by virtue of the CLT in (3.3).

If $h_1 < \infty$, it is shown in AG that under $\gamma_{n,h}$,

$\left(\begin{array}{c} y_2' P_z u/(\sigma_u \sigma_v) \\ y_2' P_z y_2/\sigma_v^2 \\ \hat\sigma_u^2/\sigma_u^2 \end{array}\right) \to_d \left(\begin{array}{c} \xi_{1,h} \\ \xi_{2,h} \\ \xi_{u,h}^2 \end{array}\right) = \left(\begin{array}{c} (\psi_{v,h_2} + h_1)\psi_{u,h_2} \\ (\psi_{v,h_2} + h_1)^2 \\ (1 - h_2 \xi_{1,h}/\xi_{2,h})^2 + (1 - h_2^2)\xi_{1,h}^2/\xi_{2,h}^2 \end{array}\right)$   (3.4)

and thus $T_n(\theta_0) \to_d |\xi_{1,h}/(\xi_{2,h}\xi_{u,h}^2)^{1/2}|$. If $h_1 = \infty$, $T_n(\theta_0) \to_d |\xi_\infty|$, where $\xi_\infty$ has a standard normal distribution that does not depend on $h_2$.

3.2 Parameter Space

We now return to the general case. The parameter $\gamma$ has up to three components: $\gamma = (\gamma_1, \gamma_2, \gamma_3) = ((\theta_1', \eta_1')', (\theta_2', \eta_2')', \gamma_3)$, where $\theta = (\theta_1', \theta_2')'$, $\eta = (\eta_1', \eta_2')'$, $\theta_j \in R^{d_j}$ for $j = 1, 2$, and $\eta_j \in R^{s_j}$ for $j = 1, 2$. Points of discontinuity of the asymptotic distribution of the test statistic of interest are determined by the first component, $\gamma_1 \in R^p$, which may contain part of the parameter of interest, viz., $\theta_1$. Through reparametrization we can assume without loss of generality that the discontinuity occurs when one or more elements of $\gamma_1$ equal zero. The value of $\gamma_1$ affects the limit distribution of the test statistic of interest. The parameter space for $\gamma_1$ is $\Gamma_1 \subseteq R^p$.

The second component, $\gamma_2$ ($\in R^q$), of $\gamma$ also affects the limit distribution of the test statistic, but does not affect the distance of the parameter $\gamma$ to the point of discontinuity. The component $\gamma_2$ may contain part of the parameter of interest, $\theta_2$. In most examples, either no parameter $\theta_1$ or $\theta_2$ appears (i.e., $d_1 = 0$ or $d_2 = 0$) and either no parameter $\eta_1$ or $\eta_2$ appears (i.e., $s_1 = 0$ or $s_2 = 0$). The parameter space for $\gamma_2$ is $\Gamma_2 \subseteq R^q$.

The third component, $\gamma_3$, of $\gamma$ does not affect the limit distribution of the test statistic. It is assumed to be an element of an arbitrary space $T_3$ and hence can be finite or infinite dimensional. For example, error distributions can be included in $\gamma_3$. The parameter space for $\gamma_3$ is $\Gamma_3(\gamma_1, \gamma_2)$ ($\subseteq T_3$), which may depend on $\gamma_1$ and $\gamma_2$.

Assumption A1. The parameter space for $\gamma$ is

$\Gamma = \{(\gamma_1, \gamma_2, \gamma_3) : \gamma_1 \in \Gamma_1, \gamma_2 \in \Gamma_2, \gamma_3 \in \Gamma_3(\gamma_1, \gamma_2)\}$.   (3.5)

Post-Consistent Model Selection Example (cont.). In this example, no parameters $\gamma_2$, $\gamma_3$, $\theta_2$, or $\eta$ appear. Assumption A1 holds with $\gamma = \gamma_1 = \theta = \theta_1 \in R$; $p = d = d_1 = 1$; $d_2 = 0$; and $\Gamma = \Gamma_1 = \Theta = R$.


3.3 Convergence Assumption

For an arbitrary distribution $G$, let $G(\cdot)$ denote the distribution function (df) of $G$ and let $C(G)$ denote the continuity points of $G(\cdot)$. Define the $1-\alpha$ quantile of a distribution $G$ by $q(1-\alpha) = \inf\{x \in R : G(x) \ge 1-\alpha\}$. Let $G(x-) = \lim_{\varepsilon \searrow 0} G(x - \varepsilon)$, where $\lim_{\varepsilon \searrow 0}$ denotes the limit as $\varepsilon > 0$ declines to zero. The distributions $J_h$ and $J_{h_0}$ considered below are distributions of proper random variables that are finite with probability one. All limits are as $n \to \infty$. For a sequence of constants $\{\kappa_n : n \ge 1\}$, let $\kappa_n \to [\kappa_{1,\infty}, \kappa_{2,\infty}]$ denote that $\kappa_{1,\infty} \le \liminf_{n\to\infty} \kappa_n \le \limsup_{n\to\infty} \kappa_n \le \kappa_{2,\infty}$.

Let $r > 0$ denote a rate of convergence index such that when the true value of $\gamma_1$ satisfies $n^r \gamma_1 \to h_1$, the test statistic $T_n(\theta_0)$ has an asymptotic distribution that depends on the localization parameter $h_1$. In most examples, $r = 1/2$, but in the unit root example considered in Andrews and Guggenberger (2009a), $r = 1$. In a given example, the value of $r$ is determined such that under sequences $\gamma_1$ satisfying $n^r \gamma_1 \to h_1$ we obtain sequences of distributions that are contiguous to the distribution at a point of discontinuity of the asymptotic distribution.

Next, we define the index set for the different asymptotic distributions of the test statistic $T_n(\theta_0)$ of interest. Let

$H = \{h = (h_1, h_2) \in R_\infty^{p+q} : \exists\, \{\gamma_n = (\gamma_{n,1}, \gamma_{n,2}, \gamma_{n,3}) \in \Gamma : n \ge 1\}$ such that $n^r \gamma_{n,1} \to h_1$ and $\gamma_{n,2} \to h_2\}$,   (3.6)

where $R_\infty = R \cup \{\pm\infty\}$ and $R_\infty^{p+q} = R_\infty \times \cdots \times R_\infty$ (with $p+q$ copies). For notational simplicity, in the definition of $H$ and below, we write $(h_1, h_2)$, rather than $(h_1', h_2')'$, even though $h$ is a $p+q$ column vector.

Definition of $\{\gamma_{n,h} : n \ge 1\}$: Given $r > 0$ and $h = (h_1, h_2) \in H$, let $\{\gamma_{n,h} = (\gamma_{n,h,1}, \gamma_{n,h,2}, \gamma_{n,h,3}) : n \ge 1\}$ denote a sequence of parameters in $\Gamma$ for which $n^r \gamma_{n,h,1} \to h_1$ and $\gamma_{n,h,2} \to h_2$.

The sequence $\{\gamma_{n,h} : n \ge 1\}$ is defined such that under $\{\gamma_{n,h} : n \ge 1\}$, the asymptotic distribution of $T_n(\theta_0)$ depends on $h$ and only $h$; see Assumption B1 below. For a given model, there is a single fixed $r > 0$. In addition, the limit distribution under $\{\gamma_{n,h} : n \ge 1\}$ of the test statistic of interest does not depend on $\gamma_{n,h,3}$, so we do not make the dependence of $\gamma_{n,h}$ on $\gamma_{n,h,3}$ explicit.

Given any $h = (h_1, h_2) \in H$, define $h_0 = (0_p, h_2)$, where $0_p$ denotes a $p$-vector of zeros. We use the following assumption.

Assumption B1. (i) For some $r > 0$, some $h \in H \cap R^{p+q}$ such that $h_0 \in H$, some sequence $\{\gamma_{n,h} : n \ge 1\}$, and some distribution $J_h$, $T_n(\theta_{n,h}) \to_d J_h$ under $\{\gamma_{n,h} : n \ge 1\}$, where $\gamma_{n,h} = (\gamma_{n,h,1}, \gamma_{n,h,2}, \gamma_{n,h,3}) = ((\theta_{n,h,1}', \eta_{n,h,1}')', (\theta_{n,h,2}', \eta_{n,h,2}')', \gamma_{n,h,3})$ and $\theta_{n,h} = (\theta_{n,h,1}', \theta_{n,h,2}')'$; and (ii) for all sequences $\{\gamma_{n,h_0} : n \ge 1\}$ and some distribution $J_{h_0}$, $T_n(\theta_{n,h_0}) \to_d J_{h_0}$ under $\{\gamma_{n,h_0} : n \ge 1\}$, where $\gamma_{n,h_0} = (\gamma_{n,h_0,1}, \gamma_{n,h_0,2}, \gamma_{n,h_0,3}) = (0_p, (\theta_{n,h_0,2}', \eta_{n,h_0,2}')', \gamma_{n,h_0,3})$ and $\theta_{n,h_0} = (0_p', \theta_{n,h_0,2}')'$.


Assumption B1 models discontinuity in the asymptotic distribution of $T_n(\theta_{n,h})$ at $\gamma_1 = 0$ when $J_h$ and $J_{h_0}$ differ: the sequences $\{\gamma_{n,h} : n \ge 1\}$ and $\{\gamma_{n,h_0} : n \ge 1\}$ both converge to $(0_p, h_2, \gamma_3)$ when $\gamma_{n,h,3} \to \gamma_3$, yet the asymptotic distributions $J_h$ and $J_{h_0}$ differ. If $\gamma_{n,h}$ does not depend on $n$ (which necessarily requires $h_1 = 0$ because $\|h\| < \infty$), Assumption B1(i) is a standard assumption in the subsampling literature. For example, it is imposed in the basic theorem in PRW, Thm. 2.2.1, p. 43, for subsampling with i.i.d. observations and in their Thm. 3.2.1, p. 70, for stationary strong mixing observations. If $\gamma_{n,h}$ does depend on $n$, Assumption B1(i) usually can be verified using the same sort of argument as when it does not. Similarly, Assumption B1(ii) usually can be verified using the same sort of argument and, hence, is not restrictive.

Assumption B1 is weaker than Assumption B, which is employed in Andrews and Guggenberger (2010) (provided there exists an $h \in H \cap R^{p+q}$ such that $h_0 \in H$, which holds quite generally). For example, Assumption B1 holds in the consistent model selection example considered here, but Assumption B does not, because the latter requires convergence of $T_n(\theta_{n,h})$ to the same distribution $J_\infty$ for all sequences $\{\gamma_{n,h} : n \ge 1\}$ for which $h = \infty$. The latter fails; see below.

Post-Consistent Model Selection Example (cont.). In this example, we take $r = 1/2$ and $\gamma_{n,h}$ $(= \theta_{n,h}) = h n^{-1/2}$, where $h \in R$, in Assumption B1. We now verify Assumption B1. For any true sequence $\{\gamma_n : n \ge 1\}$ for which $n^{1/2} \gamma_n$ $(= n^{1/2} \theta_n) = O(1)$, we have


$P_{\gamma_n}(n^{1/2}|\bar{X}_n| \le \kappa_n) = P_{\gamma_n}(|n^{1/2}(\bar{X}_n - \theta_n) + n^{1/2}\theta_n| \le \kappa_n) = P_{\gamma_n}(|O_p(1) + O(1)| \le \kappa_n) \to 1$ and

$P_{\gamma_n}(\hat{\theta}_n = a\bar{X}_n) \to 1$, \quad (3.7)

where the second equality uses the fact that $n^{1/2}(\bar{X}_n - \theta_n) \sim N(0,1)$ and the second convergence result uses the definition of $\hat{\theta}_n$ in (2.4).
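The convergence in (3.7) is easy to illustrate numerically. The sketch below simulates $\bar{X}_n$ under the drifting true value $\theta_n = h n^{-1/2}$ and estimates the probability of the selection event $n^{1/2}|\bar{X}_n| \le \kappa_n$. The choice $\kappa_n = \log n$ is purely illustrative (any $\kappa_n \to \infty$ with $\kappa_n/n^{1/2} \to 0$ behaves the same way), and the function name is ours, not the paper's:

```python
import math
import random

def selection_prob(n, h, kappa, reps=20_000, seed=0):
    """Estimate P(n^{1/2}|Xbar_n| <= kappa_n) when theta_n = h / n^{1/2}.

    Under theta_n, n^{1/2} Xbar_n ~ N(h, 1), so the selection event
    n^{1/2}|Xbar_n| <= kappa_n has probability -> 1 as kappa_n -> infinity,
    which is the first convergence in (3.7).
    """
    rng = random.Random(seed)
    theta_n = h / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Xbar_n ~ N(theta_n, 1/n), so n^{1/2} Xbar_n = h + Z with Z ~ N(0, 1)
        root_n_xbar = math.sqrt(n) * theta_n + rng.gauss(0.0, 1.0)
        if abs(root_n_xbar) <= kappa:
            hits += 1
    return hits / reps

# kappa_n = log n is one (hypothetical) threshold sequence
for n in (100, 10_000, 1_000_000):
    print(n, selection_prob(n, h=2.0, kappa=math.log(n)))
```

The estimated probabilities approach 1 as $n$ grows, so the model selection procedure eventually picks the restricted estimator $a\bar{X}_n$ along the entire drifting sequence.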

Hence, when the true value is $\theta_{n,h}$, $\hat{\theta}_n = a\bar{X}_n$ wp$\to$1, and we have, wp$\to$1 under $\theta_{n,h}$,

$T_n(\theta_{n,h}) = |n^{1/2}(a\bar{X}_n - \theta_{n,h})| = |a n^{1/2}(\bar{X}_n - \theta_{n,h}) + (a-1)h| \to_d |aZ + (a-1)h| \sim J_h$, where $Z \sim N(0,1)$, and \quad (3.8)

$J_h(x) = \Phi(a^{-1}(x + (1-a)h)) - \Phi(a^{-1}(-x + (1-a)h))$ if $a \in (0,1)$, and $J_h(x) = 1(x \ge |h|)$ if $a = 0$,

where $\Phi(\cdot)$ denotes the standard normal distribution function. Given that $p = d = 1$, we have $h_0 = 0$ and $J_{h_0} = J_0$. For $a = 0$, $J_0(x) = 1(x \ge 0)$ and $c_{h_0}(1-\alpha) = c_0(1-\alpha) = 0$. For $a \in (0,1)$, we have $J_0(x) = \Phi(a^{-1}x) - \Phi(-a^{-1}x)$ and $c_{h_0}(1-\alpha) = c_0(1-\alpha) = a z_{1-\alpha/2}$.
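For $a \in (0,1)$, the formula for $J_h$ follows directly from the normality of $Z$; a short derivation consistent with (3.8) is:

```latex
P(|aZ + (a-1)h| \le x)
  = P\bigl(-x \le aZ - (1-a)h \le x\bigr)
  = P\Bigl(\tfrac{(1-a)h - x}{a} \le Z \le \tfrac{(1-a)h + x}{a}\Bigr)
  = \Phi\bigl(a^{-1}(x + (1-a)h)\bigr) - \Phi\bigl(a^{-1}(-x + (1-a)h)\bigr).
```

For $a = 0$, the limit variable is the constant $|(0-1)h| = |h|$, which yields the pointmass df $J_h(x) = 1(x \ge |h|)$.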


Equation (3.8) implies that Assumption B1(i) holds. For any sequence $\{\gamma_{n,h_0} : n \ge 1\}$ as in Assumption B1(ii), we have $n^{1/2}\gamma_{n,h_0} = O(1)$, (3.7) holds, and (3.8) holds with $\gamma_{n,h_0}$ in place of $\gamma_{n,h}$ $(= \theta_{n,h})$. Hence, Assumption B1(ii) holds with $J_{h_0}(x) = J_0(x)$.

Assumption B of Andrews and Guggenberger (2010) fails in this example because, as is obvious and known, the asymptotic distribution of $T_n(\theta_{n,h})$ (when it exists) differs between a sequence $\{\theta_{n,h} : n \ge 1\}$ that converges to 0 slowly enough that $n^{1/2}|\bar{X}_n| > \kappa_n$ occurs with probability bounded away from 0 and 1 and a sequence $\{\theta_{n,h} : n \ge 1\}$ for which $n^{1/2}|\bar{X}_n| > \kappa_n$ occurs wp$\to$1. For both such sequences, $h = \infty$.

3.4 Subsampling Assumptions

To determine the asymptotic coverage probabilities of FCV CIs, the assumptions above are all that are needed. For subsampling CIs, we require the following additional assumptions:$^{7}$

Assumption C. (i) $b \to \infty$ and (ii) $b/n \to 0$.

Assumption D. (i) $\{T_{n,b,j}(\theta) : j = 1, \ldots, q_n\}$ are identically distributed under any $\gamma \in \Gamma$ for all $n \ge 1$, where $\gamma = (\gamma_1, \gamma_2, \gamma_3) = ((\theta_1', \eta_1')', (\theta_2', \eta_2')', \gamma_3)$ and $\theta = (\theta_1', \theta_2')'$, and (ii) $T_{n,b,j}(\theta)$ and $T_b(\theta)$ have the same distribution under any $\gamma \in \Gamma$ for all $n \ge 1$.

Assumption E1. For the sequence $\{\gamma_{n,h} : n \ge 1\}$ in Assumption B1(i), $U_{n,b}(x, \theta_{n,h}) - E_{\gamma_{n,h}} U_{n,b}(x, \theta_{n,h}) \to_p 0$ under $\{\gamma_{n,h} : n \ge 1\}$ for all $x \in R$.

Assumption F1. For all $\varepsilon > 0$, $J_{h_0}(c_{h_0}(1-\alpha) + \varepsilon) > 1 - \alpha$, where $c_{h_0}(1-\alpha)$ is the $1-\alpha$ quantile of $J_{h_0}$ and $h_0$ is as in Assumption B1(ii).

Assumption G1. For the sequence $\{\gamma_{n,h} : n \ge 1\}$ in Assumption B1(i), $L_{n,b}(x) - U_{n,b}(x, \theta_{n,h}) \to_p 0$ for all $x \in C(J_{h_0})$ under $\{\gamma_{n,h} : n \ge 1\}$.

Assumptions C and D are standard in the subsampling literature, e.g., see PRW, Thm. 2.2.1, p. 43, and are not restrictive. The sequence $\{b = b_n : n \ge 1\}$ can be chosen to satisfy Assumption C. Assumption D automatically holds when the observations are i.i.d. or stationary and subsamples are constructed in the usual way (described above).
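For concreteness, here is a minimal sketch of the usual i.i.d. subsample construction that Assumption D refers to. The function name, the sample-mean estimator, the statistic $T_{n,b,j} = b^{1/2}|\hat{\theta}_{n,b,j} - \hat{\theta}_n|$, and the use of a random set of subsamples in place of all $q_n$ of them are illustrative choices; the paper's Section 2 definitions are the authoritative ones:

```python
import math
import random

def subsampling_critical_value(data, b, alpha=0.05, num_subsamples=1000, seed=0):
    """1 - alpha empirical quantile of the subsample statistics
    T_{n,b,j} = b^{1/2} |thetahat_{n,b,j} - thetahat_n|,
    with thetahat taken to be the sample mean (illustrative choice)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = sum(data) / n
    stats = []
    for _ in range(num_subsamples):
        sub = rng.sample(data, b)            # one subsample of size b, no replacement
        theta_sub = sum(sub) / b
        stats.append(math.sqrt(b) * abs(theta_sub - theta_hat))
    stats.sort()
    # smallest x with empirical df L_{n,b}(x) >= 1 - alpha
    k = math.ceil((1 - alpha) * num_subsamples) - 1
    return stats[k]

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
c = subsampling_critical_value(data, b=50)
print(round(c, 2))
```

For i.i.d. $N(0,1)$ data this critical value lands near $z_{0.975} \approx 1.96$ (slightly below, because of the without-replacement finite-population correction), which is the behavior one expects in a regular model where subsampling works.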

Assumption E1 holds automatically for subsample statistics that are defined as above when the observations are i.i.d. for each fixed $\gamma \in \Gamma$ (by a U-statistic inequality of Hoeffding, using the same argument as in PRW, p. 44). For stationary strong mixing observations, Assumption E1 holds provided

$\sup_{\gamma \in \Gamma} \alpha_\gamma(m) \to 0$ as $m \to \infty$, \quad (3.9)

where $\{\alpha_\gamma(m) : m \ge 1\}$ are the strong mixing numbers of the observations when the true parameter is $\gamma$. This follows by the same argument as given in PRW, pp. 71-72 (which establishes $L^2$ convergence using a strong mixing covariance bound).

$^{7}$Assumptions that are not indexed by "1" are the same as assumptions in Andrews and Guggenberger (2010). Assumptions that are indexed by "1" concern the same quantities as, but are different from, corresponding assumptions in Andrews and Guggenberger (2010) that are not indexed by "1."


Assumption F1 is designed to avoid the requirement that $J_h(x)$ is continuous in $x$, because this assumption is violated in some examples, such as the consistent model selection example, for some values of $h$ and some values of $x$. Assumption F1 holds if either (i) $J_{h_0}(x)$ is continuous and strictly increasing at $x = c_{h_0}(1-\alpha)$ or (ii) $J_{h_0}(x)$ has a jump at $x = c_{h_0}(1-\alpha)$ with $J_{h_0}(c_{h_0}(1-\alpha)) > 1-\alpha$. Condition (i) holds in most examples. But, if $J_{h_0}$ is a pointmass, as occurs for the post-consistent model selection estimator with constant $a = 0$, then condition (i) fails and condition (ii) holds.

Assumption G1 holds automatically when $\{\hat{T}_{n,b,j}\}$ satisfy Assumption Sub2. To verify that Assumption G1 holds when $\{\hat{T}_{n,b,j}\}$ satisfy Assumption Sub1 and $T_n(\theta_0)$ is a non-studentized t statistic (i.e., Assumptions t1 and t2 hold), we use the following assumption.

Assumption H. $\tau_b/\tau_n \to 0$.

This is a standard assumption in the subsampling literature, e.g., see PRW, Thm. 2.2.1, p. 43. In the leading case where $\tau_n = n^s$ for some $s > 0$, Assumption H follows from Assumption C(ii) because $\tau_b/\tau_n = (b/n)^s \to 0$.

Lemma 1 Assumptions t1, t2, Sub1, A1, B1, C, D, and H imply Assumption G1.

Comment. Lemma 1 is a special case of Lemma 5, which does not impose Assumption t2 and, hence, covers studentized as well as non-studentized t statistics. Lemma 5 is stated in Section 5 for expositional convenience.

Post-Consistent Model Selection Example (cont.). We now verify Assumptions C, D, E1-G1, and H for this example for an arbitrary choice of the parameter $h$. We choose $\{b = b_n : n \ge 1\}$ so that Assumption C holds. Assumptions D and E1 hold because the observations are i.i.d. for each fixed $\theta \in R$. Assumption H holds because $\tau_b/\tau_n = b^{1/2}/n^{1/2} \to 0$ by Assumption C, and Assumption G1 holds by Lemma 1 using Assumption H. For $a = 0$, Assumption F1 holds because $J_{h_0}(x) = 1(x \ge 0)$ has a jump at $x = c_{h_0}(1-\alpha) = 0$ with $J_{h_0}(c_{h_0}(1-\alpha)) = 1 > 1-\alpha$. For $a \in (0,1)$, Assumption F1 holds because $J_{h_0}(x) = \Phi(a^{-1}x) - \Phi(-a^{-1}x)$ is strictly increasing at $c_{h_0}(1-\alpha) = a z_{1-\alpha/2}$.
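The claim that $c_{h_0}(1-\alpha) = a z_{1-\alpha/2}$ for $a \in (0,1)$ can be confirmed numerically: $J_0(a z_{1-\alpha/2}) = \Phi(z_{1-\alpha/2}) - \Phi(-z_{1-\alpha/2}) = 1-\alpha$ exactly. A quick check, using only the standard library:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
z975 = NormalDist().inv_cdf(0.975)   # z_{1-alpha/2} for alpha = 0.05

def J0(x, a):
    """J_{h0}(x) = Phi(x/a) - Phi(-x/a) for a in (0, 1)."""
    return Phi(x / a) - Phi(-x / a)

for a in (0.25, 0.5, 0.75):
    print(a, round(J0(a * z975, a), 6))   # 0.95 for every a
```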

4 Asymptotic Results

The main result of this paper concerns the asymptotic behavior of FCV and subsampling CIs under a sequence $\{\gamma_{n,h} : n \ge 1\}$:

Theorem 2 (a) Suppose Assumption B1(i) holds. Then, $P_{\gamma_{n,h}}(T_n(\theta_{n,h}) \le c_{Fix}(1-\alpha)) \to [J_h(c_{Fix}(1-\alpha)-), J_h(c_{Fix}(1-\alpha))]$.

(b) Suppose Assumptions A1, B1, C, D, and E1-G1 hold. Then, $P_{\gamma_{n,h}}(T_n(\theta_{n,h}) \le c_{n,b}(1-\alpha)) \to [J_h(c_{h_0}(1-\alpha)-), J_h(c_{h_0}(1-\alpha))]$.


Comments. 1. If $J_h(c_{h_0}(1-\alpha)) < 1-\alpha$, then part (b) shows that the subsampling CI has asymptotic confidence size less than its nominal level $1-\alpha$: AsyCS $< 1-\alpha$. If $J_h(c_{h_0}(1-\alpha)-) > 1-\alpha$, then the subsampling CI is not asymptotically similar. Analogous statements apply to FCV CIs with $J_h(c_{Fix}(1-\alpha))$ in place of $J_h(c_{h_0}(1-\alpha))$.

2. If $J_h(x)$ is continuous at $x = c_{h_0}(1-\alpha)$, then the result of Theorem 2(b) becomes $P_{\gamma_{n,h}}(T_n(\theta_{n,h}) \le c_{n,b}(1-\alpha)) \to J_h(c_{h_0}(1-\alpha))$.

3. Typically Assumption B1(i) holds for an infinite number of values $h$, say $h \in H^*$ $(\subset R^p)$. In this case, Comments 1 and 2 apply for any $h \in H^*$.

Post-Consistent Model Selection Example (cont.). For $a = 0$, Theorem 2(b) implies that the limit of the coverage probability of the subsampling CI under $\gamma_{n,h}$ $(= \theta_{n,h}) = h n^{-1/2}$ is

$J_h(c_{h_0}(1-\alpha)) = J_h(0) = 1(0 \ge |h|) = 0$ for $|h| > 0$. \quad (4.1)

Hence, for $a = 0$, AsyCS $= 0$ for the subsampling CI. For $a \in (0,1)$, the limit of the coverage probability of the subsampling CI under $\gamma_{n,h}$ $(= \theta_{n,h}) = h n^{-1/2}$ is

$J_h(c_{h_0}(1-\alpha)) = J_h(a z_{1-\alpha/2})$. \quad (4.2)

Using (3.8), for $a \in (0,1)$, we have

$\lim_{h \to \infty} J_h(a z_{1-\alpha/2}) = 0$. \quad (4.3)

Hence, for $a \in (0,1)$ and $h$ sufficiently large, the asymptotic coverage probability of the symmetric two-sided subsampling CI is arbitrarily close to zero. Since $h \in R$ is arbitrary, this implies that AsyCS $= 0$ for this CI.

Figure 1 graphs the asymptotic coverage probability of the nominal 95% subsampling CI under $\gamma_{n,h}$ as a function of $|h|$ for various values of $a$, namely $a = 0, .25, .5,$ and $.75$. The results are obtained by simulation from (4.2) using 100,000 simulation repetitions. Figure 1 illustrates how the degree of under-coverage of the subsampling CI increases as $a$ decreases and as $|h|$ increases. In the extreme case of $a = 0$, the asymptotic coverage probability equals zero for all $|h| > 0$. For any positive value of $a$ considered, the asymptotic coverage probability equals the nominal level $.95$ when $|h| = 0$, decreases as $|h|$ increases, and approaches zero as $|h| \to \infty$.
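The coverage curve underlying Figure 1 can also be computed directly from the closed form for $J_h$ in (3.8), without simulation; the grid of $|h|$ values below is illustrative and is not the grid used in the figure:

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def coverage(a, h, alpha=0.05):
    """Asymptotic coverage J_h(a z_{1-alpha/2}) of the nominal 1-alpha
    symmetric two-sided subsampling CI for a in (0, 1); see (4.2) and (3.8)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    x = a * z
    return Phi((x + (1 - a) * h) / a) - Phi((-x + (1 - a) * h) / a)

for a in (0.25, 0.5, 0.75):
    print(a, [round(coverage(a, h), 3) for h in (0, 1, 2, 5, 10)])
```

Each row starts at $0.95$ at $h = 0$ and decays toward zero as $|h|$ grows, with faster decay for smaller $a$, matching the pattern described for Figure 1.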

We obtain the same result, AsyCS $= 0$, if one-sided CIs or equal-tailed two-sided CIs are considered. Furthermore, the size-correction methods of Andrews and Guggenberger (2005a, 2009a) do not work in this example because Assumptions LF, LS, and LH in the Appendix of Andrews and Guggenberger (2005a) fail. (For example, Assumption LF fails when $a = 0$ because $H = R_\infty$, $c_h(1-\alpha) = |h|$, and $\sup_{h \in R_\infty} |h| = \infty$.) Andrews and Guggenberger (2009a) does provide size-correction methods for CIs based on post-conservative model selection estimators.


5 Proofs

The following lemmas are used in the proof of Theorem 2.

Lemma 3 Suppose (i) for some df's $L_n(\cdot)$ and $G_L(\cdot)$ on $R$, $L_n(x) \to_p G_L(x)$ for all $x \in C(G_L)$; (ii) $T_n \to_d G_T$, where $T_n$ is a scalar random variable and $G_T$ is some distribution on $R$; and (iii) for all $\varepsilon > 0$, $G_L(c_\infty + \varepsilon) > 1-\alpha$, where $c_\infty$ is the $1-\alpha$ quantile of $G_L$ for some $\alpha \in (0,1)$. Then, for $c_n := \inf\{x \in R : L_n(x) \ge 1-\alpha\}$, (a) $c_n \to_p c_\infty$ and (b) $P(T_n \le c_n) \to [G_T(c_\infty-), G_T(c_\infty)]$.

Comments. 1. Condition (iii) holds if either $G_L(x)$ is continuous and strictly increasing at $x = c_\infty$ or $G_L(x)$ has a jump at $x = c_\infty$ with $G_L(c_\infty) > 1-\alpha$ and $G_L(c_\infty-) < 1-\alpha$.

2. Lemma 3 is the same as Lemma 5 of Andrews and Guggenberger (2010). For completeness, we repeat its proof below.

Lemma 4 Suppose Assumptions A1, B1, C, D, and E1-G1 hold. Let $\{\gamma_{n,h} : n \ge 1\}$ be as in Assumption B1(i). Then, under $\{\gamma_{n,h} : n \ge 1\}$, we have

(a) $E_{\gamma_{n,h}} U_{n,b}(x, \theta_{n,h}) \to J_{h_0}(x)$ for all $x \in C(J_{h_0})$;
(b) $U_{n,b}(x, \theta_{n,h}) \to_p J_{h_0}(x)$ for all $x \in C(J_{h_0})$;
(c) $L_{n,b}(x) \to_p J_{h_0}(x)$ for all $x \in C(J_{h_0})$;
(d) $c_{n,b}(1-\alpha) \to_p c_{h_0}(1-\alpha)$; and
(e) $P_{\gamma_{n,h}}(T_n(\theta_{n,h}) \le c_{n,b}(1-\alpha)) \to [J_h(c_{h_0}(1-\alpha)-), J_h(c_{h_0}(1-\alpha))]$.

Proof of Lemma 3. For $\varepsilon > 0$ such that $c_\infty \pm \varepsilon \in C(G_L) \cap C(G_T)$, we have

$L_n(c_\infty - \varepsilon) \to_p G_L(c_\infty - \varepsilon) < 1-\alpha$ and $L_n(c_\infty + \varepsilon) \to_p G_L(c_\infty + \varepsilon) > 1-\alpha$ \quad (5.4)

by assumptions (i) and (iii) and the fact that $G_L(c_\infty - \varepsilon) < 1-\alpha$ by the definition of $c_\infty$. This and the definition of $c_n$ yield

$P(A_n(\varepsilon)) \to 1$, where $A_n(\varepsilon) = \{c_\infty - \varepsilon \le c_n \le c_\infty + \varepsilon\}$. \quad (5.5)

There exists a sequence $\{\varepsilon_k > 0 : k \ge 1\}$ such that $\varepsilon_k \to 0$ as $k \to \infty$ and $c_\infty \pm \varepsilon_k \in C(G_L) \cap C(G_T)$ for all $k \ge 1$. Hence, part (a) holds.

Let $P(A, B)$ denote $P(A \cap B)$. For part (b), using the definition of $A_n(\varepsilon)$, we have

$P(T_n \le c_\infty - \varepsilon, A_n(\varepsilon)) \le P(T_n \le c_n, A_n(\varepsilon)) \le P(T_n \le c_\infty + \varepsilon)$. \quad (5.6)

Hence,

$\limsup_{n\to\infty} P(T_n \le c_n) = \limsup_{n\to\infty} P(T_n \le c_n, A_n(\varepsilon)) \le \limsup_{n\to\infty} P(T_n \le c_\infty + \varepsilon) = G_T(c_\infty + \varepsilon)$, and

$\liminf_{n\to\infty} P(T_n \le c_n) = \liminf_{n\to\infty} P(T_n \le c_n, A_n(\varepsilon)) \ge \liminf_{n\to\infty} P(T_n \le c_\infty - \varepsilon, A_n(\varepsilon)) = G_T(c_\infty - \varepsilon)$ \quad (5.7)

using assumption (ii), $c_\infty \pm \varepsilon \in C(G_T)$, and (5.5). Given a sequence $\{\varepsilon_k : k \ge 1\}$ as above, (5.7) establishes part (b). $\square$

Proof of Lemma 4. First, we prove part (a). The proof is similar to that of Lemma 6(ii) of Andrews and Guggenberger (2010). We have

$E_{\gamma_{n,h}} U_{n,b}(x, \theta_{n,h}) = q_n^{-1} \sum_{j=1}^{q_n} P_{\gamma_{n,h}}(T_{n,b,j}(\theta_{n,h}) \le x) = P_{\gamma_{n,h}}(T_{n,b,1}(\theta_{n,h}) \le x) = P_{\gamma_{n,h}}(T_b(\theta_{n,h}) \le x)$, \quad (5.8)

where the first equality holds by definition of $U_{n,b}(x, \theta_{n,h})$, the second equality holds by Assumption D(i), and the last equality holds by Assumption D(ii).

We now show that $P_{\gamma_{n,h}}(T_{b_n}(\theta_{n,h}) \le x) \to J_{h_0}(x)$ for all $x \in C(J_{h_0})$ by showing that any subsequence $\{t_n\}$ of $\{n\}$ has a sub-subsequence $\{s_n\}$ for which $P_{\gamma_{s_n,h}}(T_{b_{s_n}}(\theta_{s_n,h}) \le x) \to J_{h_0}(x)$ (where $\theta_{s_n}$ is a sub-vector of $\gamma_{s_n}$). Given any subsequence $\{t_n\}$, select a sub-subsequence $\{s_n\}$ such that $\{b_{s_n}\}$ is strictly increasing. This can be done because $b_n \to \infty$ by Assumption C(i). Because $\{b_{s_n}\}$ is strictly increasing, it is a subsequence of $\{n\}$.

Below we show that Assumption B1(ii) implies that for any subsequence $\{u_n\}$ of $\{n\}$ and any sequence $\{\gamma^*_{u_n} = (\gamma^*_{u_n,1}, \gamma^*_{u_n,2}, \gamma^*_{u_n,3}) \in \Gamma : n \ge 1\}$ that satisfies (i) $u_n^r \gamma^*_{u_n,1} \to 0$ and (ii) $\gamma^*_{u_n,2} \to h_2 \in R^q$, we have

$P_{\gamma^*_{u_n}}(T_{u_n}(\theta^*_{u_n}) \le y) \to J_{h_0}(y)$ \quad (5.9)

for all $y \in C(J_{h_0})$ (where $\theta^*_{u_n}$ is a sub-vector of $\gamma^*_{u_n}$). We apply this result with $u_n = b_{s_n}$, $\gamma^*_{u_n} = \gamma_{s_n,h}$, and $y = x$ to obtain the desired result $P_{\gamma_{s_n,h}}(T_{b_{s_n}}(\theta_{s_n,h}) \le x) \to J_{h_0}(x)$, where (i) and (ii) hold by the properties of $\{\gamma_{n,h} : n \ge 1\}$.

For the proof of part (a), it remains to show (5.9). Because $h_0 \in H$ (by Assumption B1(i)), by definition of $H$ there exists a sequence $\{\gamma^+_k = (\gamma^+_{k,1}, \gamma^+_{k,2}, \gamma^+_{k,3}) \in \Gamma : k \ge 1\}$ such that $k^r \gamma^+_{k,1} \to 0$ and $\gamma^+_{k,2} \to h_2$ as $k \to \infty$. Define a new sequence $\{\gamma^{**}_k = (\gamma^{**}_{k,1}, \gamma^{**}_{k,2}, \gamma^{**}_{k,3}) \in \Gamma : k \ge 1\}$ as follows. For $n \ge 1$, if $k = u_n$, set $\gamma^{**}_k$ equal to $\gamma^*_{u_n}$. If $k \ne u_n$, set $\gamma^{**}_k$ equal to $\gamma^+_k$. Clearly, $\gamma^{**}_k \in \Gamma$ for all $k \ge 1$ and $k^r \gamma^{**}_{k,1} \to 0$ and $\gamma^{**}_{k,2} \to h_2$ as $k \to \infty$. Hence, $\{\gamma^{**}_k : k \ge 1\}$ is of the form $\{\gamma_{n,h_0} : n \ge 1\}$, and Assumption B1(ii) implies that $P_{\gamma^{**}_k}(T_k(\theta^{**}_k) \le y) \to J_{h_0}(y)$ for all $y \in C(J_{h_0})$ (where $\theta^{**}_k$ is a sub-vector of $\gamma^{**}_k$). Because $\{u_n\}$ is a subsequence of $\{k\}$ and $\gamma^{**}_k = \gamma^*_{u_n}$ when $k = u_n$, the latter implies that $P_{\gamma^*_{u_n}}(T_{u_n}(\theta^*_{u_n}) \le y) \to J_{h_0}(y)$, as desired.

Part (b) follows from part (a) and Assumption E1. Part (c) follows from part (b) and Assumption G1. Parts (d) and (e) are established by applying Lemma 3 with $L_n(x) = L_{n,b}(x)$ and $T_n = T_n(\theta_{n,h})$ and verifying the conditions of Lemma 3 using (i) part (c), (ii) $T_n(\theta_{n,h}) \to_d J_h$ under $\{\gamma_{n,h} : n \ge 1\}$ (by Assumption B1(i)), and (iii) Assumption F1. $\square$

Proof of Theorem 2. Part (a) holds by Assumption B1(i) and the definition of convergence in distribution by considering points of continuity of $J_h(\cdot)$ that are greater than $c_{Fix}(1-\alpha)$ and arbitrarily close to $c_{Fix}(1-\alpha)$, as well as continuity points that are less than $c_{Fix}(1-\alpha)$ and arbitrarily close to it. Part (b) follows from Lemma 4(e). $\square$

We now provide sufficient conditions for Assumption G1 for the case when $T_n$ is a studentized t statistic and the subsample statistics satisfy Assumption Sub1. This result generalizes Lemma 1 because Assumption t2 is not imposed. The results apply to models with i.i.d., stationary and weakly dependent, or nonstationary observations.

Just as $T_{n,b,j}(\theta_0)$ is defined, let $(\hat{\theta}_{n,b,j}, \hat{\sigma}_{n,b,j})$ be the subsample statistics that are defined exactly as $(\hat{\theta}_n, \hat{\sigma}_n)$ are defined, but based on the $j$th subsample of size $b$. In analogy to $U_{n,b}(x, \theta_{n,h})$ defined in (2.8), we define

$U^*_{n,b}(x) = q_n^{-1} \sum_{j=1}^{q_n} 1(d_b \hat{\sigma}_{n,b,j} \le x)$ \quad (5.10)

for a sequence of normalization constants $\{d_n : n \ge 1\}$ (for which Assumption BB1 below holds). Although $U^*_{n,b}(x)$ depends on $\{d_n : n \ge 1\}$, we suppress the dependence for notational simplicity.

We now state modified versions of Assumptions B1, D, E1, and H that are used with studentized statistics when Assumption Sub1 holds.

Assumption BB1. (i) For $r$, $h$, $h_0$, and $\{\gamma_{n,h} : n \ge 1\}$ as in Assumption B1(i) and for some distribution $(V_h, W_h)$ on $R^2$, $(a_n(\hat{\theta}_n - \theta_{n,h}), d_n\hat{\sigma}_n) \to_d (V_h, W_h)$ under $\{\gamma_{n,h} : n \ge 1\}$, where $\gamma_{n,h} = (\gamma_{n,h,1}, \gamma_{n,h,2}, \gamma_{n,h,3}) = ((\theta_{n,h,1}', \eta_{n,h,1}')', (\theta_{n,h,2}', \eta_{n,h,2}')', \gamma_{n,h,3})$ and $\theta_{n,h} = (\theta_{n,h,1}', \theta_{n,h,2}')'$, (ii) $P_{\gamma_{n,h}}(\hat{\sigma}_{n,b,j} > 0$ for all $j = 1, \ldots, q_n) \to 1$ under $\{\gamma_{n,h} : n \ge 1\}$, and (iii) $W_h(0) = 0$.

Assumption DD. (i) $\{(\hat{\theta}_{n,b,j}, \hat{\sigma}_{n,b,j}) : j = 1, \ldots, q_n\}$ are identically distributed under any $\gamma \in \Gamma$ for all $n \ge 1$ and (ii) $(\hat{\theta}_{n,b,1}, \hat{\sigma}_{n,b,1})$ and $(\hat{\theta}_b, \hat{\sigma}_b)$ have the same distribution under any $\gamma \in \Gamma$ for all $n \ge 1$.

Assumption EE1. For the sequence $\{\gamma_{n,h} : n \ge 1\}$ in Assumption BB1(i) and the constants $\{d_n : n \ge 1\}$ in Assumption BB1(i), $U^*_{n,b}(x) - E_{\gamma_{n,h}} U^*_{n,b}(x) \to_p 0$ under $\{\gamma_{n,h} : n \ge 1\}$ for all $x \in R$.

Assumption HH. $a_b/a_n \to 0$.

In a model with i.i.d. or stationary strong mixing observations, one often takes $d_n = 1$ for all $n$, $W_h$ to be a pointmass distribution with pointmass at the probability limit of $\hat{\sigma}_n$, and $a_n = n^{1/2}$.

Assumption BB1 implies that $T_n(\theta_{n,h}) \to_d J_h$ in Assumption B1(i) with $\tau_n = a_n/d_n$ (by the continuous mapping theorem using Assumption BB1(iii)). Assumption DD implies Assumption D. Assumption DD is not restrictive given the standard methods of defining subsample statistics. Assumption EE1 holds automatically when the observations are i.i.d. for each fixed $\gamma \in \Gamma$ or are stationary, strong mixing, and satisfy the condition in (3.9) for each fixed $\gamma \in \Gamma$, provided the subsamples are constructed as described in Section 2.2 (for the same reason that Assumption E1 holds in these cases). Assumption HH holds in many examples when Assumption C holds, as is typically the case. However, it does not hold if $\theta$ is unidentified when $\gamma_1 = 0$ (because consistent estimation of $\theta$ is not possible in this case and $a_n = 1$ in Assumption BB1(i)). For example, this occurs in a model with weak instruments; see Andrews and Guggenberger (2005a).

The following lemma generalizes Lemma 1 because (i) it does not impose Assumption t2 and (ii) Assumptions t1, t2, B1, D, and H imply Assumptions BB1, DD, EE1, and HH with $\hat{\sigma}_{n,b,j} = d_n = 1$.

Lemma 5 Assumptions t1, Sub1, A1, B1, BB1, C, D, DD, E1, EE1, and HH imply Assumption G1.

Comment. The proof of Lemma 5 is a variant of those of Theorems 11.3.1(i) and 12.2.2(i) of PRW and Lemma 4 of Andrews and Guggenberger (2010).

Proof of Lemma 5. We have $U_{n,b}(x, \theta_{n,h}) \to_p J_{h_0}(x)$ for all $x \in C(J_{h_0})$ under $\{\gamma_{n,h} : n \ge 1\}$ by Lemma 4(b) (which does not require Assumptions F1 and G1 in its proof). Define $R_n(t) := q_n^{-1} \sum_{j=1}^{q_n} 1(|\tau_b(\hat{\theta}_n - \theta_{n,h})/\hat{\sigma}_{n,b,j}| \ge t)$. Using

$U_{n,b}(x - t, \theta_{n,h}) - R_n(t) \le L_{n,b}(x) \le U_{n,b}(x + t, \theta_{n,h}) + R_n(t)$ \quad (5.11)

for any $t > 0$ (which holds for all versions (i)-(iii) of $T_n(\theta_{n,h})$ in Assumption t1), the desired result follows once we establish that $R_n(t) \to_p 0$ under $\{\gamma_{n,h}\}$ for any fixed $t > 0$. By $\tau_n = a_n/d_n$, we have

$|\tau_b(\hat{\theta}_n - \theta_{n,h})/\hat{\sigma}_{n,b,j}| \ge t$ iff $(a_b/a_n) a_n|\hat{\theta}_n - \theta_{n,h}| \ge d_b \hat{\sigma}_{n,b,j} t$ \quad (5.12)

provided $\hat{\sigma}_{n,b,j} > 0$, which holds uniformly in $j = 1, \ldots, q_n$ wp$\to$1 by Assumption BB1(ii). By Assumptions BB1(i) and HH, $(a_b/a_n) a_n|\hat{\theta}_n - \theta_{n,h}| = o_p(1)$ under $\{\gamma_{n,h}\}$. Therefore, for any $\delta > 0$, $R_n(t) \le q_n^{-1} \sum_{j=1}^{q_n} 1(\delta \ge d_b \hat{\sigma}_{n,b,j} t) = U^*_{n,b}(\delta/t)$, where the inequality holds wp$\to$1. Now, by an argument as in the proof of Lemma 4(a) and (b) (which uses Assumption EE1, but does not use Assumption G1) applied to the statistic $d_n\hat{\sigma}_n$ rather than $T_n(\theta_{n,h})$, we have $U^*_{n,b}(x) \to_p W_{h_0}(x)$ for all $x \in C(W_{h_0})$ under $\{\gamma_{n,h}\}$. Therefore, $U^*_{n,b}(\delta/t) \to_p W_{h_0}(\delta/t)$ for $\delta/t \in C(W_{h_0})$ under $\{\gamma_{n,h}\}$. By Assumption BB1(iii), $W_{h_0}$ does not have positive mass at zero and, hence, $W_{h_0}(\delta/t) \to 0$ as $\delta \to 0$. We can therefore establish that $R_n(t) \to_p 0$ for any $t > 0$ by letting $\delta$ go to zero such that $\delta/t \in C(W_{h_0})$. $\square$

Acknowledgment

Andrews gratefully acknowledges the research support of the National Science Foundation via grant numbers SES-0417911 and SES-0751517. Guggenberger gratefully acknowledges research support from a faculty research grant from UCLA in 2005 and NSF grant SES-0748922. For helpful comments, we thank two referees, the Associate Editor Miguel Delgado, Victor Chernozhukov, Benedikt Pötscher, Azeem Shaikh, and the participants at various seminars and conferences at which the paper was presented. The results in this paper first appeared in Andrews and Guggenberger (2005b). They are not considered for publication elsewhere.


References

Andrews, D. W. K., Guggenberger, P., 2005a. Applications of subsampling, hybrid, and size-correction methods. Cowles Foundation Discussion Paper No. 1608, Yale University.

Andrews, D. W. K., Guggenberger, P., 2005b. The limit of finite-sample size and a problem with subsampling. Cowles Foundation Discussion Paper No. 1605, Yale University.

Andrews, D. W. K., Guggenberger, P., 2009a. Hybrid and size-corrected subsampling methods. Econometrica 77, forthcoming.

Andrews, D. W. K., Guggenberger, P., 2009b. Validity of subsampling and "Plug-in Asymptotic" inference for parameters defined by moment inequalities. Econometric Theory 25, forthcoming.

Andrews, D. W. K., Guggenberger, P., 2010. Asymptotic size and a problem with subsampling and with the m out of n bootstrap. Econometric Theory 26, forthcoming.

Beran, R., 1982. Estimating sampling distributions: the bootstrap and competitors. Annals of Statistics 10, 212-225.

Guggenberger, P., 2008. The impact of a Hausman pretest on the size of hypothesis tests. Cowles Foundation Discussion Paper No. 1651, Yale University.

Kabaila, P., 1995. The effect of model selection on confidence regions and prediction regions. Econometric Theory 11, 537-549.

Leeb, H., Pötscher, B. M., 2005. Model selection and inference: facts and fiction. Econometric Theory 21, 21-59.

Leeb, H., Pötscher, B. M., 2006. Performance limits for estimators of the risk or distribution of shrinkage-type estimators, and some general lower risk-bound results. Econometric Theory 22, 69-97.

Lehmann, E. L., Romano, J. P., 2005. Testing Statistical Hypotheses. 3rd ed. Wiley, New York.

Mikusheva, A., 2007. Uniform inference in autoregressive models. Econometrica 75, 1411-1452.

Politis, D. N., Romano, J. P., 1994. Large sample confidence regions based on subsamples under minimal assumptions. Annals of Statistics 22, 2031-2050.

Politis, D. N., Romano, J. P., Wolf, M., 1999. Subsampling. Springer, New York.

Pötscher, B. M., 2007. Confidence sets based on sparse estimators are necessarily large. Unpublished manuscript, Department of Statistics, University of Vienna.

Romano, J. P., Shaikh, A. M., 2008. Inference for identifiable parameters in partially identified econometric models. Journal of Statistical Planning and Inference (Special Issue in Honor of T. W. Anderson) 138, 2786-2807.

Samworth, R., 2003. A note on methods of restoring consistency to the bootstrap. Biometrika 90, 985-990.

Shao, J., 1994. Bootstrap sample size in nonregular cases. Proceedings of the American Mathematical Society 122, 1251-1262.

Shao, J., 1996. Bootstrap model selection. Journal of the American Statistical Association 91, 655-665.

Shao, J., Wu, C. F. J., 1989. A general theory for jackknife variance estimation. Annals of Statistics 17, 1176-1197.

Sherman, M., Carlstein, E., 1996. Replicate histograms. Journal of the American Statistical Association 91, 566-576.

Wu, C. F. J., 1990. On the asymptotic properties of the jackknife histogram. Annals of Statistics 18, 1438-1452.
