“From another angle, it is possible to argue that model selection itself is a misguided goal. It is quite common to find that confidence intervals from different plausible models are non-intersecting, raising considerable inferential uncertainty. Fundamentally, the uncertainty concerning the choice of model is not reflected in conventional asymptotic and bootstrap confidence intervals.”
– Hansen (2005)

We introduce ivshrink, part of a trio of Stata commands, with regshrink and mvregshrink, which produce Stein-type shrinkage and model averaging estimators.
We motivate ivshrink by considering the theoretical difficulty of uniformly consistent post-model-selection inference; model averaging offers an alternative that avoids a discrete selection step.
Shrinkage estimators are also well known to have better risk properties, at the cost of potentially introducing bias.
ivshrink is work in progress, theoretically and Stata-wise. Comments welcome.
The classical linear simultaneous equations model I
The classical linear simultaneous equations model [CLSEM], which underlies the instrumental variables estimators and inference procedures, can be formulated as
Y1 = Y2β + Z1δ + ε
= Xγ + ε
for which system, the reduced form for the endogenous variables, in terms of the system exogenous variables Z = [Z1 Z2], is
[Y1 Y2] = Z [π1 π2] + [ϑ1 ϑ2]
In the classical case, a (matrix-)normality assumption is made on the reduced-form errors.
The classical linear simultaneous equations model II
We can specify the (homoskedastic) error structure of the structural error vector in terms of the reduced-form covariance matrix,

V(ε) = σ² ιN

where

σ² = ω11 − 2 ω12ᵀ β + βᵀ ω22 β
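As a quick numerical check of this mapping from the reduced-form covariance matrix to the structural error variance, consider the following sketch. The matrix Ω and the coefficient vector β below are made-up illustrations, not values from the text.

```python
import numpy as np

# Numerical check of sigma^2 = omega_11 - 2 omega_12' beta + beta' omega_22 beta.
# Omega is a hypothetical reduced-form covariance matrix, partitioned for one
# outcome Y1 and two endogenous regressors Y2; beta is a made-up coefficient.
Omega = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
beta = np.array([0.7, -0.4])

omega_11 = Omega[0, 0]          # variance of the Y1 reduced-form error
omega_12 = Omega[0, 1:]         # covariance of the Y1 error with the Y2 errors
omega_22 = Omega[1:, 1:]        # covariance of the Y2 reduced-form errors
sigma2 = omega_11 - 2 * omega_12 @ beta + beta @ omega_22 @ beta

# Equivalent direct route: eps = v1 - V2 beta, so V(eps) = a' Omega a
# with a = (1, -beta')'.
a = np.concatenate(([1.0], -beta))
sigma2_direct = a @ Omega @ a
```

The second computation makes the algebra transparent: the structural error is a fixed linear combination of the reduced-form errors, so its variance is the corresponding quadratic form in Ω.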
The relevance and validity assumptions are

E(Zᵀ [ϑ1 ϑ2]) = O

E(Zᵀ Y2) ≠ O
The dimensions of the various objects:

N : sample size
G1 : number of endogenous explanatory variables
K1 : number of included exogenous variables
K2 : number of excluded exogenous variables
K = K1 + K2 : the total number of exogenous variables in the system
The k-class estimator γ(k) of Theil (1958), which depends on the choice of a tuning parameter k ∈ R, is given as the solution to the system of linear equations

[ Y2ᵀ − k ϑ2ᵀ ]        [ Y2ᵀY2 − k ϑ2ᵀϑ2   Y2ᵀZ1 ] [ β(k) ]
[ Z1ᵀ         ] Y1  =  [ Z1ᵀY2              Z1ᵀZ1 ] [ δ(k) ]
Note that the k-class estimator contains the usual estimators, including OLS (k = 0), 2SLS (k = 1), and LIML (k = λ0), where

λ0 = min over β of  [ (Y1 − Y2β)ᵀ MZ1 (Y1 − Y2β) ] / [ (Y1 − Y2β)ᵀ MZ (Y1 − Y2β) ]

The k-class estimator is the basic combination estimator since it is continuous in the parameter k, and every estimator between the OLS and the 2SLS estimator can be obtained as some choice of k ∈ [0, 1].
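The k-class system above is easy to verify numerically. The sketch below simulates a hypothetical design (all numbers illustrative), solves the k-class normal equations directly, and confirms that k = 0 reproduces OLS while k = 1 reproduces 2SLS.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500

# Hypothetical design (all numbers illustrative): one endogenous regressor Y2,
# one included exogenous variable Z1, one excluded instrument Z2.
Z1 = rng.normal(size=(N, 1))
Z2 = rng.normal(size=(N, 1))
u = rng.normal(size=(N, 1))
Y2 = 0.3 * Z1 + 0.8 * Z2 + u
Y1 = 0.7 * Y2 + 1.0 * Z1 + 0.5 * u + rng.normal(size=(N, 1))

Z = np.hstack([Z1, Z2])                      # system exogenous variables
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # projection onto Z
V2 = Y2 - PZ @ Y2                            # reduced-form residuals M_Z Y2

def k_class(k):
    """Solve the k-class normal equations for (beta(k), delta(k))."""
    A = np.block([[Y2.T @ Y2 - k * (V2.T @ V2), Y2.T @ Z1],
                  [Z1.T @ Y2,                   Z1.T @ Z1]])
    b = np.vstack([Y2.T - k * V2.T, Z1.T]) @ Y1
    return np.linalg.solve(A, b).ravel()

# k = 0 reproduces OLS of Y1 on (Y2, Z1); k = 1 reproduces 2SLS.
X = np.hstack([Y2, Z1])
ols = np.linalg.lstsq(X, Y1, rcond=None)[0].ravel()
tsls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y1).ravel()
```

For k = 1, the algebra is immediate: Y2ᵀ − ϑ2ᵀ = Y2ᵀ(I − MZ) = Y2ᵀPZ, so the system collapses to the 2SLS normal equations.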
The estimators above are not shrinkage estimators in the sense of Stein (1956).
Such a class of shrinkage estimators was introduced by Zellner and Vandaele (1974), and the following double g-class estimator of Ullah and Srivastava (1988) is general in this class:

βU(g1, g2) = [ 1 − g1 ϑ1ᵀϑ1 / (Y1ᵀY1 − g2 ϑ1ᵀϑ1) ] β2SLS

where

ϑ1 = MZ Y1

and the optimal values of g1 and g2 lead to the estimator βU.
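The double g-class correction is a scalar factor multiplying the 2SLS estimate, which makes it simple to compute. The sketch below uses simulated data (all numbers illustrative) in a model with no included exogenous variables, matching the simple form of the formula above; the optimal choices of g1 and g2 from the Ullah-Srivastava analysis are not shown.

```python
import numpy as np

# Sketch of the double g-class shrinkage on simulated (illustrative) data.
rng = np.random.default_rng(1)
N = 400
Z = rng.normal(size=(N, 2))                  # two excluded instruments
u = rng.normal(size=(N, 1))
Y2 = Z @ np.array([[0.9], [0.4]]) + u
Y1 = 0.7 * Y2 + 0.5 * u + rng.normal(size=(N, 1))

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_2sls = np.linalg.solve(Y2.T @ PZ @ Y2, Y2.T @ PZ @ Y1).item()

v1 = Y1 - PZ @ Y1                            # theta_1 = M_Z Y1
q = (v1.T @ v1).item()                       # theta_1' theta_1
y1y1 = (Y1.T @ Y1).item()

def beta_U(g1, g2):
    """Double g-class estimator: a scalar shrinkage factor times beta_2SLS."""
    return (1.0 - g1 * q / (y1y1 - g2 * q)) * beta_2sls

# g1 = 0 recovers 2SLS; g1 > 0 shrinks the estimate toward zero, since
# theta_1' theta_1 < Y1'Y1 makes the factor lie strictly between 0 and 1
# for small g1.
```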
It is a well-known problem that post-model-selection estimators do not take the model-selection uncertainty into account.

This literature was recently revived in Leeb and Potscher (2005); Guggenberger (2010); Andrews and Guggenberger (2009), among others.
We define the asymptotic size of the test based on a test statistic TN under the null as

asy. size(TN, γ0) = lim sup over N→∞ of sup over λ ∈ Λ of Pγ0,λ[ TN(γ0) > c1−α ]

where α is the nominal size of the test, TN(γ0) is the test statistic, and c1−α is the critical value of the test.
Uniformity over λ ∈ Λ, which is built into the definition of the asymptotic size of the test, is crucial for the asymptotic size to give a good approximation to the finite-sample size.
Then, in order to be a nominal size α standard fixed critical value [FCV] test, it must be that the test rejects the null hypothesis if

T†N(γ2SLS, γOLS, γ0) > c∞,1−α

where, with ξΦ,q denoting the q-quantile of the standard normal distribution,

c∞,1−α = ξΦ,1−α    for a lower one-sided test
         ξΦ,1−α    for an upper one-sided test
         ξΦ,1−α/2  for a two-sided test
The Hausman test does not have good power to detect local deviations from exogeneity; however, the OLS bias picks up these exogeneity deviations strongly, rejecting the second-stage null and leading to an over-sized test.
Simulation results confirming the theoretical findings are reported in Wong (1997) and Guggenberger (2010).
Recent research (McCloskey, 2012; Cornea, 2011) has suggested some ways of trying to recover the asymptotic size without completely sacrificing power.
Kim and White (2001) provide shrinkage-type estimators where a base (unbiased) estimator is shrunk towards another, possibly biased and correlated estimator using stochastic or non-stochastic weights.
Under a wide variety of regularity conditions, estimators for parameters γ of a model are (jointly) asymptotically normally distributed. Consider specifically

√N [ γ2SLS,N − γ0 ]  d  [ U1 ]
   [ γOLS,N − γ0  ]  →  [ U2 ]  ∼  N(ξ, σ)

Allowing for one of the estimators to be asymptotically biased leads to

ξ = [ 0 ]
    [ θ ]

and allowing for full correlation between the estimators leads to an unrestricted covariance matrix σ, with blocks σ11, σ12, σ21, and σ22.
The question is: when does the JSM estimator dominate the base estimator and the data-dependent shrinkage point in terms of asymptotic risk, and what is the optimal value of c1? The following theorem, adapted from (Kim and White, 2001, Theorem 1), addresses this question.
Both the estimators above belong to the so-called regular consistent second-order indexed [RCASOI] class of estimators proposed by Bates and White (1993), and as such have valid first-order representations in terms of their scores.
By application of a standard CLT (Lindeberg-Feller), we have

[ (1/N) Σᵢ Sᵢ2SLS(β0) ]   d
[ (1/N) Σᵢ SᵢOLS(β0)  ]  →  N(ξ, σ)

where

ξ = [ E(Sᵢ2SLS(β0)) ]
    [ E(SᵢOLS(β0))  ]

σ = [ E(Sᵢ2SLS(β0) Sᵢ2SLS(β0)ᵀ)   E(Sᵢ2SLS(β0) SᵢOLS(β0)ᵀ) ]
    [ E(SᵢOLS(β0) Sᵢ2SLS(β0)ᵀ)    E(SᵢOLS(β0) SᵢOLS(β0)ᵀ)  ]
In particular, this expression allows us to compute the sample analogs of the required covariance matrices, σ, using the plug-in principle.
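The plug-in step can be sketched concretely: compute the per-observation scores of each estimator, stack them, and average the outer products. The design below is simulated and all numbers are illustrative; a sandwich transform of this moment matrix would then give the covariance of the estimators themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
Z1 = rng.normal(size=(N, 1))
Z2 = rng.normal(size=(N, 1))
u = rng.normal(size=(N, 1))
Y2 = 0.3 * Z1 + 0.8 * Z2 + u
Y1 = 0.7 * Y2 + 1.0 * Z1 + 0.5 * u + rng.normal(size=(N, 1))

X = np.hstack([Y2, Z1])
Z = np.hstack([Z1, Z2])
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
Xhat = PZ @ X                                # first-stage fitted regressors

g_ols = np.linalg.lstsq(X, Y1, rcond=None)[0]
g_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ Y1)

# Per-observation scores, stacked 2SLS first then OLS to match the ordering
# of the stacked estimator above, and the plug-in moment matrix.
S_2sls = Xhat * (Y1 - X @ g_2sls)
S_ols = X * (Y1 - X @ g_ols)
S = np.hstack([S_2sls, S_ols])               # N x (2 + 2)
sigma_hat = S.T @ S / N                      # sample analog of E(S_i S_i')
```

The off-diagonal block of `sigma_hat` estimates the covariance between the 2SLS and OLS scores, which is exactly the piece that conventional single-estimator inference discards.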
We are now in a position to describe choices of the weighting matrix QN:

ιG1+K1, the identity matrix of dimension G1 + K1;
(σ22,N⁻¹ − σ11,N⁻¹), the non-robust Hausman variance matrix;
(σ22,N + σ11,N − σ12,N − σ21,N)⁻¹, the robust Hausman variance matrix.
It is well known that the Hausman variance matrix is rank-deficient by design. Remedies include generalized inverses, Hausman and Taylor (1981); Wu (1983). Other solutions are explored in Lutkepohl and Burda (1997); Dufour and Valery (2011). Matrix norm regularization methods are required to get the estimator to behave well.
Using the results above, the minimum asymptotic risk estimator can be computed. The estimator, however, is asymptotically biased, and the asymptotic distribution does not have a closed-form expression. The bootstrap can be used to circumvent both of these difficulties. Following Judge and Mittelhammer (2004), the bootstrap procedure for testing null hypotheses of the form
H0 : rγ = r
can be implemented using a double (or, nested) bootstrap, where the outer bootstrap computes the replicates

T(bo) = ( r[ γJS(γ2SLS,N(bo), γOLS,N(bo); c1*(bo)) − bias(γJS(γ2SLS,N(bo), γOLS,N(bo); c1*(bo))) ] − r )
        ⊘ [ diag( r V(γJS(γ2SLS,N(bo), γOLS,N(bo); c1*(bo))) rᵀ ) ]^(1/2),    bo = 1, …, Bo

where ⊘ denotes element-wise division.
The estimates of the bias and the variance-covariance matrix are computed using an inner bootstrap:

bias( γJS(γ2SLS,N(bo), γOLS,N(bo); c1*(bo)) )
  = (1/Bi) Σ over bi = 1, …, Bi of γJS(γ2SLS,N(bi), γOLS,N(bi); c1*(bi)) − γJS(γ2SLS,N(bo), γOLS,N(bo); c1*(bo))
The covariance matrix is computed using the inner bootstrap resamples in the usual way.
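The nested structure can be sketched schematically. To keep the example self-contained, the sketch replaces the James-Stein combination with a toy shrinkage estimator of a scalar mean; the bookkeeping (outer resamples producing studentized, bias-corrected replicates; inner resamples supplying the bias and standard error) is the part being illustrated, and Bo, Bi and the estimator are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy shrinkage estimator: the sample mean shrunk toward zero by a fixed
# factor, standing in for the (far more involved) James-Stein combination.
def shrink_est(x, c=0.9):
    return c * x.mean()

x = rng.normal(loc=1.0, scale=1.0, size=100)
theta_hat = shrink_est(x)
Bo, Bi = 200, 50                             # outer and inner replications

T = np.empty(Bo)
for bo in range(Bo):
    xb = rng.choice(x, size=x.size, replace=True)          # outer resample
    est_bo = shrink_est(xb)
    inner = np.array([shrink_est(rng.choice(xb, size=xb.size, replace=True))
                      for _ in range(Bi)])
    bias_bo = inner.mean() - est_bo          # inner-bootstrap bias estimate
    se_bo = inner.std(ddof=1)                # inner-bootstrap standard error
    T[bo] = ((est_bo - bias_bo) - theta_hat) / se_bo       # studentized replicate

# Critical values for a two-sided 5% bootstrap test come from the
# empirical quantiles of the replicates T.
lo, hi = np.quantile(T, [0.025, 0.975])
```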
Mittelhammer and Judge (2005) define the closely related semiparametric least squares estimator [SLSE], which has the estimate of the optimal nonrandom shrinkage parameter

c*SLSE = trace( θᵀθ + ωN,11 − ωN,12 ) / trace( ωN,11 + ωN,22 − 2 ωN,12 + θᵀθ )
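The trace formula is a one-line computation once the plug-in pieces are in hand. All numbers below are made up for illustration; θᵀθ is read as the outer product θθᵀ inside the traces, which leaves the trace unchanged.

```python
import numpy as np

# Hypothetical plug-in quantities (all numbers made up): theta is the
# estimated bias, omega_ij the blocks of the joint covariance matrix.
theta = np.array([0.3, -0.1])
w11 = np.array([[0.5, 0.1], [0.1, 0.4]])     # omega_N,11
w22 = np.array([[0.9, 0.2], [0.2, 0.8]])     # omega_N,22
w12 = np.array([[0.3, 0.05], [0.05, 0.25]])  # omega_N,12

# trace(theta theta') = theta' theta, so either reading gives the same number.
num = np.trace(np.outer(theta, theta) + w11 - w12)
den = np.trace(w11 + w22 - 2 * w12 + np.outer(theta, theta))
c_slse = num / den
```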
Lastly, we need an estimate of the bias, θ, which is provided by one of the following (each listed with its ivshrink keyword):

Ullah and Srivastava (1988): ullah
Sawa (1973), bias: sawa bias
Sawa (1973), MSE: sawa mse
Morimune (1978), bias: morimune bias
Morimune (1978), MSE: morimune mse
Anderson et al. (1986): anderson
Zellner and Vandaele (1974): zellner
Fuller (1977): fuller bias
Mittelhammer and Judge (2005): slse
Kim and White (2001), random: white jsc
Kim and White (2001), nonrandom: white nrc
Kim and White (2001), optimal: white ows
Chmelarova and Hill (2010) construct a very simple just-identified design to assess the properties of the Hausman pre-test estimator. Their design has the simple form:
[ Y2i ]         ( [ 1    0   ρ2   ρ1 ] )
[ Z1i ]  ∼  N( 0, [ 0    1   0    0  ] )
[ Z2i ]         ( [ ρ2   0   1    0  ] )
[ εi  ]         ( [ ρ1   0   0    1  ] )
The model for outcomes is the just-identified equation
Yi = β0 + β1Y2i + β2Z1i + εi
where the degree of endogeneity is controlled by the correlation between the single explanatory endogenous regressor and the structural errors, ρ1. The strength of the instruments is controlled by the correlation between the explanatory endogenous variable and the excluded exogenous variable, ρ2. In the simplest case, we set the (true) vector of parameters
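The design is easy to reproduce in a few lines. The sketch below draws one large sample and compares OLS with just-identified IV; the values of ρ1, ρ2 and the parameter vector are illustrative choices, not the ones used by Chmelarova and Hill.

```python
import numpy as np

# One draw from the Chmelarova-Hill design; rho1, rho2 and the parameter
# vector below are illustrative, not the values used in the paper.
rng = np.random.default_rng(4)
N, rho1, rho2 = 10_000, 0.5, 0.5
cov = np.array([[1.0,  0.0, rho2, rho1],
                [0.0,  1.0, 0.0,  0.0],
                [rho2, 0.0, 1.0,  0.0],
                [rho1, 0.0, 0.0,  1.0]])
Y2, Z1, Z2, eps = rng.multivariate_normal(np.zeros(4), cov, size=N).T

b0, b1, b2 = 0.0, 1.0, 1.0                   # hypothetical true parameters
Y = b0 + b1 * Y2 + b2 * Z1 + eps

X = np.column_stack([np.ones(N), Y2, Z1])
W = np.column_stack([np.ones(N), Z2, Z1])    # instrument matrix: Z2 replaces Y2
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
b_iv = np.linalg.solve(W.T @ X, W.T @ Y)

# With rho1 > 0, OLS is biased upward for b1 (here plim is b1 + rho1, since
# Y2 is independent of Z1 with unit variance); just-identified IV is consistent.
```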
At the risk of stepping into ivreg2's very large shoes, ivshrink has features not directly related to shrinkage estimation.
Since ivshrink is modular, it is very easy to build additional features on top of already existing ones; for example, it implements (not an exhaustive list):

Anderson-Rubin tests;
Kleibergen K-tests (with & without pretesting);
Moreira's conditional likelihood ratio test;
the S-test.
Some of these features are translated from the Ox (Doornik, 2007) code of Marek Jarocinski.
Anderson, T. W., Kunitomo, N., and Morimune, K. (1986). Comparing single-equation estimators in a simultaneous equation system. Econometric Theory, 2(1):1–32.
Andrews, D. W. and Guggenberger, P. (2009). Incorrect asymptotic size of subsampling procedures based on post-consistent model selection estimators. Journal of Econometrics, 152(1):19–27.
Bates, C. E. and White, H. (1993). Determination of estimators with minimum asymptotic covariance matrices. Econometric Theory, 9(4):633–648.
Chmelarova, V. and Hill, R. C. (2010). The Hausman pretest estimator. Economics Letters, 108(1):96–99.
Cornea, A. (2011). Bootstrap for shrinkage-type estimators.
Doornik, J. (2007). Ox: An Object-Oriented Matrix Language. Timberlake Consultants Ltd, London.
Dufour, J. and Valery, P. (2011). Wald-type tests when rank conditions fail: a smooth regularization approach.
Fuller, W. A. (1977). Some properties of a modification of the limited information estimator. Econometrica, 45(4):939–953.
Guggenberger, P. (2010). The impact of a Hausman pretest on the asymptotic size of a hypothesis test. Econometric Theory, 26(2):369–382.
Hansen, B. E. (2005). Challenges for econometric model selection. Econometric Theory, 21(1):60–68.
Hausman, J. A. and Taylor, W. E. (1981). A generalized specification test. Economics Letters, 8(3):239–245.
Judge, G. and Mittelhammer, R. (2004). A semiparametric basis for combining estimation problems under quadratic loss. Journal of the American Statistical Association, 99:479–487.
Kadane, J. B. (1971). Comparison of k-class estimators when the disturbances are small. Econometrica, 39(5):723–737.
Kim, T.-H. and White, H. (2001). James-Stein-type estimators in large samples with application to the least absolute deviations estimator. Journal of the American Statistical Association, 96(454):697–705.
Leeb, H. and Potscher, B. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21(1):21–59.
Lutkepohl, H. and Burda, M. M. (1997). Modified Wald tests under nonregular conditions. Journal of Econometrics, 78(2):315–332.
McCloskey, A. (2012). Bonferroni-based size-correction for nonstandard testing problems.
Mittelhammer, R. C. and Judge, G. G. (2005). Combining estimators to improve structural model estimation and inference under quadratic loss. Journal of Econometrics, 128(1):1–29.
Morimune, K. (1978). Improving the limited information maximum likelihood estimator when the disturbances are small. Journal of the American Statistical Association, 73(364):867–871.
Mroz, T. A. (1987). The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions. Econometrica, 55(4):765–799.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica, 27(4):575–595.
Nagar, A. L. (1962). Double k-class estimators of parameters in simultaneous equations and their small sample properties. International Economic Review, 3(2):168–188.
Sawa, T. (1973). Almost unbiased estimator in simultaneous equations systems. International Economic Review, 14(1):97–106.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 197–206.
Theil, H. (1958). Economic Forecasts and Policy. North-Holland.
Ullah, A. and Srivastava, V. K. (1988). On the improved estimation of structural coefficients. Sankhyā: The Indian Journal of Statistics, Series B, 50(1):111–118.
Wong, K.-f. (1997). Effects on inference of pretesting the exogeneity of a regressor. Economics Letters, 56(3):267–271.
Wu, D.-M. (1983). A remark on a generalized specification test. Economics Letters, 11(4):365–370.
Zellner, A. and Vandaele, W. (1974). Bayes-Stein estimators for k-means, regression and simultaneous equation models. In Studies in Bayesian Econometrics and Statistics, pages 628–653. North-Holland, Amsterdam.