
Journal of Econometrics 115 (2003) 159–197. www.elsevier.com/locate/econbase

Noninformative priors and frequentist risks of Bayesian estimators of vector-autoregressive models

Shawn Ni a,∗, Dongchu Sun b

a Department of Economics, University of Missouri-Columbia, Columbia, MO 65211, USA
b Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211, USA

Accepted 22 January 2003

Abstract

In this study, we examine posterior properties and frequentist risks of Bayesian estimators based on several noninformative priors in vector autoregressive (VAR) models. We prove existence of the posterior distributions and posterior moments under a general class of priors. Using a variety of priors in this class we conduct numerical simulations of posteriors. We find that in most examples Bayesian estimators with a shrinkage prior on the VAR coefficients and the reference prior of Yang and Berger (Ann. Statist. 22 (1994) 1195) on the VAR covariance matrix dominate MLE, Bayesian estimators with the diffuse prior, and Bayesian estimators with the prior used in RATS. We also examine the informative Minnesota prior and find that its performance depends on the nature of the data sample and on the tightness of the Minnesota prior. A tightly set Minnesota prior is better when the data generating processes are similar to random walks, but the shrinkage prior or constant prior can be better otherwise.
© 2003 Elsevier Science B.V. All rights reserved.

JEL classification: C11; C15; C32

Keywords: VAR; Noninformative priors; Constant prior; Shrinkage prior; Reference prior; Jeffreys prior; Minnesota prior

1. Introduction

Vector-autoregression (VAR) models, initiated by the seminal papers of Sims (1972, 1980), have become indispensable for macroeconomic research.

∗ Corresponding author. Tel.: +1-573-882-6878; fax: +1-573-882-2697. E-mail address: [email protected] (S. Ni).

0304-4076/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0304-4076(03)00099-X


A VAR of a $p$-dimensional row-random vector $y_t$ typically has the form

$$y_t = c + \sum_{i=1}^{L} y_{t-i} B_i + \epsilon_t, \qquad (1)$$

where $t = 1, \dots, T$; $c$ is a $1 \times p$ unknown vector; $B_i$ ($i = 1, \dots, L$) is an unknown $p \times p$ matrix; and $\epsilon_1, \dots, \epsilon_T$ are independently and identically distributed (iid) normal $N_p(0, \Sigma)$ errors with a $p \times p$ unknown covariance matrix $\Sigma$. We call $L$ the lag of the VAR, and the $(Lp + 1) \times p$ unknown matrix $\Phi = (c', B_1', \dots, B_L')'$ the regression coefficients. The VAR above imposes no restrictions on the coefficients $\Phi$ or the covariance matrix $\Sigma$. In applications, $\Phi$ and $\Sigma$ can be estimated from time series macroeconomic data by ordinary least squares (OLS) or maximum likelihood estimation (MLE). Accurate estimation of finite sample distributions of $(\Phi, \Sigma)$ is important for economic applications of the VAR model: in the recently developed structural VAR literature, numerous authors (e.g., Sims, 1986; Gordon and Leeper, 1994; Sims and Zha, 1998b; Pagan and Robertson, 1998; Leeper and Zha, 1999; Lee and Ni, 2002) derive identification schemes based on the estimates of $\Sigma$. Unfortunately, the frequentist finite sample distributions of OLS (or ML) estimators of $\Phi$ and $\Sigma$ are unavailable. Asymptotic theory, on the other hand, may not be applicable for finite sample inference on VARs for two reasons. First, a typical VAR model in macroeconomic research involves a large number of parameters, and the sample size of the data is often not large enough to justify the use of asymptotic theory. Second, when nonlinear functions of the VAR coefficients (such as impulse responses) are of interest, the asymptotic theory involves approximation of nonlinear functions, and the approximation becomes worse the more nonlinear the functions are (see Kilian, 1999). Furthermore, note that the unrestricted linear VAR above cannot model structural breaks or asymmetric relationships among macrovariables. To deal with these nonlinearities, we should allow VAR parameters to be time- or state-dependent (e.g., with Markov regime switches). Expansion of the parameter space will exacerbate the limited availability of data and make it more problematic to use asymptotic theory.

An alternative to asymptotic theory is the Bayesian approach, which combines information from the sample and the prior to form a finite sample posterior distribution of $(\Phi, \Sigma)$. The present paper evaluates alternative Bayesian procedures in terms of frequentist risks for practitioners who are interested in finite sample distributions of VAR parameters.

The key element of Bayesian analysis is the choice of prior.

The prior may be informative or noninformative. A commonly used informative prior for $\Phi$ is the Minnesota prior (see Litterman, 1986), which is a multivariate normal distribution. If researchers have justified beliefs about the hyper-parameters of the prior distributions, it is wise to use informative priors that reflect these beliefs. But in practice, using informative priors has pitfalls. One problem is that prior information developed from experience may be irrelevant for a new data set. Another problem is that using informative priors makes comparing scientific reports more difficult.

Noninformative priors are designed to reflect the notion that a researcher has only vague knowledge about the distribution of the parameters of interest before he observes data.


Alternative criteria may be used to reflect the vagueness of the researcher's knowledge. A recent review of various approaches for deriving noninformative priors can be found in Kass and Wasserman (1996).

For the covariance matrix $\Sigma$, a widely employed noninformative prior is the Jeffreys prior (Jeffreys, 1967). A modified version of the Jeffreys prior is put to use in RATS (Regression Analysis of Time Series, a software package popular among macroeconomists). This prior will be called the RATS prior hereafter. The Jeffreys prior is quite useful for single parameter problems but can be seriously deficient in multiparameter settings (see Berger and Bernardo, 1992). As alternatives, Berger and Bernardo's (1989, 1992) reference priors have been shown to be successful in various statistical models, especially for iid cases. One of the objectives of the present study is to examine the posterior of the VAR covariance matrix under these alternative priors.

In practice, researchers often combine separately derived priors for $\Phi$ and $\Sigma$ into priors for $(\Phi, \Sigma)$. The constant prior, although used quite often for the VAR coefficients $\Phi$, is known to be inadmissible under quadratic loss for estimating the unknown mean vector of iid normal observations. An alternative to the constant prior is a "shrinkage" prior for $\Phi$, which has been used in estimating the unknown normal mean in iid cases (e.g., Baranchik, 1964) and in hierarchical linear mixed models (e.g., Berger and Strawderman, 1996). The shrinkage prior is a natural candidate for the VAR coefficients and will be explored in the VAR setting in this study.

The fact that all of the noninformative priors for $(\Phi, \Sigma)$ mentioned above are improper raises a question about the propriety of the posterior distribution.¹ There exist situations in which the posterior is improper even though the full conditional distributions necessary for Markov chain Monte Carlo (MCMC) simulations are all proper (e.g., Hobert and Casella, 1996; Sun et al., 2001). Our first task in studying properties of VAR estimators under alternative priors is to show that the posteriors of $(\Phi, \Sigma)$ under these priors are proper. We establish posterior propriety for a general class of priors that includes all prior combinations examined in the paper. In addition, we also give proofs of the existence of posterior moments. (The usefulness of the proofs extends beyond the present paper.) Because in most cases marginal posteriors are not available in closed form, we use MCMC simulations to estimate posterior quantities numerically. Besides comparing alternative noninformative priors, we also examine an informative Minnesota prior on $\Phi$ used in combination with the reference prior on $\Sigma$.

The rest of the paper is organized as follows. Section 2 lays out the notation and the MLE of the VAR model. Section 3 discusses the essential elements of Bayesian analysis for the VAR, including priors, posteriors, loss functions, and Bayesian estimators. Section 4 presents MCMC algorithms for Bayesian computation of the posteriors. Section 5 reports numerical results of the Bayesian computation using noninformative priors. Finally, Section 6 presents some conclusions from this work.

¹ A prior is improper if its integral over the entire parameter space is infinite.


2. Notations and the MLE of the VAR

We consider the VAR model (1). Let

$$x_t = (1, y_{t-1}, \dots, y_{t-L}), \qquad (2)$$

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ \vdots \\ x_T \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_T \end{pmatrix}, \quad \Phi = \begin{pmatrix} c \\ B_1 \\ \vdots \\ B_L \end{pmatrix}. \qquad (3)$$

Here $Y$ and $\epsilon$ are $T \times p$ matrices, $\Phi$ is a $(1+Lp) \times p$ matrix of unknown parameters, $x_t$ is a $1 \times (1+Lp)$ row vector, and $X$ is a $T \times (1+Lp)$ matrix of observations. Then we can rewrite (1) as

$$Y = X\Phi + \epsilon. \qquad (4)$$

The likelihood function of $(\Phi, \Sigma)$ is then

$$L(\Phi, \Sigma) = \frac{1}{|\Sigma|^{T/2}}\, \mathrm{etr}\left\{-\frac{1}{2}(Y - X\Phi)\Sigma^{-1}(Y - X\Phi)'\right\}. \qquad (5)$$

Here and hereafter, $\mathrm{etr}(A)$ denotes $\exp(\mathrm{trace}(A))$ for a matrix $A$. The finite sample distribution of $(\Phi, \Sigma)$ is the subject of interest. Note that the MLEs of $\Phi$ and $\Sigma$ are

$$\hat{\Phi}_{\mathrm{MLE}} = (X'X)^{-1}X'Y \quad \text{and} \quad \hat{\Sigma}_{\mathrm{MLE}} = S(\hat{\Phi}_{\mathrm{MLE}})/T, \qquad (6)$$

respectively, where

$$S(\Phi) = (Y - X\Phi)'(Y - X\Phi). \qquad (7)$$

We assume that when $T \ge Lp + 1$, $(X'X)^{-1}$ exists with probability one, and that when $T \ge Lp + p + 1$, $S(\hat{\Phi}_{\mathrm{MLE}})$ is positive definite and the MLEs of $\Phi$ and $\Sigma$ exist with probability one. In this paper, we take as given that $T \ge Lp + p + 1$, so that the MLEs of $\Phi$ and $\Sigma$ exist.
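As a concrete illustration of (6) and (7), here is a minimal numerical sketch (ours, not the authors'; the helper names `build_design` and `var_mle` are hypothetical, and `y` is assumed to be a $T_{\mathrm{obs}} \times p$ array of observations):

```python
import numpy as np

def build_design(y, L):
    """Build (X, Y) of (3)-(4) from a (T_obs, p) array y, for lag length L."""
    T_obs = y.shape[0]
    rows = []
    for t in range(L, T_obs):
        # x_t = (1, y_{t-1}, ..., y_{t-L}), a 1 x (1 + L*p) row vector as in (2).
        rows.append(np.concatenate([[1.0]] + [y[t - l] for l in range(1, L + 1)]))
    return np.asarray(rows), y[L:]

def var_mle(Y, X):
    """MLE of (Phi, Sigma) in Y = X Phi + eps, following (6) and (7)."""
    T = Y.shape[0]
    Phi = np.linalg.solve(X.T @ X, X.T @ Y)   # Phi_MLE = (X'X)^{-1} X'Y
    resid = Y - X @ Phi
    S = resid.T @ resid                        # S(Phi_MLE) of (7)
    return Phi, S / T, S                       # Phi_MLE, Sigma_MLE = S/T, and S
```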

3. Bayesian framework with noninformative priors

3.1. Priors for $\phi$

In practice, it is often convenient to consider the vectorized VAR coefficients $\phi = \mathrm{vec}(\Phi)$ instead of $\Phi$. A common expression of ignorance about $\phi$ is a (flat) constant prior. For estimating the mean of a multivariate normal distribution, some authors (e.g., Baranchik, 1964; Berger and Strawderman, 1996) advocate the following "shrinkage" prior as an alternative to the constant prior for $\phi$:

$$\pi_S(\phi) \propto \|\phi\|^{-(J-2)}, \qquad \phi \in \mathbb{R}^J, \qquad (8)$$


where $J = p(Lp + 1)$ is the dimension of $\phi$. Berger and Strawderman show that the shrinkage prior (8) dominates the constant prior for estimating iid normal means. The intuitive justification for using the shrinkage prior on $\phi$ is related to the Stein (1956) effect, whereby information about component variables can be used in such a way that "borrowed strength" improves the overall joint loss of the estimator. Berger and Strawderman make the following methodological recommendation on the choice of noninformative priors: "Avoid using constant priors for variances or covariance matrices, or for groups of mean parameters of dimension greater than 2." They add that "rigorous verifications of these recommendations would be difficult, but the results in this paper, together with our practical experience, suggest that they are very reasonable."

Our theoretical investigation of the posteriors is conducted in a framework that includes both the constant and shrinkage priors. We consider the following class of priors for $\phi$:

$$\pi^{(a)}(\phi) \propto \frac{1}{\|\phi\|^a}, \qquad a \ge 0. \qquad (9)$$

When $a > 0$, $\pi^{(a)}(\phi)$ has the following two-stage hierarchical structure. Let $\pi(\phi \mid \lambda)$ be the normal density of $N_J(0, \lambda I_J)$, so that

$$(\phi \mid \lambda) \sim N_J(0, \lambda I_J), \quad \text{and assume} \quad \pi_a(\lambda) \propto \frac{1}{\lambda^{\{a-(J-2)\}/2}}. \qquad (10)$$

Then

$$\int_0^\infty \pi(\phi \mid \lambda)\, \pi_a(\lambda)\, \mathrm{d}\lambda = \frac{1}{(2\pi)^{J/2}} \int_0^\infty \frac{1}{\lambda^{a/2+1}} \exp\left\{-\frac{1}{2\lambda}\phi'\phi\right\} \mathrm{d}\lambda = \frac{\Gamma(a/2)}{(2\pi)^{J/2}\,(\phi'\phi/2)^{a/2}},$$

which is proportional to (9). As suggested in the introduction, informative priors are suitable vehicles for researchers to express their knowledge of the parameters of interest. A popular informative prior in macroeconomics is the so-called Minnesota prior on $\phi$:

$$\pi_M(\phi) \propto \frac{1}{|M_0|^{1/2}} \exp\left\{-\frac{1}{2}(\phi - \phi_0)' M_0^{-1} (\phi - \phi_0)\right\}. \qquad (11)$$

In this paper we compare the Minnesota prior with the constant and shrinkage priors on $\phi$. We will discuss the selection of the hyper-parameters $M_0$ and $\phi_0$ later.

3.2. Priors for $\Sigma$

The most popular noninformative prior for $\Sigma$ is the Jeffreys prior (see Geisser, 1965; Tiao and Zellner, 1964). The Jeffreys prior is derived from the "invariance principle," meaning that the prior is invariant to re-parameterization (see Zellner, 1971). The Jeffreys prior is proportional to the square root of the determinant of the Fisher information matrix. Specifically, for the VAR covariance matrix, the Jeffreys prior is $\pi_J(\Sigma) \propto |\Sigma|^{-(p+1)/2}$. In RATS a modified version of the Jeffreys prior, $\pi_A(\Sigma) \propto |\Sigma|^{-(L+1)p/2-1}$,

is employed.

It has been noted, however, that the Jeffreys prior often gives unsatisfactory results for multi-parameter problems. For example, assuming the mean and variance are independent in the Neyman–Scott (1948) problem, the Bayesian estimator of the variance under the Jeffreys prior is inconsistent. In fact, Jeffreys himself recommended against using his prior when it leads to improper posteriors. An intuitive explanation for the poor performance of the Jeffreys prior in multi-parameter settings is that the inter-dependence of the parameters amplifies the effect of the prior on each parameter. Bernardo (1979) proposes an approach for deriving a reference prior by breaking a single multi-parameter problem into a consecutive series of problems with fewer parameters. The reference prior is designed to extract the maximum amount of expected information from the data, in the sense of maximizing the difference (measured by Kullback–Leibler distance) between the posterior and the prior as the number of samples drawn goes to infinity. Reference priors preserve desirable features of the Jeffreys prior, such as the invariance property, but they often avoid the paradoxical results produced by the Jeffreys prior in multi-parameter settings. Berger and Bernardo (1989, 1992) develop a procedure that leads to explicit forms of reference priors. Under the reference prior, the Bayesian estimator of the variance in the Neyman–Scott problem is consistent. For other examples in which reference priors produce more desirable estimators than Jeffreys priors, see Berger and Bernardo (1992), Sun and Ye (1995), and Sun and Berger (1998), among others.

We agree with one of the reviewers that as the sample size grows, the number of parameters grows, and consistency may not really be a reasonable requirement. If there is only one observation in each normal population while both the $n$ location parameters and the common variance are unknown, there are $n+1$ parameters. Thus, with no information, as represented by the Jeffreys prior or in a maximum likelihood approach, it is understandable that it is difficult to make good inferences about all the $n+1$ parameters. Other priors add extra information, which in science has to be justified. Also, there are many informative priors, each adding its own information, that can result in proper posterior densities for all the parameters of the Neyman–Scott problem.

Another reviewer pointed out that the reference prior of Berger and Bernardo is more informative than the Jeffreys prior. The key difference between the reference prior and the Jeffreys prior is that, unlike the latter, the reference prior allows researchers to rank parameters by their perceived importance. For any given problem the reference prior depends on the ordering of the parameters. Bernardo (1979) shows that if the posterior is asymptotically normal, then the reference prior is the Jeffreys prior when all parameters are of equal importance. In estimating the variance–covariance matrix $\Sigma$ based on an iid random sample from a normal population with known mean, Yang and Berger (1994) re-parameterize the matrix $\Sigma$ as $O'DO$, where $D$ is a diagonal matrix whose elements are the eigenvalues of $\Sigma$ (in increasing or decreasing order) and $O$ is an orthogonal matrix. The following reference prior is derived by giving priority to vectorized $D$ over vectorized $O$: $\pi_R(\Sigma) \propto \{|\Sigma| \prod_{1 \le i < j \le p} (d_i - d_j)\}^{-1}$, where $d_1 > d_2 > \cdots > d_p$ are the eigenvalues of $\Sigma$. Yang and Berger evaluate the reference-prior-based estimators of a covariance matrix in an iid setting.


Similarly to the treatment of priors on $\phi$, we consider a large class of priors for $\Sigma$,

$$\pi^{(b,c)}(\Sigma) \propto \frac{1}{|\Sigma|^{b/2}\,\{\prod_{1\le i<j\le p} (d_i - d_j)\}^c}, \qquad (12)$$

where $b \in \mathbb{R}$ and $c = 0, 1$. Then $\pi_J(\Sigma)$, $\pi_A(\Sigma)$, and $\pi_R(\Sigma)$ are special cases of (12).

3.3. Joint priors for $(\phi, \Sigma)$

The prior for $(\phi, \Sigma)$ can be obtained by putting together priors for $\phi$ and $\Sigma$. A popular noninformative prior for multivariate regression models is the so-called diffuse prior, which consists of a constant prior for $\phi$ and the Jeffreys prior for $\Sigma$ (e.g., see Kadiyala and Karlsson, 1997). A similar prior is used in the RATS package. As will be shown later, the effect of the choice of prior for $\phi$ is not significantly affected by the prior on $\Sigma$. For brevity, in evaluating the performance of the Minnesota prior, it suffices to report results for the Minnesota prior on $\phi$ in combination with the reference prior on $\Sigma$. We now consider a general class of joint priors for $(\phi, \Sigma)$:

$$\pi^{(a,b,c)}(\phi, \Sigma) = \pi^{(a)}(\phi)\, \pi^{(b,c)}(\Sigma), \qquad c = 0, 1. \qquad (13)$$

As special cases of (13), the prior combinations for $(\phi, \Sigma)$ to be examined together with the Minnesota-reference prior can be summarized as follows:

Prior                 Notation                   Form                                                                    (a, b, c)
Constant-Jeffreys     $\pi_{CJ}(\phi, \Sigma)$   $\dfrac{1}{|\Sigma|^{(p+1)/2}}$                                         $(0,\ p+1,\ 0)$
Constant-RATS         $\pi_{CA}(\phi, \Sigma)$   $\dfrac{1}{|\Sigma|^{(L+1)p/2+1}}$                                      $(0,\ (L+1)p+2,\ 0)$
Constant-reference    $\pi_{CR}(\phi, \Sigma)$   $\dfrac{1}{|\Sigma|\prod_{1\le i<j\le p}(d_i - d_j)}$                   $(0,\ 2,\ 1)$
Shrinkage-Jeffreys    $\pi_{SJ}(\phi, \Sigma)$   $\dfrac{1}{\|\phi\|^{J-2}\,|\Sigma|^{(p+1)/2}}$                         $(J-2,\ p+1,\ 0)$
Shrinkage-RATS        $\pi_{SA}(\phi, \Sigma)$   $\dfrac{1}{\|\phi\|^{J-2}\,|\Sigma|^{(L+1)p/2+1}}$                      $(J-2,\ (L+1)p+2,\ 0)$
Shrinkage-reference   $\pi_{SR}(\phi, \Sigma)$   $\dfrac{1}{\|\phi\|^{J-2}\,|\Sigma|\prod_{1\le i<j\le p}(d_i - d_j)}$   $(J-2,\ 2,\ 1)$
Minnesota-reference   $\pi_{MR}(\phi, \Sigma)$   $\dfrac{\exp\{-\frac{1}{2}(\phi-\phi_0)'M_0^{-1}(\phi-\phi_0)\}}{|M_0|^{1/2}\,|\Sigma|\prod_{1\le i<j\le p}(d_i - d_j)}$

The list of noninformative priors examined in the present paper is by no means exhaustive. Other noninformative priors, such as Zellner's (1997) Maximal Data Information Prior (MDIP), can be derived using approaches not discussed in this paper. A modified version of Zellner's prior in a VAR setting is studied by Deschamps (2000).


Sims and Zha (1998a) propose an MCMC procedure drawing $\Sigma$ from an Inverse Wishart distribution and applying priors similar to the Minnesota prior on $\phi$. The Sims–Zha approach is particularly convenient for estimation of identified VARs.

3.4. Propriety of the posteriors

In this paper, we will compare various properties of estimators of the VAR parameters $(\phi, \Sigma)$ under various noninformative priors. Since all the priors for $(\phi, \Sigma)$ listed above are improper, it is important to know whether the posteriors of $(\phi, \Sigma)$ exist under these priors. Sun and Ni (2003) prove that the posteriors of $(\phi, \Sigma)$ are proper under both the constant-Jeffreys and constant-reference priors $\pi_{CJ}(\phi, \Sigma)$ and $\pi_{CR}(\phi, \Sigma)$. We now develop more general results on the posteriors under the prior $\pi^{(0,b,c)}(\phi, \Sigma)$.

Theorem 1. Consider the prior $\pi^{(0,b,c)}(\phi, \Sigma)$.

(a) If $T > (L+2)p - b + 1$, the posterior of $(\phi, \Sigma)$ under the prior $\pi^{(0,b,0)}$ is proper.
(b) If $T > Lp - b + 3 > 0$, the posterior of $(\phi, \Sigma)$ under the prior $\pi^{(0,b,1)}$ is proper.

The proof of the theorem is given in Appendix A. The next theorem shows that if the MLE exists, then the requirements on the sample size for the existence of proper posteriors are satisfied for the prior combinations involving the constant prior.

Theorem 2. If the MLE of $(\Phi, \Sigma)$ exists, then the posterior of $(\phi, \Sigma)$ is proper under $\pi_{CJ}(\phi, \Sigma)$, $\pi_{CA}(\phi, \Sigma)$, and $\pi_{CR}(\phi, \Sigma)$.

Proof. In part (a) of Theorem 1, let $b = p+1$ for the prior $\pi_{CJ}$ and $b = (L+1)p+2$ for the prior $\pi_{CA}$. The sample size requirement under $\pi_{CJ}(\phi, \Sigma)$ is $T > (L+1)p$, and the requirement under $\pi_{CA}(\phi, \Sigma)$ is $T > p-1$. In part (b) of Theorem 1 with $b = 2$, the requirement under $\pi_{CR}(\phi, \Sigma)$ is $T > Lp + 1$. Existence of the MLE requires $T \ge (L+1)p + 1$, which guarantees the existence of the posterior under all three prior combinations.

Theorem 2 implies that the posterior under the Minnesota-reference prior is also proper, because the posterior under the constant-reference prior is proper and the Minnesota prior is bounded by a constant.

To show the existence of the posterior under the prior $\pi^{(a,b,c)}(\phi, \Sigma)$ when $a > 0$, we introduce the following conditions:

(A) $J - a > 0$;
(B0) $T > \max(2p - b - 2,\ J - a - b + 2)$;
(B1) $T > J - a - b + 2$.

Theorem 3. Consider the prior $\pi^{(a,b,c)}$ when $a > 0$.

(a) If Conditions (A) and (B0) hold, the posterior of $(\phi, \Sigma)$ under the prior $\pi^{(a,b,0)}$ is proper.


(b) If Conditions (A) and (B1) hold, the posterior of $(\phi, \Sigma)$ under the prior $\pi^{(a,b,1)}$ is proper.

The proof of the theorem is given in Appendix B.

Theorem 4. If the MLE of $(\Phi, \Sigma)$ exists, then the posterior of $(\phi, \Sigma)$ is proper under $\pi_{SJ}(\phi, \Sigma)$, $\pi_{SA}(\phi, \Sigma)$, and $\pi_{SR}(\phi, \Sigma)$.

Proof. Under the prior $\pi_{SJ}$, applying part (a) of Theorem 3 with $a = J-2$ and $b = p+1$ leads to the requirement $T > \max(p-3,\ 3-p)$. Under the prior $\pi_{SA}$, applying part (a) of Theorem 3 with $a = J-2$ and $b = (L+1)p+2$ leads to the requirement $T > 2 - (L+1)p$. Under the prior $\pi_{SR}$, letting $a = J-2$ and $b = 2$ in part (b) of Theorem 3 leads to the requirement $T > 2$. These requirements are satisfied if the MLE exists.

3.5. Existence of posterior moments

Computing Bayesian estimators of VAR models requires posterior moments of $(\phi, \Sigma)$. Existence of the posterior is necessary but not sufficient for the existence of posterior moments. In the following, we derive sufficient conditions for the existence of posterior moments of certain orders. We first consider the case $a = 0$.

Theorem 5. Let $k$ be 0 or 2 and let $h$ be a nonnegative integer.

(a) If $T > (L+2)p + 2h - b + 3$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ under the prior $\pi^{(0,b,0)}$ is finite.
(b) If $T > Lp + 2h - b + 5$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ under the prior $\pi^{(0,b,1)}$ is finite.

The proof of the theorem is given in Appendix C. The results imply the existence of the first two posterior moments of the components of $\phi$ and of the $h$th posterior moments of the components of $\Sigma$. The following theorem for the priors considered in this paper is a straightforward application of Theorem 5.

Theorem 6. Let $k$ be 0 or 2.

(a) Under $\pi_{CJ}(\phi, \Sigma)$, if $T > (L+1)p + 2 + 2h$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.
(b) Under $\pi_{CA}(\phi, \Sigma)$, if $T > p + 2h + 1$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.
(c) Under $\pi_{CR}(\phi, \Sigma)$, if $T > Lp + 1$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.

Following part (c) of the theorem above, under $\pi_{MR}(\phi, \Sigma)$, for $k = 0$ or 2 the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ exists if $T > Lp + 1$.

Let $k$ and $h$ be nonnegative integers. Consider the following conditions for the case $a > 0$:

(AM) $J - a > 0$ and $a - k > 0$;
(B0M) $T > \max(2p - b - 2,\ J + k + 2(p+h) - a - b)$;
(B1M) $T > J + k + 2h - a - b + 2$.


Theorem 7. (a) If Conditions (AM) and (B0M) hold, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ under the prior $\pi^{(a,b,0)}$ is finite.
(b) If Conditions (AM) and (B1M) hold, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ under the prior $\pi^{(a,b,1)}$ is finite.

The proof of the theorem is given in Appendix D. The results imply the existence of the $k$th posterior moments of the components of $\phi$ and of the $h$th posterior moments of the components of $\Sigma$. Applying Theorem 7 to the prior combinations that involve the shrinkage prior, we have the following result.

Theorem 8. (a) Under $\pi_{SJ}(\phi, \Sigma)$, if $T > \max(p - 1 + 2h,\ 3 - p + k)$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.
(b) Under $\pi_{SA}(\phi, \Sigma)$, if $T > \max(p - Lp - 2 + 2h,\ k - (L+1)p + 2)$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.
(c) Under $\pi_{SR}(\phi, \Sigma)$, if $T > 2 + k$, the posterior mean of $\|\phi\|^k\{\mathrm{tr}(\Sigma^2)\}^{h/2}$ is finite.

From Theorems 6 and 8 we conclude that the requirements on the sample size for the existence of posterior moments are easily satisfied in practical cases.

3.6. Conditional posterior distributions

The posteriors of $(\phi, \Sigma)$ are not available in closed form for most prior combinations. Recent years have witnessed vast progress in numerical posterior simulation. For some recent examples of Bayesian computation in econometrics, see Geweke (1996, 1999), Chib (1998), Chib and Hamilton (2000), and the references therein. In this study, we use Gibbs sampling MCMC methods to sample from the posteriors (cf. Gelfand and Smith, 1990). The first step of the MCMC computation is to find the full conditional distributions of $(\phi, \Sigma)$. We will make use of the following results.

Fact 1. Consider the constant-Jeffreys prior for $(\phi, \Sigma)$. The conditional posterior of $\phi$ given $\Sigma$ is $\mathrm{MVN}(\hat{\phi}_{\mathrm{MLE}},\ \Sigma \otimes (X'X)^{-1})$, and the marginal posterior of $\Sigma$ is Inverse Wishart $(S(\hat{\Phi}_{\mathrm{MLE}}),\ T - Lp - 1)$. Here $\hat{\phi}_{\mathrm{MLE}}$ is defined as the vectorized $\hat{\Phi}_{\mathrm{MLE}}$, and $\otimes$ is the Kronecker product.

Proof. This follows from the proof of Theorem 1. (We follow the notation for the Inverse Wishart distribution of Anderson, 1984, p. 268.)

Fact 2. Consider the constant-RATS prior for $(\phi, \Sigma)$. The conditional posterior of $\phi$ given $\Sigma$ is $\mathrm{MVN}(\hat{\phi}_{\mathrm{MLE}},\ \Sigma \otimes (X'X)^{-1})$, and the marginal posterior of $\Sigma$ is Inverse Wishart $(S(\hat{\Phi}_{\mathrm{MLE}}),\ T)$.

Fact 3. Consider the constant-reference prior.
(a) The conditional distribution of $\phi$ given $(\Sigma, Y)$ is

$$(\phi \mid \Sigma, Y) \sim \mathrm{MVN}(\hat{\phi}_{\mathrm{MLE}},\ \Sigma \otimes (X'X)^{-1}). \qquad (14)$$


(b) The conditional density of $\Sigma$ given $(\Phi, Y)$ is

$$\pi(\Sigma \mid \Phi, Y) \propto \frac{\mathrm{etr}\{-\frac{1}{2}\Sigma^{-1}S(\Phi)\}}{|\Sigma|^{T/2+1}\prod_{1\le i<j\le p}(d_i - d_j)}, \qquad (15)$$

where $S(\Phi)$ is defined by (7).

Proof. This follows from standard computation.

The hierarchical structure (10) suggests a nice computational formula. For example, the shrinkage prior $\pi_S(\phi)$ is a special case of (10) with $a = J - 2$. In this case, we have

$$(\phi \mid \lambda) \sim N_J(0, \lambda I_J) \quad \text{and} \quad \pi(\lambda) \propto 1.$$

Instead of simulating from the conditional distributions of $\phi$ and $\Sigma$ alone within each Gibbs cycle, we use $\lambda$ as a latent variable and simulate $\phi$, $\Sigma$, and $\lambda$ based on the following fact.

Fact 4. Consider the shrinkage-reference prior.

(a) The conditional density of $\Sigma$ given $(\Phi, \lambda, Y)$ is given by (15).
(b) The conditional distribution of $\phi = \mathrm{vec}(\Phi)$ given $(\lambda, \Sigma, Y)$ is $\mathrm{MVN}_J(\phi_S, V_S)$, where

$$\phi_S = \lambda(\Sigma \otimes (X'X)^{-1} + \lambda I_J)^{-1}\hat{\phi}_{\mathrm{MLE}}, \qquad (16)$$

$$V_S = \left(\Sigma^{-1} \otimes X'X + \frac{1}{\lambda} I_J\right)^{-1}. \qquad (17)$$

(c) The conditional distribution of $\lambda$ given $(\phi, \Sigma, Y)$ is Inverse Gamma $(J/2 - 1,\ \frac{1}{2}\phi'\phi)$.

Proof. The proof of (b) is similar to Example 9 of Berger (1984). The others are simple.
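In code, the two-step draw of $(\lambda, \phi)$ described by Fact 4(b)-(c) might look as follows (a naive dense sketch of ours, adequate for the small $J = p(Lp+1)$ used later; the function name is hypothetical):

```python
import numpy as np

def draw_lambda_phi(phi_prev, Sigma, Y, X, rng):
    """One shrinkage-prior Gibbs sub-step: lambda | phi, then phi | lambda, Sigma."""
    J = phi_prev.size
    # (c): lambda ~ Inverse Gamma(J/2 - 1, phi'phi/2); draw as 1/Gamma(shape, scale).
    lam = 1.0 / rng.gamma(shape=J / 2.0 - 1.0, scale=2.0 / (phi_prev @ phi_prev))
    XtX = X.T @ X
    phi_mle = np.linalg.solve(XtX, X.T @ Y).flatten(order='F')   # vec by columns
    # (17): V_S = (Sigma^{-1} kron X'X + (1/lambda) I_J)^{-1}.
    V_S = np.linalg.inv(np.kron(np.linalg.inv(Sigma), XtX) + np.eye(J) / lam)
    # (16): phi_S = lambda (Sigma kron (X'X)^{-1} + lambda I_J)^{-1} phi_MLE.
    phi_S = lam * np.linalg.solve(
        np.kron(Sigma, np.linalg.inv(XtX)) + lam * np.eye(J), phi_mle)
    return lam, rng.multivariate_normal(phi_S, V_S)
```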

Since the Minnesota prior of $\phi$ is independent of $\Sigma$, the conditional posterior density of $\Sigma$ given $(\Phi, Y)$ under the Minnesota-reference prior is given by (15). The conditional posterior density of $\phi$ given $(\Sigma, Y)$ is

$$\pi(\phi \mid \Sigma, Y) \propto \exp\left\{-\frac{1}{2}(\phi - \phi_0)'M_0^{-1}(\phi - \phi_0) - \frac{1}{2}(\phi - \hat{\phi}_{\mathrm{MLE}})'[\Sigma^{-1} \otimes (X'X)](\phi - \hat{\phi}_{\mathrm{MLE}})\right\}. \qquad (18)$$

Thus we have the following result.


Fact 5. Consider the Minnesota-reference prior. The conditional density of $\phi$ given $(\Sigma, Y)$ is $\mathrm{MVN}_J(\phi_M, V_M)$, where

$$\phi_M = \hat{\phi}_{\mathrm{MLE}} + (M_0^{-1} + \Sigma^{-1} \otimes (X'X))^{-1}M_0^{-1}(\phi_0 - \hat{\phi}_{\mathrm{MLE}}), \qquad (19)$$

$$V_M = (M_0^{-1} + \Sigma^{-1} \otimes (X'X))^{-1}. \qquad (20)$$

The hyper-parameter $\phi_0$ (i.e., $\mathrm{vec}(\Phi_0)$) is defined by letting the mean of $B_1$ be the identity matrix and the means of the other elements be zero. The elements of $M_0$ are given as $b_1/k$ for the parameter of a VAR variable on its own $k$th lag; $(b_1 b_2/k)(\sigma_i/\sigma_j)^{1/2}$ for the parameter of the $k$th lag of the $j$th variable in the $i$th equation, $j \ne i$ ($\sigma_i$ is the variance of the residuals of the $i$th VAR equation estimated via OLS); and $b_3$ for the intercepts. There is no unique way of choosing the hyper-parameters. Our specification of the $M_0$ matrix closely follows that of Kadiyala and Karlsson (1997), which is slightly different from the form in the RATS manual and Hamilton's book (1994, pp. 361–362). In our numerical examples, we experiment with alternative settings of the Minnesota prior with different variance parameters $b_1$ and $b_3$. Following convention, we choose the hyper-parameter $b_2$ to be 0.5. We also set $b_3 = 1.0$. We control the "tightness" of the Minnesota prior by adjusting the value of the parameter $b_1$. A tight version of the Minnesota prior is defined by $b_1 = 0.2^2$, and a loose version sets $b_1 = 0.9^2$. Here the words "tight" and "loose" are used in relative terms; one can certainly argue that $b_1 = 0.9^2$ represents a tight prior compared to the case $b_1 = 10^2$.
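Under this specification, $\phi_0$ and a diagonal $M_0$ could be assembled as follows (our sketch; the column-stacked layout of $\mathrm{vec}(\Phi)$ and the helper names are our assumptions):

```python
import numpy as np

def minnesota_phi0(p, L):
    """phi_0 = vec(Phi_0): mean of B_1 is I_p, all other elements are zero."""
    Phi0 = np.zeros((1 + L * p, p))
    Phi0[1:p + 1, :] = np.eye(p)       # rows 1..p of Phi hold B_1
    return Phi0.flatten(order='F')

def minnesota_m0_diag(sigma, L, b1=0.2**2, b2=0.5, b3=1.0):
    """Diagonal M_0; sigma[i] is the OLS residual variance of equation i."""
    p = len(sigma)
    diag = []
    for i in range(p):                 # equation i (column i of Phi)
        diag.append(b3)                # intercept
        for k in range(1, L + 1):      # lag k
            for j in range(p):         # coefficient on variable j at lag k
                if j == i:
                    diag.append(b1 / k)                                   # own lag
                else:
                    diag.append((b1 * b2 / k) * np.sqrt(sigma[i] / sigma[j]))
    return np.diag(diag)
```

With $b_1 = 0.2^2$ this produces the tight version (TMR) and with $b_1 = 0.9^2$ the loose version (LMR) used below.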

3.7. Loss functions and Bayesian estimators

A Bayesian estimator of $(\Phi, \Sigma)$ depends on the data generating model, the prior, and the loss function. We consider a pseudo entropy loss function for $\Sigma$ and a quadratic loss function for $\Phi$:

$$L_1(\hat{\Sigma}, \Sigma) = \mathrm{tr}(\Sigma^{-1}\hat{\Sigma}) - \log|\Sigma^{-1}\hat{\Sigma}| - p, \qquad (21)$$

$$L_2(\hat{\Phi}, \Phi) = \mathrm{tr}\{(\hat{\Phi} - \Phi)'G(\hat{\Phi} - \Phi)\}, \qquad (22)$$

where $G$ is a constant weighting matrix and $p$ is the number of variables in the VAR. If the weighting matrix $G$ is the identity matrix, the loss $L_2$ is simply the sum of squared errors over all elements of $\Phi$,

$$\sum_{i=1}^{1+Lp}\sum_{j=1}^{p}(\hat{\phi}_{i,j} - \phi_{i,j})^2. \qquad (23)$$

The loss $L_2$ can be decomposed as $L_{21} + L_{22}$, where the loss associated with the intercept terms is

$$L_{21} = \sum_{j=1}^{p}(\hat{\phi}_{1,j} - \phi_{1,j})^2, \qquad (24)$$

and the loss associated with the terms other than the intercepts is

$$L_{22} = \sum_{i=2}^{1+Lp}\sum_{j=1}^{p}(\hat{\phi}_{i,j} - \phi_{i,j})^2. \qquad (25)$$

The loss function for $(\Phi, \Sigma)$ contains a part measuring the loss associated with the covariance matrix ($L_1$) and a part measuring the loss pertaining to the VAR coefficients ($L_2$). It is well known that the Bayes estimator under the squared error loss is the posterior mean. One can also verify that the posterior mean is the Bayesian estimator under the loss function $L_1$. Thus we have the following result.

Lemma 1. Under the loss $L_1 + L_2$, the generalized Bayesian estimator of $(\Phi, \Sigma)$ is

$$\hat{\Phi} = E(\Phi \mid Y), \qquad (26)$$

$$\hat{\Sigma} = E(\Sigma \mid Y). \qquad (27)$$
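For reference, the two losses translate directly into code (our sketch); averaging them over the $N$ simulated data sets gives the estimated frequentist risks $R_1$ and $R_2$ reported in Section 5:

```python
import numpy as np

def loss_l1(Sigma_hat, Sigma):
    """Pseudo entropy loss (21): tr(Sigma^{-1} Sigma_hat) - log|Sigma^{-1} Sigma_hat| - p."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat)
    _, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - p

def loss_l2(Phi_hat, Phi, G=None):
    """Quadratic loss (22); with G = I this is the sum of squared errors (23)."""
    D = Phi_hat - Phi
    return np.sum(D * D) if G is None else np.trace(D.T @ G @ D)
```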

4. Algorithms for simulating from the posterior of $(\phi, \Sigma)$

The algorithms for the MCMC computation of the posterior distributions of $(\phi, \Sigma)$ depend on the priors. For brevity, we only outline the algorithms with the constant prior on $\phi$ and the Jeffreys and reference priors on $\Sigma$.

Following Fact 1, we use a Monte Carlo (MC) algorithm to sample from the joint posterior distribution of $(\phi, \Sigma)$. Suppose at cycle $k$ we have $(\phi_{k-1}, \Sigma_{k-1})$ sampled from cycle $k-1$. The following algorithm is used for computing the posterior under the constant-Jeffreys prior.

Algorithm CJ:
Step 1: Simulate $\Omega \sim \mathrm{IW}(S(\hat{\Phi}_{\mathrm{MLE}}),\ T - Lp - 1)$ and let $\Sigma_k = \Omega$.
Step 2: Simulate $\phi_k$ from $\mathrm{MVN}(\hat{\phi}_{\mathrm{MLE}},\ \Sigma_k \otimes (X'X)^{-1})$. Stop if $k+1$ is larger than a pre-set number $M$; otherwise, replace $k$ by $k+1$ and go to Step 1.

The algorithm for the constant-RATS prior is similar to the one above, except that in Step 1 the Inverse Wishart distribution has different degrees of freedom: $\Omega \sim \mathrm{IW}(S(\hat{\Phi}_{\mathrm{MLE}}),\ T)$.
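A compact Python sketch of Algorithm CJ (ours, not the authors' code); the mapping of the paper's Inverse Wishart convention (Anderson, 1984) onto `scipy.stats.invwishart`'s parameterization is our assumption:

```python
import numpy as np
from scipy.stats import invwishart

def algorithm_cj(Y, X, M, rng):
    """Direct Monte Carlo from the posterior under the constant-Jeffreys prior."""
    T, p = Y.shape
    q = X.shape[1]                                   # q = 1 + L*p
    XtX_inv = np.linalg.inv(X.T @ X)
    Phi_mle = XtX_inv @ X.T @ Y
    S = (Y - X @ Phi_mle).T @ (Y - X @ Phi_mle)      # S(Phi_MLE)
    phi_mle = Phi_mle.flatten(order='F')             # vec(Phi_MLE), columns stacked
    draws = []
    for _ in range(M):
        # Step 1: Sigma_k ~ IW(S, T - L*p - 1); note T - q = T - L*p - 1.
        Sigma = invwishart.rvs(df=T - q, scale=S, random_state=rng)
        # Step 2: phi_k ~ MVN(phi_MLE, Sigma kron (X'X)^{-1}).
        phi = rng.multivariate_normal(phi_mle, np.kron(Sigma, XtX_inv))
        draws.append((phi, Sigma))
    return draws
```

The constant-RATS variant would only change the degrees of freedom in Step 1 to $T$.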

It is much more difficult to simulate from the conditional distribution of $\Sigma$ under the reference prior. We adopt a hit-and-run algorithm used in Yang and Berger (1994). In implementing the algorithm, we consider a one-to-one transformation of $\Sigma$, namely $\Sigma^* = \log(\Sigma)$, or $\Sigma = \exp(\Sigma^*)$ in the sense that

$$\Sigma = \sum_{j=0}^{\infty}\frac{(\Sigma^*)^j}{j!}.$$

It can be shown that the conditional posterior density of $\Sigma^*$ given $(\Phi, Y)$ is then

$$\pi(\Sigma^* \mid \Phi, Y) = \pi(\Sigma^* \mid S(\Phi)) \propto \frac{\mathrm{etr}\{-(T/2)\Sigma^* - \frac{1}{2}(\exp\Sigma^*)^{-1}S(\Phi)\}}{\prod_{i<j}(d_i^* - d_j^*)}, \qquad (28)$$

where $\Sigma^* = O'D^*O$, $O$ is an orthogonal matrix, and $D^* = \mathrm{diag}(d_1^*, \dots, d_p^*)$ with $d_1^* > \cdots > d_p^*$. Note that $\exp(\Sigma^*) = O'\exp(D^*)O$.

To simulate $\Sigma^*$ from (28), we use the following algorithm. Assume we have a Gibbs sample $(\Phi_{k-1}, \Sigma_{k-1})$.

For cycle $k$:
Step 1: Simulate $\phi_k \sim \mathrm{MVN}(\hat{\phi}_{\mathrm{MLE}},\ \Sigma_{k-1} \otimes (X'X)^{-1})$ and obtain $\Phi_k$.
Step 2: Calculate $S_k = S(\Phi_k) = (Y - X\Phi_k)'(Y - X\Phi_k)$.
Step 3: Decompose $\Sigma_{k-1} = O'DO$, where $D = \mathrm{diag}(d_1, \dots, d_p)$, $d_1 > d_2 > \cdots > d_p$, and $O'O = I$. Let $d_i^* = \log(d_i)$, $D^* = \mathrm{diag}(d_1^*, \dots, d_p^*)$, and $\Sigma^*_{k-1} = O'D^*O$.
Step 4: Select a random symmetric $p \times p$ matrix $V$ with elements $v_{ij} = z_{ij}/\sqrt{\sum_{l \le m} z_{lm}^2}$, where $z_{ij} \sim N(0,1)$, $1 \le i \le j \le p$. The other elements of $V$ are defined by symmetry.
Step 5: Generate $t \sim N(0,1)$ and set $W = \Sigma^*_{k-1} + tV$. Decompose $W = Q'C^*Q$, where $C^* = \mathrm{diag}(c_1^*, \dots, c_p^*)$, $c_1^* > c_2^* > \cdots > c_p^*$, and $Q'Q = I$. Compute

$$\rho_k = \log \pi(\exp(W) \mid S_k) - \log \pi(\exp(\Sigma^*_{k-1}) \mid S_k) = \frac{T}{2}\sum_{i=1}^{p}(d_i^* - c_i^*) + \frac{1}{2}\mathrm{tr}\{((\exp\Sigma^*_{k-1})^{-1} - (\exp W)^{-1})S_k\} + \sum_{i<j}\log(d_i^* - d_j^*) - \sum_{i<j}\log(c_i^* - c_j^*).$$

Step 6: Generate $u \sim \mathrm{Unif}(0,1)$. If $u \le \min(1, \exp(\rho_k))$, let $\Sigma^*_k = W$ and $\Sigma_k = Q'CQ$, where $C = \mathrm{diag}(e^{c_1^*}, \dots, e^{c_p^*})$; otherwise, let $\Sigma^*_k = \Sigma^*_{k-1}$ and $\Sigma_k = \Sigma_{k-1}$. Stop if $k+1$ is larger than a pre-set number $M$; otherwise, replace $k$ by $k+1$ and go to Step 1.
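A Python sketch of Steps 3-6 (ours; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the log-gap terms use absolute values, and the factorization conventions follow our reconstruction of the text):

```python
import numpy as np

def hit_and_run_step(Sigma_prev, S_k, T, rng):
    """One Metropolis update of Sigma under the reference prior, target (28)."""
    p = Sigma_prev.shape[0]
    # Step 3: Sigma*_{k-1} = log(Sigma_{k-1}) via the eigendecomposition.
    d, O = np.linalg.eigh(Sigma_prev)            # Sigma_prev = O diag(d) O'
    d_star = np.log(d)
    Sigma_star = O @ np.diag(d_star) @ O.T
    # Step 4: random symmetric direction V, normalized over the upper triangle.
    Z = np.triu(rng.standard_normal((p, p)))
    V = Z + np.triu(Z, 1).T
    V /= np.sqrt(np.sum(Z ** 2))
    # Step 5: propose W = Sigma* + t V and evaluate the log ratio rho_k.
    W = Sigma_star + rng.standard_normal() * V
    c_star, Q = np.linalg.eigh(W)
    Sigma_prop = Q @ np.diag(np.exp(c_star)) @ Q.T          # exp(W)

    def log_target(eigs, Sigma_mat):
        # log of (28): -(T/2) tr(Sigma*) - tr(Sigma^{-1} S_k)/2 - sum log|gaps|
        gaps = np.subtract.outer(eigs, eigs)[np.triu_indices(p, 1)]
        return (-0.5 * T * eigs.sum()
                - 0.5 * np.trace(np.linalg.solve(Sigma_mat, S_k))
                - np.sum(np.log(np.abs(gaps))))

    rho = log_target(c_star, Sigma_prop) - log_target(d_star, Sigma_prev)
    # Step 6: accept with probability min(1, exp(rho)).
    return Sigma_prop if rng.uniform() <= min(1.0, np.exp(rho)) else Sigma_prev
```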

When the shrinkage prior is used to replace the constant prior for $\phi$, the algorithms for Bayesian computation need to be modified by adding one step for drawing $\lambda$ using Fact 4. In cycle $k$, $\phi_k$ is drawn in two steps. First, the parameter $\lambda_k$ is drawn from an Inverse Gamma distribution, which depends on $\phi_{k-1}$. Then $\phi_k$ is drawn from a multivariate normal distribution that depends on $\lambda_k$, $\Sigma_k$, and the data sample. The MCMC algorithm for numerical simulation of the posterior of $(\phi, \Sigma)$ under the Minnesota-reference prior is based on the conditional posteriors in Facts 3 and 5. The algorithm is quite similar to the one used for drawing from the posterior under the constant-reference prior combination, with a modification in the conditional density $\pi(\phi \mid \Sigma, Y)$.

5. MCMC simulations

In the following we use numerical examples to evaluate the posteriors of competing estimators. We first generate $N = 1000$ data samples from VARs with known parameters. Then, for each generated data set, we compute the Bayesian estimates under alternative priors via the algorithms described in the previous section.


The MCMC computations for the eight prior combinations on $(\phi, \Sigma)$ are labeled as CA (constant-RATS priors), CJ (constant-Jeffreys priors), CR (constant-Yang and Berger reference priors), SA (shrinkage-RATS priors), SJ (shrinkage-Jeffreys priors), SR (shrinkage-reference priors), TMR (tight Minnesota-reference priors), and LMR (loose Minnesota-reference priors).

as burn-in runs. We choose a variety of data-generating VARs. Using the Monte Carloresults, we evaluate the Bayesian estimators under competing priors in terms of thefrequentist risks, impulse responses, and the mean-squared errors of forecast (MSEF).We also plot frequentist distributions of some elements of �. We now discuss thecriteria of evaluation in more detail.

5.1. Criteria for evaluation of Bayesian VAR estimates

(a) Average frequentist losses. The most important criterion of evaluation is the frequentist risk of the MLE and of the Bayesian estimators with various prior combinations on $\phi$ and $\Sigma$. For loss function $L_i$, we denote the frequentist risk as $R_i$ ($i = 1, 2$). We also denote the estimates of $\Sigma$ and $\Phi$ from the $n$th data set as $\hat{\Sigma}^{(n)}$ and $\hat{\Phi}^{(n)}$. The frequentist risks are estimated by averaging the losses over the $N$ samples:

$$R_1(\Sigma) = \frac{1}{N}\sum_{n=1}^{N}L_1(\hat{\Sigma}^{(n)}, \Sigma) \quad \text{and} \quad R_2(\Phi) = \frac{1}{N}\sum_{n=1}^{N}L_2(\hat{\Phi}^{(n)}, \Phi).$$

(b) Impulse response functions. The matrix of impulse responses to orthogonalized shocks that occurred $i$ periods earlier is denoted by $Z_i$. By definition, impulse responses are nonlinear functions of $(\Phi, \Sigma)$. The nonlinearity makes it difficult to derive frequentist inferences but does not pose difficulties for Bayesian simulation as long as $(\Phi, \Sigma)$ can be simulated. For the $n$th data set generated in the experiment, we denote the estimated impulse response matrix on the $i$th step after the shock as $\hat{Z}_i^{(n)}$. The accuracy in the estimation of the impulse responses (with forecasting horizon $H$) is measured by the frequentist average of the sum of squared errors

$$R_{\mathrm{Imp}} = \frac{1}{Np^2H}\sum_{n=1}^{N}\mathrm{tr}\left\{\sum_{i=1}^{H}(Z_i - \hat{Z}_i^{(n)})'(Z_i - \hat{Z}_i^{(n)})\right\}. \qquad (29)$$
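For completeness, a sketch of how $Z_1, \dots, Z_H$ can be computed from a draw of $(\Phi, \Sigma)$ (ours; the paper does not spell out the orthogonalization, so the Cholesky factor below is our assumption):

```python
import numpy as np

def impulse_responses(Phi, Sigma, H):
    """Orthogonalized impulse responses Z_1..Z_H for the row-vector VAR (1)."""
    q, p = Phi.shape
    L = (q - 1) // p
    B = [Phi[1 + l * p: 1 + (l + 1) * p, :] for l in range(L)]    # B_1..B_L
    # MA recursion in row-vector form: Psi_0 = I, Psi_i = sum_l Psi_{i-l} B_l.
    Psi = [np.eye(p)]
    for i in range(1, H + 1):
        Psi.append(sum(Psi[i - l] @ B[l - 1] for l in range(1, min(i, L) + 1)))
    P = np.linalg.cholesky(Sigma)      # Sigma = P P'; orthogonalizes the shocks
    return [P.T @ Psi[i] for i in range(1, H + 1)]
```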

(c) Improvement in mean squared errors of forecast compared to the MLE. Besides risks, one frequentist criterion for evaluating estimators is the forecasting error attributable to the deviation of the estimates from the true parameters. The $h$-step-ahead forecasting error at period $T$ can be decomposed into two orthogonal parts:

$$y_{T+h} - \hat{y}_{T+h|\hat{\Phi}} = (y_{T+h} - \hat{y}_{T+h|\Phi}) + (\hat{y}_{T+h|\Phi} - \hat{y}_{T+h|\hat{\Phi}}),$$

where $\hat{y}_{T+h|\Phi}$ and $\hat{y}_{T+h|\hat{\Phi}}$ are the forecasts conditional on observations up to period $T$; they can be calculated from the VAR by setting the error terms to zero after period $T$. The first term on the right-hand side is the sampling error. The second term is the forecasting error attributable to the deviation of the estimates from the true parameters. Since the true parameters are known in our experiments, the second term can be calculated for competing estimators, and the MSEF of the second term can be compared. The MSEF is related to the frequentist loss $L_2$ because it is the expectation of weighted quadratic estimation errors of $\Phi$. The frequentist average of the one-step-ahead MSEF over the $N$ samples is

$$E(\hat{\Phi} - \Phi)'x_T'x_T(\hat{\Phi} - \Phi) = \frac{1}{N}\sum_{n=1}^{N}(\hat{\Phi}^{(n)} - \Phi)'x_T^{(n)\prime}x_T^{(n)}(\hat{\Phi}^{(n)} - \Phi).$$
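Since the true $\Phi$ is known in the experiments, the estimation-induced part of the one-step forecast error is easy to compute (our sketch; `x_T` is the $1 \times (1+Lp)$ regressor row for period $T$):

```python
import numpy as np

def one_step_estimation_error(Phi_hat, Phi, x_T):
    """Per-variable estimation part of the one-step forecast error, x_T (Phi - Phi_hat)."""
    return x_T @ (Phi - Phi_hat)       # 1 x p; square and average over samples for MSEF
```

Averaging the squared entries over the $N$ replications for each estimator yields the MSEF comparison underlying the "improvement in forecast" columns of the tables below.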

5.2. Simulation results

Numerous factors in the model design influence the performance of Bayesian estimators. For a given model, a larger sample size ($T$) reduces the effect of the prior choice on the estimates. For a given sample size, a larger number of variables ($p$) included in the VAR or a longer lag length ($L$) makes the prior choice more important. Numerical results are presented to illustrate the effects of the sample size and the dimension of the model. We will denote the VAR model (4) as VAR($T, p, L, \Phi, \Sigma$).

The relative performance of a prior also depends on the data generating process. The types of models we choose have some characteristics commonly observed in macroeconomic time series. We first consider bivariate VARs with one lag. This setting involves the smallest number of parameters and allows for more experiments. We employ covariance matrices $\Sigma$ with different correlations and different types of VAR coefficient matrices $\Phi$. We consider three types of data generating models for VARs with one lag: random walks with uncorrelated errors ($\Sigma = I_p$), Granger-causal chains with correlated errors, and VARs with relatively large off-diagonal elements in the lag coefficient matrix. In addition to the one-lag VARs, we also consider two-lag VARs that are close to being I(2) processes (i.e., the first differences of the time series are random walks).

Example 1. We consider VAR($T = 20, p = 2, L = 1, \Phi, \Sigma$), where $\Phi$ is given by (3) with $c = (0, 0)$, $B_1 = I_2$, and $\Sigma = I_2$. This model serves as the benchmark. The assumption that the covariance matrix $\Sigma$ is the identity matrix means that we treat the VAR disturbances as structural shocks. The assumption that the VAR lag coefficient matrix is also the identity means that the VAR consists of independent random walk variables.
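One of the $N$ data sets for this benchmark might be generated as follows (our sketch; the zero initial condition and the seed are our assumptions, as the paper does not state them):

```python
import numpy as np

def simulate_var1(T, c, B1, Sigma, rng):
    """Draw a length-T sample from y_t = c + y_{t-1} B1 + eps_t, as in (1)."""
    p = len(c)
    chol = np.linalg.cholesky(Sigma)
    y = np.zeros((T + 1, p))                      # y[0] = 0 is the initial value
    for t in range(1, T + 1):
        eps = rng.standard_normal(p) @ chol.T     # eps_t ~ N_p(0, Sigma)
        y[t] = c + y[t - 1] @ B1 + eps
    return y[1:]

rng = np.random.default_rng(0)
y = simulate_var1(T=20, c=np.zeros(2), B1=np.eye(2), Sigma=np.eye(2), rng=rng)
```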

For 1,000 replications with Markov chain length 10,500, the MCMC computations take about 2 hours in total on a 1.7 GHz Pentium 4 PC for all eight prior combinations. Simulation results are little changed when the Markov chain length is reduced to 5,000 and the number of generated samples is reduced to 500, suggesting that the Markov chains converge rather quickly. The Metropolis–Hastings procedure is efficient for the simulation of the $\Sigma$ matrix under Yang and Berger's reference prior, with acceptance rates around 58 percent. The frequentist risks of the MLE and of the Bayesian estimates under the test priors are reported in Table 1. The first two columns report the averages and standard errors of the losses associated with $\Sigma$ and $\Phi$ over the 1,000 generated samples. They show that the average losses associated with $\Phi$ are not influenced much by the prior on $\Sigma$. For estimating $\Sigma$, the reference prior reduces the frequentist risk by more than two thirds relative to the MLE and by about one half to two thirds compared with the Bayesian estimates under the RATS and Jeffreys priors. For estimating $\Phi$, the tight Minnesota prior does best and the loose Minnesota prior second best. This is not surprising given the fact that the data generating $\Phi$ is the mean of the Minnesota prior.


Table 1
Example 1

        R1(Σ)          R2(Φ)          R22            RImp    Improvement in forecast (W1, W2)
MLE     0.526(0.519)   5.491(8.794)   0.393(0.288)   0.610
CA      0.353(0.382)   5.491(8.787)   0.393(0.288)   0.611   (−0.02, 0.01)
CJ      0.244(0.257)   5.490(8.793)   0.393(0.288)   0.616   (−0.00, −0.00)
CR      0.167(0.208)   5.493(8.804)   0.393(0.288)   0.613   (0.00, 0.00)
SA      0.353(0.382)   2.509(4.231)   0.301(0.222)   0.581   (20.76, 27.08)
SJ      0.244(0.258)   2.216(3.647)   0.293(0.215)   0.578   (21.52, 28.54)
SR      0.161(0.202)   2.005(2.879)   0.287(0.210)   0.575   (22.65, 29.71)
TMR     0.136(0.169)   0.555(0.428)   0.053(0.027)   0.456   (78.44, 79.24)
LMR     0.157(0.199)   1.199(0.763)   0.222(0.173)   0.569   (36.65, 40.73)

Notes. R1(Σ) is the estimated frequentist risk of the Bayesian estimator for Σ under loss L1 (frequentist standard errors of the losses in parentheses). R2(Φ) is the estimated frequentist risk of the Bayesian estimator for Φ under loss L2 (standard errors in parentheses). R22 is the part of R2 associated with the nonintercept elements of Φ (standard errors in parentheses). RImp is the frequentist average of the sum of estimation errors of the impulse responses, as defined by (29) in the text. Improvement in forecast: percentage improvement, relative to the MLE, in the mean square of one-step forecast errors attributable to the deviation of the estimates of Φ from the true parameter; Wi, the ith element in the bracket, is the percentage improvement for the ith variable. The Bayesian estimators based on the competing priors are denoted as follows. CA: constant-RATS prior; CJ: constant-Jeffreys prior; CR: constant-reference prior; SA: shrinkage-RATS prior; SJ: shrinkage-Jeffreys prior; SR: shrinkage-reference prior; TMR: tight Minnesota-reference prior defined in the text; LMR: loose Minnesota-reference prior defined in the text.

The third column of Table 1 reports the frequentist average of the $L_{22}$ losses associated with the elements of the VAR lag coefficients $B_1$. The difference between the second and third columns is the average $L_{21}$ loss associated with the constant terms in the VAR. The average losses in the third column are much smaller than those in the second column, suggesting that most of the $L_2$ losses are due to $L_{21}$. By definition, the intercept terms in the VAR do not affect the impulse responses. It is therefore reasonable that the different rows of the fourth column of Table 1, which report the averages of mean squared errors of the elements of the impulse responses, are fairly similar under different priors.

As the frequentist average losses in Table 1 show, for the MLE and the constant-prior-based Bayesian estimators, the estimation error for the intercept terms is quite large compared to the error in the lag coefficients. This results in relatively large improvements of the shrinkage-prior-based estimators over the MLE in forecasting errors, as indicated in the last column of Table 1.

Table 2
Example 2

        R1(Σ)          R2(Φ)          R22            RImp    Improvement in forecast (W1, W2)
MLE     0.361(0.429)   4.583(8.133)   0.292(0.402)   0.482
CA      0.250(0.312)   4.584(8.136)   0.292(0.402)   0.494   (0.05, 0.00)
CJ      0.202(0.214)   4.584(8.124)   0.292(0.402)   0.521   (−0.03, −0.09)
CR      0.191(0.185)   4.583(8.140)   0.292(0.402)   0.521   (0.16, −0.04)
SA      0.250(0.311)   1.577(1.871)   0.236(0.309)   0.436   (20.77, 23.90)
SJ      0.202(0.213)   1.454(1.561)   0.231(0.296)   0.476   (22.23, 24.59)
SR      0.187(0.183)   1.363(1.372)   0.230(0.298)   0.464   (23.55, 24.59)
TMR     0.195(0.142)   0.720(0.641)   0.096(0.033)   0.459   (6.80, 53.39)
LMR     0.186(0.180)   1.031(0.801)   0.175(0.228)   0.470   (31.70, 32.85)

For an explanation of the notation, see the notes to Table 1.

Example 2. We now generate data sets from VAR($T = 20, p = 2, L = 1, \Phi, \Sigma$), where

$$\Sigma = \begin{pmatrix} 1.0 & 0.71 \\ 0.71 & 2.0 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1.0 & 1.0 \\ 0.7 & 0 \\ 0.3 & 1.0 \end{pmatrix}.$$

Here the errors are assumed to have a correlation of 0.5, and $B_1$ is lower triangular, suggesting that the lags of $y_{1t}$ are not useful for predicting $y_{2t}$. The VAR contains a unit root.

We calculate the frequentist risks and compare the performance of the Bayesian estimators based on the set of test priors. The results are reported in Table 2. By construction, the reference prior employed in this paper re-parameterizes $\Sigma$ as $O'DO$, with the diagonal matrix $D$ containing the eigenvalues and $O$ being an orthogonal matrix. The eigenvalues are placed before the orthogonal matrix in the order of importance; hence, by design, performance in estimating $D$ is perceived to be more important. In the previous example the $\Sigma$ matrix is diagonal, and the reference prior does much better. In this example the VAR residuals are substantially correlated, so the off-diagonal elements of the $\Sigma$ matrix are more prominent. But note that even in this case the reference prior still does better than the other priors.

Example 3. We now consider VAR($T = 20, p = 2, L = 1, \Phi, \Sigma$), where

$$\Sigma = \begin{pmatrix} 1.0 & 0.71 \\ 0.71 & 2.0 \end{pmatrix}, \quad \text{and} \quad \Phi = \begin{pmatrix} 1.0 & 1.0 \\ 0.3 & 0.7 \\ 0.7 & 0.3 \end{pmatrix}.$$

Table 3
Example 3

        R1(Σ)          R2(Φ)          R22             RImp    Improvement in forecast (W1, W2)
MLE     0.338(0.410)   2.376(3.006)   0.169(0.203)    0.308
CA      0.236(0.299)   2.376(3.009)   0.169(0.203)    0.315   (0.03, −0.03)
CJ      0.198(0.206)   2.377(3.006)   0.169(0.203)    0.340   (−0.02, −0.01)
CR      0.194(0.186)   2.375(2.999)   0.1687(0.203)   0.332   (0.08, −0.01)
SA      0.236(0.300)   1.091(1.129)   0.131(0.154)    0.281   (18.02, 24.81)
SJ      0.198(0.207)   1.052(1.002)   0.125(0.147)    0.337   (18.74, 25.93)
SR      0.188(0.184)   1.009(0.968)   0.127(0.150)    0.300   (19.32, 26.33)
TMR     0.583(0.261)   1.860(0.620)   1.311(0.075)    0.440   (−205.0, −82.78)
LMR     0.188(0.176)   0.869(0.746)   0.128(0.141)    0.298   (21.39, 28.27)

For an explanation of the notation, see the notes to Table 1.

The focus of this example is to compare the Minnesota prior with the shrinkage prior. The data-generating model in this example is substantially different from random walks. Under the tight Minnesota prior, the estimate of the VAR lag coefficient matrix is severely biased towards the identity matrix. The average estimation errors of the impulse responses of the tight-Minnesota-prior-based estimates are larger than those of the other estimates. The last column of Table 3 shows that the one-step-ahead forecast errors of the Bayesian estimates under the tight Minnesota prior are much larger than those of the MLE.

nesota priors are both multivariate normal. Under the Minnesota prior, the conditionalmean of �, �M , is the MLE �MLE adjusted by the weighted di9erence between themean of prior �0 and the MLE. Under the shrinkage prior the conditional mean �S isthe MLE multiplied by a shrinkage matrix. Both the Minnesota prior and the shrinkageprior lead to smaller conditional variance. The reduction of conditional variance by theMinnesota prior depends on the variance of the prior M0. The tighter the Minnesotaprior (i.e. the smaller M0) the larger is the reduction in the conditional variance. Therelative performance of the shrinkage prior and the Minnesota prior depends on whetherthe Minnesota prior correctly reMects the true parameters. If �0 is closer to the trueparameter � than the MLE �MLE, and if the variance M0 is small, then the Minnesotaprior should be superior to the shrinkage prior. On the other hand, if the Minnesotaprior is not concentrated around the true parameter �, then the shrinkage prior or theloose Minnesota prior may dominate the tight Minnesota prior.

Example 4. We generate data from VAR($T = 20, p = 2, L = 1, \Phi, \Sigma$), where

$$\Sigma = \begin{pmatrix} 1.0 & 0.71 \\ 0.71 & 2.0 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 3.0 & 3.0 \\ 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}.$$


Here $\Phi$ has small lag coefficients but large intercepts. In this case, the constant prior dominates the shrinkage prior and the Minnesota prior. The frequentist averages of the MLE for $\Sigma$ and for the VAR lag coefficients are biased downwards; the intercepts are biased upwards. Under the constant-reference prior, the average of the estimates of $\Sigma$ is better than that of the MLE, while the average of the estimates of $\Phi$ is almost identical to that of the MLE. Under the shrinkage-reference prior, the estimates of $\Sigma$ have an upward bias of magnitude similar to the downward bias of the MLE. But the variance of the estimates is smaller than that of the MLE, which is the main reason for the smaller frequentist risk associated with the Bayesian estimates. Contrary to the Bayesian estimates under the constant prior, under the shrinkage prior the Bayesian estimates of the intercepts are biased downward while the estimates of the VAR coefficients tend to be biased upward. Finally, under the tight Minnesota-reference prior, the frequentist average of the Bayesian estimates of the VAR lag coefficients is biased towards the identity matrix. In terms of the estimation errors of impulse responses and the MSE of forecasts, the constant prior also dominates the shrinkage and Minnesota priors, with the tight Minnesota prior being the worst among all priors under examination. The estimates of $\Sigma$ and $B_1$ both show upward bias under the shrinkage-reference prior, the compound effect of which may explain the relatively poor performance of the estimator in terms of impulse responses. This example shows that for a VAR(1) model (the number in the bracket indicates the lag length) with large intercept terms and small VAR coefficients, the constant prior is better than the shrinkage or Minnesota prior. This result is partially due to the fact that the downward biases of the ML estimates of the VAR lag coefficients are relatively small when the true parameters are near zero. MacKinnon and Smith (1998) show that the downward bias of the ML estimate of an AR(1) coefficient is nonlinear in the true parameter: when the true parameter is near unity, the downward bias is substantially larger than when the true parameter is near zero. The constant prior is better than the shrinkage and Minnesota priors in estimating the intercept terms. If the intercept terms are large, the downward bias induced by the shrinkage and Minnesota priors is amplified, resulting in undesirable performance. However, for most macroeconomic applications of VAR models, the first lag coefficient matrix $B_1$ is not as small as in this example. So in practice, the dominance of the constant prior is not a very likely scenario. In addition, the dominance of the constant prior is no longer present for VARs with longer lags. For example, for the same covariance matrix and intercept terms, if the lag coefficient is changed from 0.3 in a VAR(1) to 0.1 in each of the lags in a VAR(3), then the shrinkage prior dominates the constant and Minnesota priors.

Example 5. We now generate data sets from VAR($T = 20, p = 2, L = 2, \Phi, \Sigma$), where

$$\Sigma = \begin{pmatrix} 1.0 & 0 \\ 0 & 1.0 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1.0 & 1.0 \\ 1.85 & 0 \\ 0 & 1.85 \\ -0.9 & 0 \\ 0 & -0.9 \end{pmatrix}.$$

Table 4
Example 4

        R1(Σ)          R2(Φ)           R22            RImp    Improvement in forecast (W1, W2)
MLE     0.315(0.331)   4.017(4.822)    0.370(0.407)   0.099
CA      0.220(0.232)   4.021(4.827)    0.370(0.408)   0.105   (−0.06, −0.07)
CJ      0.189(0.157)   4.016(4.826)    0.370(0.406)   0.114   (−0.07, −0.04)
CR      0.184(0.145)   4.016(4.822)    0.370(0.406)   0.112   (0.03, 0.14)
SA      0.220(0.232)   4.896(3.495)    0.371(0.350)   0.147   (−26.21, −17.16)
SJ      0.189(0.157)   5.426(3.496)    0.376(0.339)   0.177   (−33.69, −21.51)
SR      0.184(0.135)   5.455(3.407)    0.371(0.335)   0.177   (−35.95, −19.89)
TMR     0.261(0.149)   11.667(0.735)   0.486(0.047)   0.383   (−108.0, −90.00)
LMR     0.196(0.136)   9.460(1.498)    0.406(0.226)   0.267   (−73.75, −40.34)

For an explanation of the notation, see the notes to Table 1.

The VAR variables are nearly I(2). In this example the larger matrix Φ does not substantially change the computation cost of the MCMC routine. The acceptance rates of the Metropolis step in the simulation of Σ under the reference prior are around 63 percent. The tight Minnesota prior is again substantially worse than the other priors in all aspects except for the average MSE of impulse responses. This is due to the fact that under the tight Minnesota prior the estimates of B1 are biased downward, while the estimates of Σ are biased upward. These two types of bias partially offset when the impulse response functions are computed. This example suggests that errors in estimating impulse response functions may not be good indicators of the accuracy of VAR estimates.

The five examples of the bivariate VAR provide a fairly comprehensive picture of the performance of the test priors. For estimating the covariance matrix Σ, the reference prior dominates the Jeffreys and RATS priors. For estimating the VAR coefficients Φ, the shrinkage prior most likely dominates the constant prior. The relative performance of the Minnesota prior depends on the tightness of the prior and the nature of the data generating models. When the data generating process is not similar to random walks, the tight Minnesota prior may be much less desirable than a loose Minnesota prior. In fact, when the data generating process is sufficiently different from the random walk, even the loose Minnesota prior can be undesirable (as in Example 4).

We examine the robustness of the pattern exhibited in Tables 1–5 by altering the sample size and the size of the VAR. We simulate the same models as in Examples 1–5, but the sample size T is increased from 20 to 50. With the enlarged sample size, the average losses are smaller under all priors, and the differences in losses are smaller as well, because more data observations diminish the impact of the prior choice. However, in most cases the shrinkage-reference prior still performs better than the other priors. We also experiment with VARs containing more explosive roots and find that the shrinkage-reference prior combination still dominates the other noninformative prior combinations.
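The frequentist comparisons reported in the tables can be reproduced in outline with a simple Monte Carlo loop. The sketch below is ours and purely schematic: `estimator` stands for whichever procedure is under evaluation (the MLE or an MCMC posterior mean under a given prior), `simulate` for the data generating process, and the quadratic loss is only a stand-in for the loss functions defined earlier in the paper.

```python
import numpy as np

def frequentist_risk(estimator, simulate, loss, theta_true, n_rep=1000, seed=0):
    """Monte Carlo approximation of the frequentist risk
    E_Y[ loss(estimator(Y), theta_true) ]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rep):
        y = simulate(rng)                      # draw one sample from the DGP
        total += loss(estimator(y), theta_true)
    return total / n_rep

def quad_loss(est, true):
    """A stand-in quadratic loss on a matrix parameter."""
    return float(np.sum((est - true) ** 2))
```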


Table 5
Example 5

         R1(Σ)          R2(Φ)            R22            RImp    Improvement in forecast (W1, W2)
MLE      1.052 (0.974)  43.316 (65.798)  0.999 (0.774)  0.853
CA       0.735 (0.759)  43.318 (65.831)  0.999 (0.774)  0.873   (-0.04, 0.06)
CJ       0.360 (0.416)  43.301 (65.765)  0.999 (0.773)  0.942   (0.02, -0.03)
CR       0.257 (0.343)  43.303 (65.701)  0.999 (0.774)  0.922   (0.16, 0.08)
SA       0.735 (0.760)  8.187 (17.881)   0.815 (0.624)  0.733   (20.77, 23.90)
SJ       0.360 (0.416)  5.352 (1.055)    0.816 (0.617)  0.751   (30.32, 27.85)
SR       0.199 (0.281)  3.152 (3.327)    0.804 (0.610)  0.728   (28.48, 25.76)
TMR      0.586 (0.407)  4.974 (2.998)    1.901 (0.273)  0.711   (-259.9, -241.3)
LMR      0.191 (0.255)  2.147 (1.314)    0.670 (0.468)  0.717   (33.52, 31.75)

For an explanation of the notation, see the footnote of Table 1.

The following examples show that the effects of prior choice are more prominent when the number of variables in the VAR is increased from two to six, even with the sample size T increased from 20 to 50. We consider several VAR models representative of many monthly and quarterly macroeconomic variables.

Example 6. We now consider VAR(T = 50, p = 6, L = 1, Σ, Φ), with intercept c = 0, lag coefficients B1 = I6, and covariance matrix Σ = I6.

Compared to the case with p = 2 in Example 5, in this example there is a much larger number of parameters: the number of parameters to be estimated in Σ increases from 3 (with p = 2) to 21 (with p = 6), and the number of parameters in Φ increases from 6 to 42. The Bayesian estimator with the shrinkage-reference prior combination dominates the MLE and the Bayesian estimators under the other priors in terms of the average losses associated with the covariance matrix. The acceptance rates of the Metropolis step in simulating Σ under the reference prior are about 27 percent. Compared with Example 1, a notable difference made by the larger number of parameters and the larger sample size is that the frequentist average loss associated with Φ under the shrinkage-reference prior is now smaller than that of the MLE. It is known that the MLE of B1 is biased towards the stationary region. The downward bias in B1 is much smaller under the shrinkage and Minnesota priors. Under the shrinkage prior, the frequentist average losses associated with Φ are small mainly because the estimates of the intercept term are not as erratic as the MLE. A striking result is that the frequentist average loss for Φ under the shrinkage prior is smaller than that under the tight Minnesota prior. This is largely due to the fact that b3, the variance of the Minnesota prior for the intercept term, is set at 1.0. If b3 is set at 0.2², then the average loss for Φ is reduced from 2.653 to about 0.3, smaller than that under the shrinkage prior.

The frequentist risks of the Bayesian estimates of the nonintercept terms under the shrinkage prior are larger than under the tight Minnesota prior and are comparable to those under the loose Minnesota prior.
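The parameter counts quoted above are a direct dimension check: Σ is a symmetric p × p matrix and Φ is (Lp + 1) × p, so

\[
\#(\Sigma)=\frac{p(p+1)}{2},\qquad \#(\Phi)=p(Lp+1),
\]

which give 3 and 6 for p = 2, L = 1, and 21 and 42 for p = 6, L = 1.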


Table 6
Example 6

         R1(Σ)          R2(Φ)            R22            RImp    Improvement in forecast (W1, ..., W6)
MLE      1.410 (0.553)  15.681 (13.483)  1.083 (0.306)  0.353
CA       0.963 (0.412)  15.682 (13.488)  1.082 (0.306)  0.355   (0.03, 0.01, -0.04, -0.01, -0.02, -0.09)
CJ       0.677 (0.282)  15.682 (13.491)  1.083 (0.306)  0.358   (0.07, 0.02, 0.07, -0.08, -0.06, 0.10)
CR       0.255 (0.173)  15.684 (13.483)  1.083 (0.305)  0.354   (0.04, -0.08, -0.05, 0.01, 0.03, 0.05)
SA       0.963 (0.412)  1.593 (0.494)    0.872 (0.240)  0.334   (8.96, 14.13, 9.70, 13.67, 9.28, 7.83)
SJ       0.677 (0.282)  1.455 (0.406)    0.881 (0.238)  0.334   (7.19, 12.31, 8.32, 12.45, 7.38, 5.95)
SR       0.225 (0.152)  1.258 (0.287)    0.877 (0.240)  0.329   (7.79, 12.99, 8.25, 12.39, 8.29, 6.84)
TMR      0.220 (0.148)  2.653 (0.896)    0.342 (0.068)  0.312   (64.05, 63.97, 63.68, 63.42, 64.92, 64.79)
LMR      0.250 (0.166)  4.369 (1.404)    0.830 (0.236)  0.344   (17.44, 21.88, 17.40, 18.54, 19.23, 18.53)

For an explanation of the notation, see the footnote of Table 1.

The tight Minnesota prior performs best in terms of impulse responses and forecasting errors. The shrinkage prior is effective in reducing the frequentist variance of the estimates, but it tends to yield biased estimates. The bias results in relatively mediocre performance in terms of impulse responses and forecast errors compared with the tight Minnesota prior. A loose Minnesota prior, on the other hand, may not be better than the shrinkage prior.

Table 6 demonstrates that the reference prior yields estimators of Σ with good frequentist properties in terms of average losses. More intuitive comparisons can be made by plotting the histograms of the estimators of the Σ parameters across the 1,000 generated samples. Since it is impossible to plot such graphs for matrices, in the following we focus on a single element of the covariance matrix Σ, namely σ11. Fig. 1 plots the frequentist distributions of the posterior means of σ11 under the test priors, and that of the MLE. Comparison of the panels shows that the MLE and the RATS-prior-based estimator are skewed to the left of the true value (1.0), while the Jeffreys-prior-based estimators are more skewed to the right. The reference-prior-based estimator shows relatively small bias, but its most prominent feature is its small dispersion. The figure offers intuitive confirmation of the results in Table 6.
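For the MLE panel, the histogram construction can be sketched as follows (our illustrative code, reusing simulate_var from the sketch in Example 5; the other panels would replace mle_sigma with the posterior mean of σ11 under the corresponding prior):

```python
import numpy as np
import matplotlib.pyplot as plt

def mle_sigma(y, L=1):
    """MLE of Sigma: regress y_t on (1, y_{t-1}, ..., y_{t-L}),
    then take the residual covariance."""
    T, p = y.shape
    X = np.hstack([np.ones((T - L, 1))] +
                  [y[L - 1 - l:T - 1 - l] for l in range(L)])
    Y = y[L:]
    Phi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ Phi_hat
    return resid.T @ resid / len(Y)

rng = np.random.default_rng(0)
est = [mle_sigma(simulate_var(50, np.zeros(6), [np.eye(6)], np.eye(6),
                              rng=rng))[0, 0]
       for _ in range(1000)]           # sigma_11 estimates across samples
plt.hist(est, bins=30, density=True)
plt.axvline(1.0, linestyle="--")       # true value sigma_11 = 1
plt.show()
```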

Example 7. We examine the test priors in a VAR with a Granger causal chain. We consider VAR(T = 50, p = 6, L = 1, Σ, Φ), with intercept c = (1, 1, 1, 1, 1, 1)′ and the following covariance matrix Σ and lag coefficient matrix B1:

\[
\Sigma=\begin{pmatrix}
1.00 & 0.71 & 0.87 & 1.00 & 1.12 & 1.22\\
0.71 & 2.00 & 1.22 & 1.41 & 1.58 & 1.73\\
0.87 & 1.22 & 3.00 & 1.73 & 1.94 & 2.12\\
1.00 & 1.41 & 1.73 & 4.00 & 2.24 & 2.45\\
1.12 & 1.58 & 1.94 & 2.24 & 5.00 & 2.74\\
1.22 & 1.73 & 2.12 & 2.45 & 2.74 & 6.00
\end{pmatrix},
\]


[Fig. 1 appears here: nine histogram panels (a)–(i), each with horizontal axis spanning 0.4–1.6 and vertical axis 0.0–3.0. See the caption below.]

Fig. 1. Frequentist histograms of the estimators of σ11 in Example 6 with p = 6, L = 1, T = 50 and Σ = I6: (a) posterior mean based on the constant-RATS prior; (b) posterior mean based on the constant-Jeffreys prior; (c) posterior mean based on the constant-reference prior; (d) posterior mean based on the shrinkage-RATS prior; (e) posterior mean based on the shrinkage-Jeffreys prior; (f) posterior mean based on the shrinkage-reference prior; (g) posterior mean based on a tight Minnesota-reference prior; (h) posterior mean based on a loose Minnesota-reference prior; (i) MLE.

\[
B_1=\begin{pmatrix}
1/6 & 0 & 0 & 0 & 0 & 0\\
1/6 & 1/5 & 0 & 0 & 0 & 0\\
1/6 & 1/5 & 1/4 & 0 & 0 & 0\\
1/6 & 1/5 & 1/4 & 1/3 & 0 & 0\\
1/6 & 1/5 & 1/4 & 1/3 & 1/2 & 0\\
1/6 & 1/5 & 1/4 & 1/3 & 1/2 & 1
\end{pmatrix}.
\]


Table 7
Example 7

         R1(Σ)          R2(Φ)          R22            RImp    Improvement in forecast (W1, ..., W6)
MLE      0.939 (0.392)  6.950 (9.272)  1.563 (0.809)  0.412
CA       0.666 (0.286)  6.953 (9.269)  1.563 (0.809)  0.423   (0.05, -0.05, -0.02, -0.16, -0.05, -0.04)
CJ       0.554 (0.199)  6.948 (9.253)  1.563 (0.809)  0.444   (0.04, 0.16, 0.12, 0.16, -0.08, -0.02)
CR       0.459 (0.166)  6.952 (9.298)  1.563 (0.809)  0.421   (-0.22, 0.05, -0.09, 0.03, -0.06, 0.01)
SA       0.666 (0.286)  5.556 (0.596)  0.657 (0.212)  0.380   (-17.19, 6.07, 10.51, 15.36, 2.21, 9.89)
SJ       0.554 (0.198)  5.650 (0.537)  0.640 (0.203)  0.404   (-19.96, 4.91, 10.09, 14.30, 18.25, 5.75)
SR       0.439 (0.151)  5.710 (0.470)  0.633 (0.200)  0.396   (-24.68, 4.49, 10.25, 14.86, 18.62, 6.47)
TMR      0.498 (0.140)  3.023 (1.390)  0.864 (0.146)  0.454   (-69.28, -47.19, -35.09, -20.04, -4.19, 41.58)
LMR      0.452 (0.165)  3.735 (1.656)  1.037 (0.423)  0.380   (7.37, 17.40, 16.29, 19.62, 23.21, 22.15)

For an explanation of the notation, see the footnote of Table 1.

The covariance matrix implies pairwise correlations of 0.5. The VAR contains a unit root. The results are qualitatively the same as in Example 2. The shrinkage prior produces better estimators of B1 than the MLE because it reduces variance through shrinkage. Many elements of the shrinkage-prior-based Bayesian estimator show smaller bias than the MLE. For example, the intercept terms in the MLE are considerably larger than those of the Bayesian estimators and larger than the true parameter value of 1. The estimates under the shrinkage and Minnesota priors under-estimate the intercepts. The forecasts of the first variable by the shrinkage-prior-based estimators are worse than their constant-prior-based counterparts, similar to the finding in Example 4 where the constant prior is better when the VAR lag coefficients are small. It is not surprising that the estimator under the tight Minnesota prior does best in forecasting the sixth variable, since that variable follows a random walk (Table 7).

Example 8. We now consider VAR(T = 50, p = 6, L = 2, Σ, Φ), with intercept c = (1, 1, 1, 1, 1, 1)′ and the covariance matrix Σ as in Example 7. The VAR lag coefficient matrix B1 is twice the B1 matrix in Example 7, and B2 is the negative of the B1 matrix in Example 7. The sixth variable follows an I(2) process. For this example, we reduce the number of MCMC cycles to 5,500 with 500 burn-in runs to reduce computing time (which is over 80 hours in total for the simulations under all priors). The acceptance rates of the Metropolis step in simulating Σ under the reference prior are about 36 percent. Tables 8 and 9 show that the Bayesian estimator of Σ based on the tight Minnesota prior is better than the MLE, but for B1 it is worse than the MLE and the Bayesian estimator based on the shrinkage-reference prior. The examples show that the performance of the Minnesota priors depends on the data generating process and the setting of the hyper-parameters. In practice, researchers often follow conventions when they select hyper-parameter values. As we point out in the introduction, it is quite unlikely that one set of hyper-parameters is suitable for all data generating processes. The conventional values of the hyper-parameters (e.g., b1 = 0.2²) may result in undesirable estimators. On the other hand, when researchers decide to use alternative hyper-parameters to


Table 8
Example 8

         R1(Σ)          R2(Φ)            R22             RImp   Improvement in forecast (W1, ..., W6)
MLE      1.718 (0.692)  26.867 (41.701)  3.812 (1.411)   2.755
CA       1.187 (0.528)  26.852 (41.659)  3.8116 (1.413)  2.732  (-0.07, -0.07, -0.14, -0.06, -0.11, -0.06)
CJ       0.679 (0.258)  26.878 (41.701)  3.814 (1.414)   2.781  (-0.11, -0.08, 0.15, 0.13, 0.06, -0.02)
CR       0.561 (0.212)  26.874 (41.784)  3.812 (1.412)   2.709  (0.14, 0.05, 0.19, 0.02, 0.04, 0.19)
SA       1.187 (0.529)  6.607 (1.117)    2.239 (0.607)   2.103  (3.48, 11.83, 14.41, 2.33, 15.69, 6.61)
SJ       0.679 (0.259)  6.778 (0.958)    2.211 (0.575)   2.242  (-0.63, 10.15, 12.81, 17.55, 10.43, -4.02)
SR       0.528 (0.189)  6.795 (0.893)    2.209 (0.572)   2.167  (-1.00, 10.48, 12.95, 17.02, 10.01, -3.43)
TMR      1.258 (0.389)  5.163 (1.236)    4.032 (0.412)   3.083  (-1115, -754, -830, -828, -871, -746)
LMR      0.538 (0.199)  5.108 (1.856)    2.242 (0.646)   2.225  (16.48, 19.10, 19.09, 23.08, 21.76, 20.92)

For an explanation of the notation, see the footnote of Table 1.

Table 9
Example 9

         R1(Σ)          R2(Φ)            R22              RImp   Improvement in forecast (W1, ..., W6)
MLE      0.674 (0.280)  24.313 (16.011)  15.662 (12.752)  0.143
CA       0.502 (0.207)  24.316 (16.027)  15.668 (12.772)  0.150  (0.03, -0.05, -0.09, 0.02, 0.01, 0.01)
CJ       0.436 (0.149)  24.309 (16.012)  15.656 (12.739)  0.159  (-0.13, 0.02, -0.04, 0.04, -0.06, -0.01)
CR       0.420 (0.149)  24.325 (15.997)  15.669 (12.739)  0.151  (-0.06, -0.01, 0.11, 0.11, -0.06, -0.10)
SA       0.502 (0.207)  13.578 (1.445)   3.799 (0.770)    0.122  (18.02, 6.38, -50.08, 30.47, 8.92, -9.66)
SJ       0.436 (0.149)  14.094 (1.298)   3.989 (0.706)    0.128  (15.45, 3.78, -57.77, 29.27, 8.87, -13.34)
SR       0.434 (0.142)  14.824 (0.986)   4.189 (0.609)    0.130  (11.94, 1.91, -72.42, 28.69, 10.09, -22.08)
TMR      0.488 (0.146)  12.658 (1.564)   4.634 (0.418)    0.128  (47.35, 24.12, -63.63, 33.12, -8.23, -51.39)
LMR      0.420 (0.147)  11.542 (3.608)   5.375 (3.093)    0.130  (22.66, 18.67, -6.36, 29.81, 13.02, 10.48)

For an explanation of the notation, see the footnote of Table 1.

incorporate their knowledge of the data generating processes, it becomes necessary for readers to take into account the difference between their own priors and those of the researchers. Adopting a noninformative prior as a reference for a wide range of empirical problems may be a better approach when a researcher is not very certain about the validity of his priors or when the opinions of different researchers are diverse. In addition to the convenience in scientific reporting, a good noninformative prior may be less vulnerable to mistakes in researchers' judgement and therefore able to deliver robust performance for a large variety of problems.

Example 9. Now we consider a numerical example based on a set of actual macroeconomic data. We apply a VAR(T = 58, p = 6, L = 1, Σ, Φ) model to analyze quarterly data of the U.S. economy from 1987Q1 to 2001Q2. The variables include the M2 money stock, nonborrowed reserves, the federal funds rate, a world commodity price, the GDP


deflator, and real GDP. The commodity price data are obtained from the International Monetary Fund and the rest of the data series from the FRED database at the Federal Reserve Bank of St. Louis. All variables except the fed funds rate are growth rates, and all variables are measured in percentage terms. These variables frequently appear in macroeconomics-related VARs (e.g., Sims, 1992; Gordon and Leeper, 1994; Sims and Zha, 1998b; Christiano et al., 1999). The six data series exhibit strong pairwise and serial correlations. We use the MLE of the actual data as the "true" parameters for Φ and Σ and conduct the same MCMC simulations for drawing posteriors of the VAR coefficients and the covariance matrix as in the previous examples. Note that the impulse responses are based on the lower triangular mapping from the VAR residuals to structural shocks. The ordering of the variables implies that a shock to a variable contemporaneously affects the variables ordered after it, but not the other way around.
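As an illustration of this recursive identification (our code, not the authors'), the following sketch computes impulse responses from a lower triangular factorization of Σ for a VAR(1):

```python
import numpy as np

def impulse_responses(B1, Sigma, horizons=12):
    """IRFs under a recursive (Cholesky) identification for a VAR(1).

    Column j of A = chol(Sigma) is the impact of structural shock j;
    because A is lower triangular, shock j moves only variables j, j+1, ...
    on impact. The response at horizon h is B1^h @ A."""
    A = np.linalg.cholesky(np.asarray(Sigma))   # lower triangular, A A' = Sigma
    p = A.shape[0]
    irf = np.empty((horizons + 1, p, p))
    irf[0] = A
    for h in range(1, horizons + 1):
        irf[h] = B1 @ irf[h - 1]
    return irf  # irf[h, i, j]: response of variable i to shock j after h periods
```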

parable to the Je9reys prior. The absence of more signi3cant improvement of thereference prior can be explained by two reasons. First, there are strong pairwise corre-lations of the VAR error terms that make the o9-diagonal elements prominent. Sincethe reference prior places the variance components in higher priority than the covari-ance components, it tends to perform less well in case the covariance components arelarge. Second, the reference prior shrinks the eigenvalues of the covariance towardsone another. It does less well when the true data generating model has variance com-ponents that are very di9erent in scale, as is the case here. The variances of the errorterms range from 0.035 (GDP deMator) to 7.74 (commodity price).A few general conclusions can be drawn from these numerical examples.

(1) Yang and Berger's reference prior for the covariance matrix Σ dominates the Jeffreys and RATS priors in many cases. The reference prior does less well when the data-generating Σ has large off-diagonal elements and the variance components are significantly different. But even in the least favorable cases, the reference prior is not dominated by its competitors.

(2) The posterior mean of Φ under the constant prior (regardless of the prior on Σ) has properties very similar to the MLE. For VAR(1) models consisting of near-random-walk type variables, the frequentist averages of the posterior means under the constant prior over-estimate the intercept term c and under-estimate the VAR lag coefficients B1. The shrinkage-prior-based estimators induce smaller frequentist average losses mainly because the shrinkage prior effectively reduces the variances of the elements of Φ across samples.

(3) Impulse responses and forecasting errors are nonlinear functions of the elements of Φ and Σ. Smaller frequentist average losses with respect to the parameters do not necessarily lead to smaller average losses in terms of impulse responses and forecasting errors, and vice versa. In Example 5, the tight Minnesota prior happens to significantly over-estimate Σ and under-estimate B1. But the biases cancel out, and the estimates of the impulse responses are more accurate than those based on better estimated Φ and Σ. A shrinkage prior often reduces the variance of the elements of the posterior mean of Φ but may make them quite biased. The bias may result in poor performance in terms of impulse responses and forecasting errors.


Estimators other than the posterior mean may be more desirable under the shrinkage prior if they can reduce the bias.

(4) As with any informative prior, the performance of the Minnesota prior depends on the nature of the data generating model and the hyper-parameters. If the VAR is made up of random-walk type variables, then a tightly set Minnesota prior does better than a loosely set Minnesota prior and the noninformative priors. However, if the model is not in agreement with the prior, a tightly set Minnesota prior does much worse than alternative priors. The examples highlight the sensitivity of the estimates to the hyper-parameters and serve as a note of caution for researchers who rely on an informative prior. Some other numerical results are given in Ni and Sun (2002).

6. Concluding remarks

In this study we evaluate Bayesian VAR estimators based on several noninformative priors in terms of frequentist risks. For the VAR covariance matrix Σ, we study the Jeffreys prior, the RATS prior, and Yang and Berger's reference prior. For the VAR coefficients Φ, we consider the constant prior, a shrinkage prior, and the Minnesota prior. We establish the propriety of the posteriors as well as the existence of posterior moments for (Φ, Σ) under a general class of priors that includes the prior combinations studied in this paper. We compute posteriors under different priors via MCMC simulations. Our numerical examples show that in most cases the combination of the shrinkage prior on Φ and Yang and Berger's reference prior on Σ produces smaller frequentist average losses than other combinations of noninformative priors, mainly through reducing the variances of estimates across samples. In all examples considered in the paper the constant prior generates Bayesian estimates of Φ very similar to the MLE. We also find that the performance of the Minnesota prior critically depends on the tightness of the prior and the nature of the data generating models. A tightly set Minnesota prior dominates the shrinkage prior when the data generating processes are close to random walks, while the shrinkage prior or a loosely set Minnesota prior is a better choice otherwise. We have argued in the introduction that Bayesian procedures with appropriate priors are a practical tool for users of VAR models who are mainly concerned with finite sample properties of estimators. In light of the MCMC simulation results, we conclude that the shrinkage-reference prior combination is a reasonable choice for Bayesian analysis of finite sample inferences of VAR models.

The present study can be extended in several directions. First, it is useful to explore other priors for the VAR model. For the estimation of identified VARs, identifying restrictions on the factorization of the covariance matrix Σ may be incorporated into a prior in a way similar to Sims and Zha (1998a, 1999). For the VAR coefficients Φ, it is useful to investigate whether the shrinkage prior can be modified for better bias correction. Note that the present paper considers priors for Φ and Σ separately. Joint noninformative priors for (Φ, Σ) are less commonly employed in the literature. Consider the AR(1) model y_t = ρ y_{t−1} + ε_t, where ε_t is iid normal with variance σ². The asymptotic form of the Berger-Bernardo reference prior for (ρ, σ) is (1 − ρ²)^{−1/2} σ^{−1} in the stationary region |ρ| < 1, which takes the same form as the Jeffreys prior. Jeffreys (1967) deems the performance of his prior in multiparameter cases unsatisfactory. The Jeffreys and reference priors in this model put infinite weight at the unit root. Zellner's (1997) MDIP takes the more reasonable form (1 − ρ²)^{1/2} σ^{−1}. For the finite sample AR(1) model, Phillips (1991) derives the joint Jeffreys prior, and Berger and Yang (1996) derive a joint reference prior for the autoregressive and variance parameters. Sims (1991) points out some undesirable features of the finite sample AR(1) Jeffreys prior. Nonetheless, deriving and evaluating joint priors for the VAR model is an interesting research topic. Kleibergen and van Dijk (1994) derive the joint Jeffreys prior for (Φ, Σ). Other types of joint priors remain to be examined. The prior analysis on reduced-form VARs can also be extended to more restrictive models. Some recent examples of Bayesian analysis of simultaneous equation models and VARs with cointegration include Gao and Lahiri (2002), Kleibergen and van Dijk (1998), and Kleibergen and Paap (2002).
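For reference, the two AR(1) priors just discussed differ only in the sign of the exponent on (1 − ρ²):

\[
\pi_{\mathrm{ref}}(\rho,\sigma)\propto(1-\rho^{2})^{-1/2}\,\sigma^{-1},
\qquad
\pi_{\mathrm{MDIP}}(\rho,\sigma)\propto(1-\rho^{2})^{1/2}\,\sigma^{-1},
\qquad |\rho|<1,
\]

so the former diverges while the latter vanishes as ρ approaches the unit root.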

The second direction of extension is to consider loss functions that produce Bayesian estimators different from the posterior mean. There are good reasons to doubt the use of the constant-weighted quadratic loss. In economic applications, the elements of the matrix Φ are unlikely to be of equal importance. Furthermore, if the unit of measurement is changed for a data series (e.g., the dollar amount of GDP is measured in trillions instead of billions), then the corresponding elements in Φ also change in magnitude. It is obvious that placing data-independent weights on the estimation errors is unreasonable. Some alternatives to the quadratic loss function include Varian's LINEX asymmetric loss, discussed in Zellner (1986), and the functions used in the minimum expected loss (MELO) approach of Zellner (1978). The LINEX loss allows for asymmetric weights on positive and negative estimation errors, and the MELO functions place data-dependent weights on the elements of Φ. An additional motivation for considering alternative loss functions is that the posterior mean of Φ under the shrinkage prior can be quite biased. Correction of the bias may yield substantial improvement in the estimation of the impulse responses. These questions are beyond the scope of this paper, and they are on our agenda for future research.

Acknowledgements

An earlier version of this paper was presented at the 2001 Joint Statistical Meeting in Atlanta. We thank John Robertson, Chris Sims, and Tao Zha for numerous valuable comments. We also thank the editor, Arnold Zellner, an associate editor, and two anonymous referees for many constructive suggestions for revising the paper. We are especially grateful to Paul Speckman for his careful proofreading and helpful comments. The research is supported by a grant from the Research Board of the University of Missouri System. Sun's research is also supported by National Science Foundation grants DMS-9972598 and SES-0095919 and a grant from the Missouri Department of Conservation.


Appendix A. Proof for Theorem 1

In the following, we let C1, C2, ... be constants depending only on the sample size T and the observations Y. We rewrite the likelihood function (5) of (Φ, Σ) as

\[
L(\Phi,\Sigma)=\frac{1}{|\Sigma|^{T/2}}
\exp\Big[-\tfrac12(\phi-\hat\phi)'\{\Sigma^{-1}\otimes(X'X)\}(\phi-\hat\phi)
-\tfrac12\,\mathrm{tr}\{\Sigma^{-1}S(\hat\Phi)\}\Big], \tag{A.1}
\]

where \(\hat\phi=\mathrm{vec}(\hat\Phi)\) is the ML estimator, and \(\hat\Phi\) and \(S(\hat\Phi)=(Y-X\hat\Phi)'(Y-X\hat\Phi)\) are given by (6) and (7), respectively. Then

\[
\int_{R^J}L(\Phi,\Sigma)\,d\phi
=\frac{(2\pi)^{J/2}}{|\Sigma|^{T/2}\,|\Sigma^{-1}\otimes(X'X)|^{1/2}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}.
\]

Since \(|\Sigma^{-1}\otimes(X'X)|=|\Sigma|^{-(Lp+1)}|X'X|^{p}\),

\[
\int\!\!\int_{R^J}L(\Phi,\Sigma)\,\pi^{(0,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma
\le C_1\int\frac{\mathrm{etr}\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\}}
{|\Sigma|^{(T-Lp-1+b)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}\,d\Sigma. \tag{A.2}
\]

Use the orthogonal decomposition Σ = O′ΛO, where Λ = diag(λ1, ..., λp) and O is an orthogonal matrix of the form O = (O12 O13 ··· O1p)(O23 ··· O2p) ··· (O_{p−1,p}). Each O_{ij} = O_{ij}(o_{ij}) is a simple rotation by the angle o_{ij} ∈ [−π/2, π/2] in the (i, j) coordinate plane,

\[
O_{ij}(o_{ij})=\begin{pmatrix}
I & 0 & 0 & 0 & 0\\
0 & \cos(o_{ij}) & 0 & -\sin(o_{ij}) & 0\\
0 & 0 & I & 0 & 0\\
0 & \sin(o_{ij}) & 0 & \cos(o_{ij}) & 0\\
0 & 0 & 0 & 0 & I
\end{pmatrix},
\]

where the cosine and sine entries sit in rows and columns i and j. Let λ = (λ1, ..., λp) and o = (o_{ij}, 1 ≤ i < j ≤ p). It follows from Anderson et al. (1987) that the transformation from Σ to (λ, o) has the Jacobian

\[
|J|\equiv\Big\{\prod_{1\le i<j\le p}\cos^{\,j-i-1}(o_{ij})\Big\}
\Big\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\Big\}. \tag{A.3}
\]
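This decomposition is easy to state in code. The following sketch (our illustration, not from the paper) builds the orthogonal matrix O from the rotation angles and reconstructs Σ = O′ΛO; the particular ordering of the rotations matters for matching the Jacobian (A.3), but any ordering yields an orthogonal O:

```python
import numpy as np

def givens(p, i, j, angle):
    """Simple rotation O_ij acting in the (i, j) coordinate plane."""
    O = np.eye(p)
    O[i, i] = O[j, j] = np.cos(angle)
    O[i, j] = -np.sin(angle)
    O[j, i] = np.sin(angle)
    return O

def sigma_from(lams, angles):
    """Reconstruct Sigma = O' diag(lams) O from eigenvalues and angles.

    `angles[(i, j)]` holds o_ij in [-pi/2, pi/2] (0-based indices here)."""
    p = len(lams)
    O = np.eye(p)
    for i in range(p - 1):
        for j in range(i + 1, p):
            O = O @ givens(p, i, j, angles[(i, j)])
    return O.T @ np.diag(lams) @ O

# Example: a 3x3 covariance matrix with eigenvalues (3, 2, 1)
rng = np.random.default_rng(1)
angles = {(i, j): rng.uniform(-np.pi / 2, np.pi / 2)
          for i in range(3) for j in range(i + 1, 3)}
Sigma = sigma_from([3.0, 2.0, 1.0], angles)
```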


So the right-hand side of (A.2) equals

\[
C_1\int\!\!\int\Big\{\prod_{1\le i<j\le p}\cos^{\,j-i-1}(o_{ij})\Big\}
\frac{\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T-Lp-1+b)/2}}
\,\mathrm{etr}\Big\{-\tfrac12\Lambda^{-1}OS(\hat\Phi)O'\Big\}\,d\lambda\,do
\le C_1\int\!\!\int
\frac{\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T-Lp-1+b)/2}}
\,\mathrm{etr}\Big\{-\tfrac12\Lambda^{-1}OS(\hat\Phi)O'\Big\}\,d\lambda\,do. \tag{A.4}
\]

The last inequality holds because \(|\cos^{\,j-i-1}(o_{ij})|\le 1\).

Let ν1 ≥ ν2 ≥ ··· ≥ νp > 0 be the eigenvalues of S(Φ̂), so that S(Φ̂) = Ψ diag(ν1, ν2, ..., νp)Ψ′, where Ψ is a p × p orthogonal matrix. Clearly S(Φ̂) − νp I_p is nonnegative definite, and

\[
\mathrm{tr}(\Lambda^{-1}OS(\hat\Phi)O')\ge
\mathrm{tr}(\Lambda^{-1}O\,\nu_p I_p\,O')=\nu_p\,\mathrm{tr}(\Lambda^{-1})
=\sum_{j=1}^{p}\frac{\nu_p}{\lambda_j}. \tag{A.5}
\]

Combining (A.2), (A.4) and (A.5), we have

\[
\int\!\!\int_{R^J}L(\Phi,\Sigma)\,\pi^{(0,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma
\le C_2\int\!\!\int\frac{\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T-Lp-1+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda\,do
\le C_3\int\frac{\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T-Lp-1+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda. \tag{A.6}
\]

The last inequality holds because the range of o_{ij} is bounded.

If c = 0, note that \(\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\le\prod_{i=1}^{p}\lambda_i^{p-i}\), and the right-hand side of (A.6) is bounded above by

\[
C_3\int\Big\{\prod_{i=1}^{p}\lambda_i^{p-i}\Big\}
\prod_{i=1}^{p}\frac{1}{\lambda_i^{(T-Lp-1+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda
\le C_3\prod_{i=1}^{p}\int_{0}^{\infty}
\frac{1}{\lambda_i^{(T-Lp-1+b-2p+2i)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i. \tag{A.7}
\]

Note that \(\int_0^\infty x^{-(\alpha+1)}e^{-\beta/x}\,dx\) is finite if and only if α > 0 and β > 0. So the right-hand side is integrable if T − Lp − 1 + b − 2p + 2 > 2, which holds if T > (L + 2)p + 1 − b.
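For completeness, the substitution u = β/x turns this into a Gamma integral:

\[
\int_0^\infty x^{-(\alpha+1)}e^{-\beta/x}\,dx
=\int_0^\infty \Big(\frac{u}{\beta}\Big)^{\alpha+1}e^{-u}\,\frac{\beta}{u^{2}}\,du
=\beta^{-\alpha}\int_0^\infty u^{\alpha-1}e^{-u}\,du
=\frac{\Gamma(\alpha)}{\beta^{\alpha}},
\]

which is finite exactly when α > 0 and β > 0 (this is the inverse-gamma normalizing constant).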


If c = 1, (A.6) becomes

\[
\int\!\!\int_{R^J}L(\Phi,\Sigma)\,\pi^{(0,b,1)}(\Phi,\Sigma)\,d\phi\,d\Sigma
\le C_3\prod_{i=1}^{p}\int_{0}^{\infty}
\frac{1}{\lambda_i^{(T-Lp-1+b)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i, \tag{A.8}
\]

which is integrable if T − Lp − 1 + b − 2 > 0, i.e. T > Lp + 3 − b. The results then follow.

Appendix B. Proof for Theorem 3

Using the expression (A.1) of the likelihood function and the hierarchical structure of (10), we have

\[
\int_{R^J}L(\Phi,\Sigma)\,\pi^{(a)}(\phi)\,d\phi
=\int_0^\infty\Big\{\int_{R^J}L(\Phi,\Sigma)\,\pi_s(\phi\mid\lambda)\,d\phi\Big\}\,\pi_a(\lambda)\,d\lambda
=\int_0^\infty\frac{|\Sigma|^{-T/2}}
{(2\pi)^{J/2}\,\lambda^{J/2}\,|\Sigma^{-1}\otimes(X'X)+\lambda^{-1}I_J|^{1/2}}
\exp\Big\{-\frac{\hat\phi'G\hat\phi}{2}\Big\}
\,\mathrm{etr}\Big\{-\frac{\Sigma^{-1}S(\hat\Phi)}{2}\Big\}\,\pi_a(\lambda)\,d\lambda,
\]

where

\[
G=\Sigma^{-1}\otimes(X'X)
-\{\Sigma^{-1}\otimes(X'X)\}\{\Sigma^{-1}\otimes(X'X)+\lambda^{-1}I_J\}^{-1}\{\Sigma^{-1}\otimes(X'X)\}
=\lambda^{-1}\{\Sigma^{-1}\otimes(X'X)+\lambda^{-1}I_J\}^{-1}\{\Sigma^{-1}\otimes(X'X)\}
=\{\lambda I_J+\Sigma\otimes(X'X)^{-1}\}^{-1}. \tag{B.1}
\]

Clearly, G is nonnegative definite and \(\exp\{-\tfrac12\hat\phi'G\hat\phi\}\le 1\). Define Λ = diag(λ1, ..., λp) and Ω = diag(ω1, ..., ω_{Lp+1}), where λ1 ≥ ··· ≥ λp are the eigenvalues of Σ and ω1 ≥ ··· ≥ ω_{Lp+1} > 0 are the eigenvalues of the matrix X′X. Then

\[
\lambda^{J/2}\,|\Sigma^{-1}\otimes(X'X)+\lambda^{-1}I_J|^{1/2}
=|\lambda\Lambda^{-1}\otimes\Omega+I_J|^{1/2}
=\prod_{i=1}^{p}\prod_{j=1}^{Lp+1}(\lambda\omega_j\lambda_i^{-1}+1)^{1/2}
\ge\prod_{i=1}^{p}(\lambda\omega_{Lp+1}\lambda_i^{-1}+1)^{(Lp+1)/2}
\ge(\lambda\omega_{Lp+1}\lambda_1^{-1}+1)^{J/2}.
\]


So we have

\[
\int_{R^J}L(\Phi,\Sigma)\,\pi^{(a)}(\phi)\,d\phi
\le\frac{1}{(2\pi)^{J/2}|\Sigma|^{T/2}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}
\int_0^\infty\frac{\lambda^{(J-2-a)/2}}{(\lambda\omega_{Lp+1}\lambda_1^{-1}+1)^{J/2}}\,d\lambda.
\]

Making the transformation \(u=\lambda\omega_{Lp+1}\lambda_1^{-1}/(\lambda\omega_{Lp+1}\lambda_1^{-1}+1)\), we get \(\lambda=(\lambda_1/\omega_{Lp+1})\,u/(1-u)\). Thus

\[
\int_0^\infty\frac{\lambda^{(J-2-a)/2}}{(\lambda\omega_{Lp+1}\lambda_1^{-1}+1)^{J/2}}\,d\lambda
=\Big(\frac{\lambda_1}{\omega_{Lp+1}}\Big)^{(J-a)/2}
\int_0^1\Big(\frac{u}{1-u}\Big)^{(J-2-a)/2}(1-u)^{J/2}\,du
=\Big(\frac{\lambda_1}{\omega_{Lp+1}}\Big)^{(J-a)/2}
\int_0^1 u^{(J-a)/2-1}(1-u)^{a/2+1}\,du
=\Big(\frac{\lambda_1}{\omega_{Lp+1}}\Big)^{(J-a)/2}
\mathrm{Beta}\Big(\frac{J-a}{2},\,\frac{a}{2}+2\Big).
\]

The last integral is finite by Condition (A). So

\[
\int_{R^J}L(\Phi,\Sigma)\,\pi^{(a)}(\phi)\,d\phi
\le C\,\frac{\lambda_1^{(J-a)/2}}{|\Sigma|^{T/2}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}, \tag{B.2}
\]

where \(C=\mathrm{Beta}\big(\tfrac12(J-a),\tfrac12 a+2\big)/\{(2\pi)^{J/2}\omega_{Lp+1}^{(J-a)/2}\}\). Since \(\pi^{(a,b,c)}(\Phi,\Sigma)=\pi^{(a)}(\phi)\,\pi^{(b,c)}(\Sigma)\), we have

\[
\int\!\!\int_{R^J}L(\Phi,\Sigma)\,\pi^{(a,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma
\le C\int\frac{\lambda_1^{(J-a)/2}}
{|\Sigma|^{(T+b)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma
\]
\[
=C_5\int\!\!\int\Big\{\prod_{1\le i<j\le p}\cos^{\,j-i-1}(o_{ij})\Big\}
\frac{\lambda_1^{(J-a)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T+b)/2}}
\,\mathrm{etr}\Big\{-\frac{\Lambda^{-1}OS(\hat\Phi)O'}{2}\Big\}\,d\lambda\,do
\le C_6\int\frac{\lambda_1^{(J-a)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda, \tag{B.3}
\]


where the equality follows from the transformation from Σ to (λ, o) as in the proof of Theorem 1.

If c = 0, the right-hand side of (B.3) is bounded by

\[
C_6\int\Big(\prod_{i=1}^{p}\lambda_i^{p-i}\Big)\lambda_1^{(J-a)/2}
\prod_{i=1}^{p}\frac{1}{\lambda_i^{(T+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda
\le C_6\int_0^\infty\frac{1}{\lambda_1^{(T+a+b-J-2p)/2+1}}
\exp\Big(-\frac{\nu_p}{2\lambda_1}\Big)\,d\lambda_1
\prod_{i=2}^{p}\int_0^\infty\frac{1}{\lambda_i^{(T+b-2p+2i)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i.
\]

So the right-hand side is integrable under Condition (B0).

If c = 1, the right-hand side of (B.3) equals

\[
C_6\int\lambda_1^{(J-a)/2}\prod_{i=1}^{p}\frac{1}{\lambda_i^{(T+b)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda
\le C_6\int_0^\infty\frac{1}{\lambda_1^{(T+a+b-J)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_1}\Big)\,d\lambda_1
\prod_{i=2}^{p}\int_0^\infty\frac{1}{\lambda_i^{(T+b)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i.
\]

The right-hand side is integrable under Condition (B1). The results then follow.

Appendix C. Proof for Theorem 5

We prove only the case of k = 2; the proof for the case of k = 0 is similar. Since the posterior is proper by assumption, it is enough to show that

\[
\int\!\!\int_{R^J}\|\phi\|^{2}\{\mathrm{tr}(\Sigma^{2})\}^{h/2}
L(\Phi,\Sigma)\,\pi^{(0,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma<\infty. \tag{C.1}
\]

Since \((\phi\mid\Sigma,Y)\sim N_J(\hat\phi,\,\Sigma\otimes(X'X)^{-1})\), we have

\[
E(\|\phi\|^{2}\mid\Sigma,Y)=E(\phi'\phi\mid\Sigma,Y)
=\mathrm{tr}\{E(\phi\phi'\mid\Sigma,Y)\}
=\mathrm{tr}\{\hat\phi\hat\phi'+\Sigma\otimes(X'X)^{-1}\}
=\hat\phi'\hat\phi+\mathrm{tr}(\Sigma)\,\mathrm{tr}\{(X'X)^{-1}\}.
\]


The marginal posterior of Σ given Y has the form

\[
m(\Sigma\mid Y)=C_7\int L(\Phi,\Sigma)\,d\phi\;\pi^{(b,c)}(\Sigma)
=C_8\,\frac{|\Sigma\otimes(X'X)^{-1}|^{1/2}}
{|\Sigma|^{(T+b)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}
=C_9\,\frac{1}
{|\Sigma|^{(T+b-Lp-1)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\},
\]

where we use the fact that \(|\Sigma\otimes(X'X)^{-1}|^{1/2}=|\Sigma|^{(Lp+1)/2}|X'X|^{-p/2}\). Therefore the left-hand side of (C.1) equals J1 + J2, where

\[
J_1=C_{10}\int\frac{\{\mathrm{tr}(\Sigma^{2})\}^{h/2}}
{|\Sigma|^{(T+b-Lp-1)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma,
\]
\[
J_2=C_{11}\int\frac{\mathrm{tr}(\Sigma)\{\mathrm{tr}(\Sigma^{2})\}^{h/2}}
{|\Sigma|^{(T+b-Lp-1)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma.
\]

Note that \(\{\mathrm{tr}(\Sigma^{2})\}^{h/2}=\{\sum_{i=1}^{p}\lambda_i^{2}\}^{h/2}\le(p\lambda_1)^{h}\). It is easy to show that

\[
J_1\le C_{12}\int\frac{\lambda_1^{h}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma
\le C_{13}\int\frac{\lambda_1^{h}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}}
\exp\Big(-\sum_{i=1}^{p}\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda.
\]

If c = 0,

\[
J_1\le C_{14}\int\frac{1}{\lambda_1^{(T+b-2h-2p-Lp+1)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_1}\Big)\,d\lambda_1
\prod_{i=2}^{p}\int\frac{1}{\lambda_i^{(T+b-Lp-2p+2i-1)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i,
\]

which is finite if T > (L + 2)p + 2h − b + 1. If c = 1,

\[
J_1\le C_{15}\int\frac{1}{\lambda_1^{(T+b-2h-Lp-1)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_1}\Big)\,d\lambda_1
\prod_{i=2}^{p}\int\frac{1}{\lambda_i^{(T+b-Lp-1)/2}}
\exp\Big(-\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda_i,
\]

which is finite if T > Lp + 2h − b + 3.


Similarly,

\[
J_2\le C_{16}\int\frac{\lambda_1^{h+1}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma
\le C_{17}\int\frac{\lambda_1^{h+1}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda.
\]

If c = 0,

\[
J_2\le C_{18}\int\frac{\lambda_1^{h+1}\prod_{i=1}^{p}\lambda_i^{p-i}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda,
\]

which is finite if T > (L + 2)p + 2h − b + 3. If c = 1,

\[
J_2\le C_{19}\int\frac{\lambda_1^{h+1}}
{\prod_{i=1}^{p}\lambda_i^{(T+b-Lp-1)/2}}
\exp\Big(-\sum_{j=1}^{p}\frac{\nu_p}{2\lambda_j}\Big)\,d\lambda,
\]

which is finite if T > Lp + 2h − b + 5. Note that the conditions with respect to J2 (for c = 0, 1) are stronger than those with respect to J1. The theorem follows.

Appendix D. Proof for Theorem 7

The condition (AM) implies (A), (B0M) implies (B0), and (B1M) implies (B1). Thus the corresponding posteriors are all proper. It is then enough to show that

\[
\int\!\!\int_{R^J}\|\phi\|^{k}\{\mathrm{tr}(\Sigma^{2})\}^{h/2}
L(\Phi,\Sigma)\,\pi^{(a,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma<\infty.
\]

Since \(\mathrm{tr}(\Sigma^{2})=\mathrm{tr}(\Lambda^{2})=\sum_{i=1}^{p}\lambda_i^{2}\le p\lambda_1^{2}\) and \(\|\phi\|^{k}\pi^{(a)}(\phi)\propto\pi^{(a-k)}(\phi)\), it is equivalent to show that

\[
\int\!\!\int_{R^J}\|\phi\|^{k}\{\mathrm{tr}(\Sigma^{2})\}^{h/2}
L(\Phi,\Sigma)\,\pi^{(a,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma
\le C_{20}\int\!\!\int_{R^J}L(\Phi,\Sigma)\,\lambda_1^{h}\,
\pi^{(a-k,b,c)}(\Phi,\Sigma)\,d\phi\,d\Sigma<\infty. \tag{D.1}
\]


Since a − k > 0, we may apply (B.2) to the inner integral, replacing a by a − k. The right-hand side of (D.1) is then bounded by

\[
C_{21}\int\frac{\lambda_1^{h+(J-a+k)/2}}
{|\Sigma|^{(T+b)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{c}}
\,\mathrm{etr}\Big\{-\tfrac12\Sigma^{-1}S(\hat\Phi)\Big\}\,d\Sigma
\le C_{22}\int\frac{\lambda_1^{h+(J-a+k)/2}\{\prod_{1\le i<j\le p}(\lambda_i-\lambda_j)\}^{1-c}}
{\prod_{i=1}^{p}\lambda_i^{(T+b)/2}}
\exp\Big(-\sum_{i=1}^{p}\frac{\nu_p}{2\lambda_i}\Big)\,d\lambda. \tag{D.2}
\]

If c = 0, the right-hand side of (D.2) is bounded by

\[
C_{23}\int_0^\infty\frac{\exp(-\nu_p/(2\lambda_1))}
{\lambda_1^{(T+a+b-J-k-2(p+h))/2+1}}\,d\lambda_1
\prod_{i=2}^{p}\int_0^\infty\frac{\exp(-\nu_p/(2\lambda_i))}
{\lambda_i^{(T+b-2p+2i)/2}}\,d\lambda_i,
\]

which is finite under Condition (B0M).

If c = 1, the right-hand side of (D.2) is bounded by

\[
C_{24}\int_0^\infty\frac{\exp(-\nu_p/(2\lambda_1))}
{\lambda_1^{(T+a+b-(J+k+2h))/2}}\,d\lambda_1
\prod_{i=2}^{p}\int_0^\infty\frac{\exp(-\nu_p/(2\lambda_i))}
{\lambda_i^{(T+b)/2}}\,d\lambda_i,
\]

which is finite under Condition (B1M).

References

Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis, 2nd Edition. Wiley, New York.
Anderson, T.W., Olkin, I., Underhill, L.G., 1987. Generation of random orthogonal matrices. SIAM Journal on Scientific and Statistical Computing 8, 625–629.
Baranchik, A.J., 1964. Multiple regression and estimation of the mean of multivariate normal distribution. Technical Report 51, Department of Statistics, Stanford University.
Berger, J.O., 1984. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, New York.
Berger, J.O., Bernardo, J.M., 1989. Estimating a product of means: Bayesian analysis with reference priors. Journal of the American Statistical Association 84, 200–207.
Berger, J.O., Bernardo, J.M., 1992. On the development of reference priors. In: Bernardo, J.M., et al. (Eds.), Bayesian Analysis IV. Oxford University Press, Oxford.
Berger, J.O., Strawderman, W.E., 1996. Choice of hierarchical priors: admissibility in estimation of normal means. Annals of Statistics 24, 931–951.
Berger, J.O., Yang, R., 1996. Noninformative priors and Bayesian testing for the AR(1) model. Econometric Theory 10, 461–482.
Bernardo, J.M., 1979. Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society Series B 41, 113–147.
Chib, S., 1998. Estimation and comparison of multiple change point models. Journal of Econometrics 86, 221–241.
Chib, S., Hamilton, B., 2000. Bayesian analysis of cross section and clustered data treatment models. Journal of Econometrics 97, 25–50.
Christiano, L.J., Eichenbaum, M., Evans, C.L., 1999. Monetary policy shocks: What have we learned and to what end. In: Taylor, J.B., Woodford, M. (Eds.), Handbook of Macroeconomics, Vol. 1. Elsevier Science, North-Holland, Amsterdam, New York and Oxford, pp. 65–148.
Deschamps, P.J., 2000. Exact small-sample inference in stationary, fully regular, dynamic models. Journal of Econometrics 97, 51–91.
Gao, C., Lahiri, K., 2002. A comparison of some recent Bayesian and classical procedures for simultaneous equation models with weak instruments. Working paper, Department of Economics, State University of New York at Albany.
Geisser, S., 1965. Bayesian estimation in multivariate analysis. Annals of Mathematical Statistics 36, 150–159.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
Geweke, J., 1996. Monte Carlo simulation and numerical integration. In: Amman, H.M., Kendrick, D.A., Rust, J. (Eds.), Handbook of Computational Economics, Vol. 1. Elsevier Science, North-Holland, Amsterdam, New York and Oxford, pp. 731–800.
Geweke, J., 1999. Using simulation methods for Bayesian econometric models: inference, development, and communication. Econometric Reviews 18, 1–73.
Gordon, D.B., Leeper, E.M., 1994. The dynamic impacts of monetary policy: an exercise in tentative identification. Journal of Political Economy 102, 1228–1247.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.
Hobert, J.P., Casella, G., 1996. The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association 91, 1461–1473.
Jeffreys, H., 1967. Theory of Probability. Oxford University Press, London.
Kadiyala, K.R., Karlsson, S., 1997. Numerical methods for estimation and inference in Bayesian VAR-models. Journal of Applied Econometrics 12, 99–132.
Kass, R.E., Wasserman, L., 1996. The selection of prior distributions by formal rules. Journal of the American Statistical Association 91, 1343–1370.
Kilian, L., 1999. Finite sample properties of percentile and percentile-t bootstrap confidence intervals for impulse responses. Review of Economics and Statistics 81, 652–660.
Kleibergen, F., van Dijk, H.K., 1994. On the shape of the likelihood/posterior in cointegration models. Econometric Theory 10, 514–552.
Kleibergen, F., van Dijk, H.K., 1998. Bayesian simultaneous equations analysis using reduced rank structures. Econometric Theory 14, 701–743.
Kleibergen, F., Paap, R., 2002. Priors, posteriors, and Bayes factors for a Bayesian analysis of cointegration. Journal of Econometrics 111, 223–249.
Lee, K., Ni, S., 2002. On the dynamic effects of oil price shocks – a study using industry level data. Journal of Monetary Economics 49, 823–852.
Leeper, E.M., Zha, T., 1999. Modest policy interventions. Federal Reserve Bank of Atlanta working paper 99-22.
Litterman, R.B., 1986. Forecasting with Bayesian vector autoregression – five years of experience. Journal of Business and Economic Statistics 4, 25–38.
MacKinnon, J.G., Smith Jr., A.A., 1998. Approximate bias correction in econometrics. Journal of Econometrics 85, 205–230.
Neyman, J., Scott, E.L., 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Ni, S., Sun, D., 2002. Noninformative priors and frequentist risks of Bayesian estimators of vector-autoregressive models. Working paper 02-10, Department of Economics, University of Missouri.
Pagan, A.R., Robertson, J.C., 1998. Structural models of the liquidity effect. The Review of Economics and Statistics 80, 202–217.
Phillips, P.C.B., 1991. To criticize the critics: an objective Bayesian analysis of stochastic trends. Journal of Applied Econometrics 6, 423–434.
Sims, C.A., 1972. Money, income, and causality. The American Economic Review 62, 540–552.
Sims, C.A., 1980. Macroeconomics and reality. Econometrica 48, 1–48.
Sims, C.A., 1986. Are forecast models usable for policy analysis? Quarterly Review of the Federal Reserve Bank of Minneapolis.
Sims, C.A., 1991. Comment by Christopher A. Sims on 'To criticize the critics' by Peter C.B. Phillips. Journal of Applied Econometrics 6, 423–434.
Sims, C.A., Zha, T., 1998a. Bayesian methods for dynamic multivariate models. International Economic Review 39, 949–968.
Sims, C.A., Zha, T., 1998b. Does monetary policy generate recessions? Federal Reserve Bank of Atlanta working paper 98-12.
Sims, C.A., Zha, T., 1999. Error bands for impulse responses. Econometrica 67, 1113–1155.
Stein, C., 1956. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium, Vol. 1. University of California Press, Berkeley, pp. 197–206.
Sun, D., Berger, J.O., 1998. Reference priors under partial information. Biometrika 85, 55–71.
Sun, D., Ye, K., 1995. Reference prior Bayesian analysis for normal mean products. Journal of the American Statistical Association 90, 589–597.
Sun, D., Tsutakawa, R.K., He, Z., 2001. Propriety of posteriors with improper priors in hierarchical linear mixed models. Statistica Sinica 11, 77–95.
Sun, D., Ni, S., 2003. Bayesian analysis of vector-autoregressive models with noninformative priors. Journal of Statistical Planning and Inference, forthcoming.
Tiao, G.C., Zellner, A., 1964. On the Bayesian estimation analysis of multivariate regression. Journal of the Royal Statistical Society Series B 26, 389–399.
Yang, R., Berger, J.O., 1994. Estimation of a covariance matrix using the reference prior. Annals of Statistics 22, 1195–1211.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.
Zellner, A., 1978. Estimation of functions of population means and regression coefficients including structural coefficients: a minimum expected loss approach. Journal of Econometrics 8, 127–158.
Zellner, A., 1986. Bayesian estimation and prediction using asymmetric loss functions. Journal of the American Statistical Association 81, 446–451.
Zellner, A., 1997. Maximal data information prior distributions. In: Bayesian Analysis in Econometrics and Statistics. Edward Elgar, Lyme, UK (Chapter 8).