
Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

Ke-Hai Yuan (University of Notre Dame, USA) and Kentaro Hayashi (Georgia State University, USA)

We study several aspects of bootstrap inference for covariance structure models based on three test statistics, including Type I error, power and sample-size determination. Specifically, we discuss conditions for a test statistic to achieve a more accurate level of Type I error, both in theory and in practice. Details on power analysis and sample-size determination are given. For data sets with heavy tails, we propose applying a bootstrap methodology to a transformed sample obtained by a downweighting procedure. One of the key conditions for safe bootstrap inference is generally satisfied by the transformed sample but may not be satisfied by the original sample with heavy tails. Several data sets illustrate that, by combining downweighting and bootstrapping, a researcher may find a nearly optimal procedure for evaluating various aspects of covariance structure models. A rule for handling non-convergence problems in bootstrap replications is proposed.

1. Introduction

Covariance structure analysis (CSA) has become one of the most popular methods in multivariate analysis, especially in the social and behavioural sciences. The classical approach to model evaluation uses the likelihood ratio statistic $T_{ML}$ based on the normality assumption (Lawley & Maxwell, 1971). Because data sets in practice are seldom normal, Browne (1984) developed an asymptotically distribution-free statistic, $T_B$. Satorra and Bentler (1988) proposed a rescaled version of the likelihood ratio statistic, $T_{SB}$, which performs quite robustly under violations of the normality assumption (Hu, Bentler, & Kano, 1992). The analytical merits of these statistics are characterized by asymptotics. With practical data, there still exist violations of conditions that will interfere with accurate applications of these statistics. Actually, none of these statistics


British Journal of Mathematical and Statistical Psychology (2003), 56, 93–110
© 2003 The British Psychological Society

www.bps.org.uk

* Requests for reprints should be addressed to Ke-Hai Yuan, Department of Psychology, Laboratory for Social Research, University of Notre Dame, Notre Dame, IN 46556, USA (e-mail: kyuan@nd.edu).

can be well described by a chi-square distribution even for simulated normal data with small to medium sample sizes (Bentler & Yuan, 1999).

Bootstrap methods provide a competitive alternative for statistical inference under violations of standard regularity conditions (Efron & Tibshirani, 1993; Davison & Hinkley, 1997). Bootstrap testing in CSA was developed by Beran and Srivastava (1985); Bollen and Stine (1993) and Yung and Bentler (1994, 1996) further showed how to obtain correct Type I errors with $T_{ML}$ and $T_B$. Zhang and Boos (1992) used the bootstrap to test homogeneity of covariance matrices from multiple samples. Simulation studies by Fouladi (1998) indicate that bootstrap inference with $T_{ML}$ has the best overall performance in controlling Type I errors.

Although the bootstrap has shown promise for CSA, various unknowns related to this procedure still exist when it is applied to the three commonly used statistics $T_{ML}$, $T_{SB}$ and $T_B$. For example, bootstrap theory requires the underlying sampling distribution to have finite fourth-order moments. With practical data possessing large empirical kurtosis, how can the bootstrap be safely applied in CSA? When using a bootstrap method, are $T_{ML}$, $T_B$ and $T_{SB}$ equivalent? If not, which statistic should one choose? When inference is based on asymptotics, the existing literature indicates that $T_B$ has the best power (Fouladi, 2000; Yuan & Bentler, 1997). What are the power properties of the three statistics when they are used with the bootstrap? With a bootstrap method, how should one determine the sample size needed to achieve a certain power in CSA? How should one deal with non-convergence problems in bootstrap replications?

Our purpose in this paper is to address the above issues associated with CSA. In Section 2 we provide details of a power analysis via the bootstrap, including sample-size determination and a test for close fit. In Section 3 we compare the pivotal properties of the three statistics for bootstrap inference; we also discuss limitations of the pivotal properties to be realized with practical data. In Section 4 we propose a downweighting approach that can be applied with the bootstrap to distributions with heavy tails. As will be shown by means of examples in Section 5, for data having heavy tails, using the downweighting procedure with the bootstrap not only makes the bootstrap procedure valid but also leads to more accurate statistical inference. Examples in Section 5 also illustrate how to find a nearly optimal procedure when using the bootstrap for inference. Through the examples we also compare the power properties of the three test statistics. In Section 6 we provide a reasonable way of dealing with non-convergence in bootstrap replications. The paper concludes with a brief discussion.

2. Bootstrap approaches to inference and power

Beran (1986) studied the power properties of the bootstrap based on a general statistic whose distribution depends on nuisance parameters. Our development of inference and power analysis can be regarded as an application of Beran's general theory of resampling to CSA.

2.1. Model test

Let $F_x$ be the cumulative distribution function (cdf) of a $p$-dimensional population from which a sample $x_1, \ldots, x_n$ has been drawn. Let $\hat F_x$ be the corresponding empirical distribution function (edf) defined by the sample $x_i$'s. Without loss of generality, we assume $E(x_i) = \mu$ and $\mathrm{Cov}(x_i) = \Sigma$. For a covariance structure $\Sigma(\theta)$, an inference


procedure is to find whether there exists a $\theta_0$ such that the null hypothesis

$$H_0: \Sigma = \Sigma(\theta_0) \qquad (1)$$

holds. A good testing procedure should accept $H_0$ when it is true. When an alternative hypothesis $\Sigma = \Sigma_a$ is true, such that

$$H_1: \Sigma_a \ne \Sigma(\theta) \ \text{for any admissible}\ \theta, \qquad (2)$$

it should reject (1).

In order to test (1) with a statistic $T = T(x_1, \ldots, x_n)$, we need a critical value $c_\alpha$ such that

$$P(T > c_\alpha \mid H_0) = \alpha. \qquad (3)$$

Without knowing $F_x$, it is impossible for us to find $c_\alpha$ when the probability in (3) is measured by $F_x$. However, we can define a $c_\alpha$ such that

$$P(T > c_\alpha \mid \hat F_0) = \alpha, \qquad (4)$$

where $\hat F_0$ is an edf with a covariance matrix satisfying (1). In bootstrap testing for (1), $c_\alpha$ is estimated through resampling from $\hat F_0$. Let

$$y_i = \hat\Sigma^{1/2} S_x^{-1/2} x_i, \quad i = 1, \ldots, n, \qquad (5)$$

where $\hat\Sigma = \Sigma(\hat\theta)$ for an admissible estimate $\hat\theta$ (Beran & Srivastava, 1985). It is obvious that the edfs $\hat F_x$ and $\hat F_y$ have the same type of distribution but differ in 'location and scale'. Actually, the only purpose of (5) is to create an empirical distribution $\hat F_0 = \hat F_y$ having the same distributional type as that of $\hat F_x$ and with a covariance matrix satisfying $H_0$.

Let $Y_b = (y_1^{(b)}, \ldots, y_n^{(b)})$ be a random sample of size $n$ from $\hat F_0$ and $T_b^*$ be the corresponding statistic of $T$ evaluated at this sample. With independent samples $Y_b$, $b = 1, \ldots, B_0$, we rearrange the $T_b^*$ in order and denote them by

$$T_{(1)}^* \le T_{(2)}^* \le \cdots \le T_{(B_0)}^*.$$

The bootstrap estimate $\hat c_\alpha$ of $c_\alpha$ is the $B_0(1-\alpha)$th quantile $T^*_{(B_0(1-\alpha))}$, and we reject the null hypothesis (1) when $T > \hat c_\alpha$. Let $\alpha$ be the exact level of a test defined in (3) and $\hat\alpha$ be the corresponding level when $c_\alpha$ is replaced by $\hat c_\alpha$. Under quite general conditions, Beran (1986) established the consistency of $\hat\alpha$ for $\alpha$. These conditions include the fourth-order moments of $F_x$ being finite.
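To make the steps above concrete, the following is a minimal sketch (ours, in Python; it is not part of the original paper) of the bootstrap test of exact fit. It assumes a user-supplied function `fit_fmin(S)` that returns the minimized discrepancy $\min_\theta D_{ML}(S, \Sigma(\theta))$ for the model of interest, and it takes the statistic as $T_{ML} = n\hat F_{\min}$ in line with the NCP definition in (9); the function and variable names are illustrative only.

```python
import numpy as np

def sym_sqrt(A, power=0.5):
    """Symmetric matrix power (e.g. 0.5 or -0.5) of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** power) @ vecs.T

def bootstrap_exact_fit(X, fit_fmin, sigma_hat, B0=1000, alpha=0.05, seed=0):
    """Bootstrap test of H0: Sigma = Sigma(theta_0), following (3)-(5).

    X         : (n, p) data matrix
    fit_fmin  : S -> min over theta of D_ML(S, Sigma(theta))  (user supplied)
    sigma_hat : Sigma(theta_hat), fitted model covariance from the observed data
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sx = np.cov(X, rowvar=False)
    T_obs = n * fit_fmin(Sx)          # the paper's NCP definition (9) uses n, not n - 1

    # Transformation (5): y_i = Sigma_hat^{1/2} S_x^{-1/2} x_i, so the edf of the
    # y_i has covariance matrix Sigma_hat, i.e. it satisfies H0.
    Y = X @ sym_sqrt(Sx, -0.5) @ sym_sqrt(sigma_hat, 0.5)

    T_star = np.empty(B0)
    for b in range(B0):
        Yb = Y[rng.integers(0, n, size=n)]            # resample n rows from F0_hat
        T_star[b] = n * fit_fmin(np.cov(Yb, rowvar=False))
    T_star.sort()

    c_alpha = T_star[int(np.ceil(B0 * (1 - alpha))) - 1]   # T*_(B0(1 - alpha))
    p_boot = np.mean(T_star > T_obs)  # simple p-value; see Section 6 for non-convergence
    return T_obs, c_alpha, p_boot
```

The observed statistic is then compared with `c_alpha`, exactly as in the rejection rule above.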

2.2. Power evaluation

The bootstrap can also be used to evaluate the power $\beta_\alpha$ of a statistic $T$, defined by

$$\beta_\alpha = P(T > c_\alpha \mid H_1). \qquad (6)$$

Let $\Sigma_a$ be the covariance matrix in (2) and

$$z_i = \Sigma_a^{1/2} S_x^{-1/2} x_i. \qquad (7)$$

Then $\hat F_a = \hat F_z$ represents the bootstrap population under $H_1$. For $b = 1, 2, \ldots, B_1$, let $Z_b = (z_1^{(b)}, \ldots, z_n^{(b)})$ be random samples of size $n$ from $\hat F_a$ and $T_b^*$ be the corresponding statistic of $T$ evaluated at $Z_b$. Then the bootstrap estimate of $\beta_\alpha$ in (6) is

$$\hat\beta_\alpha^* = \#\{T_b^* > \hat c_\alpha\} / B_1, \qquad (8)$$

where $\hat c_\alpha$ is the bootstrap estimate of the $c_\alpha$ in (4). Under fairly general conditions, Beran (1986) established the consistency of $\hat\beta_\alpha^*$ for $\beta_\alpha$ as $n$ approaches infinity. This consistency


should be understood in the following sense: when $\Sigma_a$ is the population covariance matrix of $F_x$ that generates the observed sample $x_1, \ldots, x_n$, then $\hat\beta_\alpha^*$ in (8) is consistent for the probability $\beta_\alpha$ in (6), and the latter is the exact power measured by the cdf $F_x$.
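As an illustration (ours, not the authors'), the power estimate (8) can be sketched as follows, reusing the same kind of user-supplied `fit_fmin` as in the Section 2.1 sketch. Here `sigma_a` must be the hypothesized alternative covariance matrix itself, not an estimate of it, for the reasons discussed in the next paragraph.

```python
import numpy as np

def sym_sqrt(A, power=0.5):
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** power) @ vecs.T

def bootstrap_power(X, fit_fmin, sigma_a, c_alpha, B1=1000, seed=0):
    """Estimate beta_alpha in (6) by resampling from F_a, the edf of the z_i in (7).

    sigma_a : covariance matrix specified under H1 (not an estimate)
    c_alpha : bootstrap critical value obtained under H0 (Section 2.1)
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sx = np.cov(X, rowvar=False)
    Z = X @ sym_sqrt(Sx, -0.5) @ sym_sqrt(sigma_a, 0.5)    # transformation (7)

    exceed = 0
    for _ in range(B1):
        Zb = Z[rng.integers(0, n, size=n)]
        exceed += (n * fit_fmin(np.cov(Zb, rowvar=False))) > c_alpha
    return exceed / B1                                      # estimate (8)
```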

In contrast to (5), the $\Sigma_a$ in (7) cannot be replaced by a consistent estimate. For example, we will not obtain a consistent estimate of power for $T$ to reject the null hypothesis $H_0$ under the true covariance matrix $\Sigma_a = \mathrm{Cov}(x_i)$ when replacing $\Sigma_a$ by the sample covariance matrix $S_x$ of the $x_i$'s. The inconsistency here can be well understood through $T_{ML}$. Actually, with $\Sigma_a$ being $O(1/\sqrt{n})$ apart from $\Sigma_0$, the bootstrap approach to power is asymptotically equivalent to the approach developed in Satorra and Saris (1985) for normally distributed data (see also Steiger, Shapiro, & Browne, 1985). These authors used a non-central chi-square distribution to describe the behaviour of $T_{ML}$ under $H_1$, with a non-centrality parameter (NCP)

$$\delta = n \min_\theta D(\Sigma_a, \Sigma(\theta)), \qquad (9)$$

where $D = D_{ML}$ is the Wishart likelihood discrepancy function. When replacing $\Sigma_a$ by a consistent estimate $\hat\Sigma_a$, even though $\hat\Sigma_a = \Sigma_a + O_p(1/\sqrt{n})$, the NCP estimate based on $\hat\Sigma_a$ can be $O_p(1)$ apart from that based on $\Sigma_a$. This limitation of power evaluation also exists in other classical procedures for power analysis (e.g., Cohen, 1988). The fact is that, without knowledge of the true effect size, one cannot consistently estimate the power of a statistic in rejecting the null.

Although the bootstrap approach to power analysis is equivalent to the approach proposed by Satorra and Saris (1985) for normally distributed data when $\Sigma_a$ is $O(1/\sqrt{n})$ apart from $\Sigma(\theta)$, there are fundamental differences between the two approaches. With the bootstrap approach, there is neither a central chi-square distribution when $H_0$ holds nor a non-central chi-square distribution when $H_1$ holds. Instead, the distribution of $T$ is judged by its empirical behaviour based on resampling from observed data. We may regard $\lambda = \min_\theta D(\Sigma_a, \Sigma(\theta))$ as a measure of effect size. For a given sample size, the greater $\lambda$ is, the higher the power for a bootstrap test to distinguish between $H_0$ and $H_1$. Because there is no chi-square distribution assumption with the bootstrap approach, power analysis cannot resort to a chi-square table and has to be evaluated separately for different samples. This tailor-made type of analysis can be regarded as an advantage of the bootstrap methodology.

2.3. Determination of sample size

Let $x_1, \ldots, x_n$ be a pilot sample. Under $H_1$ in (2), we want to find the smallest sample size for a statistic $T$ to reject $H_0$ with probability $\beta_0$. For this purpose, we first draw $B_0$ independent samples $Y_b(m) = (y_1^{(b)}, \ldots, y_m^{(b)})$ of size $m$ from $\hat F_0$ to estimate the critical value $\hat c_\alpha(m)$, as in Section 2.1. We then draw $B_1$ independent samples $Z_b(m) \sim \hat F_z$ of size $m$, where the covariance matrix of $z \sim F_z$ is $\Sigma_a$. Evaluating $T_b^*(m)$ at each sample $Z_b(m)$, as in (8), the estimated power for sample size $m$ is

$$\hat\beta_\alpha^*(m) = \#\{T_b^*(m) > \hat c_\alpha(m)\} / B_1.$$

A smaller sample size $m_1$ is needed if $\hat\beta_\alpha^*(m) > \beta_0$; otherwise, a greater sample size $m_2$ is needed. Finding the minimum $m$ such that $\hat\beta_\alpha^*(m) \ge \beta_0$ may require a series of trials. The interval-halving procedure in the Appendix of MacCallum, Browne, and Sugawara (1996), who studied sample-size determination using non-central chi-squares, can be equally applied here. More discussion on the bootstrap approach to sample size and


power can be found in Beran (1986), Efron and Tibshirani (1993, Chapter 25), and Davison and Hinkley (1997, Section 4.6).
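A hedged sketch of this search (ours, not from the paper): for a trial size $m$, resample $m$-point samples from $\hat F_0$ to get $\hat c_\alpha(m)$ and from $\hat F_z$ to get $\hat\beta^*_\alpha(m)$, then interval-halve on $m$. The bootstrap power estimate is noisy, so the search assumes reasonably large $B_0$ and $B_1$ at each trial and an initial bracket that contains the answer; all names are illustrative.

```python
import numpy as np

def _msqrt(A, power=0.5):
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** power) @ vecs.T

def power_at_m(X, fit_fmin, sigma_hat, sigma_a, m, B0=300, B1=300,
               alpha=0.05, seed=0):
    """Bootstrap power of the test for a trial sample size m (Section 2.3)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    Sx = np.cov(X, rowvar=False)
    Y = X @ _msqrt(Sx, -0.5) @ _msqrt(sigma_hat, 0.5)   # F0_hat, satisfies H0
    Z = X @ _msqrt(Sx, -0.5) @ _msqrt(sigma_a, 0.5)     # Fa_hat, satisfies H1

    T0 = np.sort([m * fit_fmin(np.cov(Y[rng.integers(0, n, m)], rowvar=False))
                  for _ in range(B0)])
    c_alpha_m = T0[int(np.ceil(B0 * (1 - alpha))) - 1]
    T1 = np.array([m * fit_fmin(np.cov(Z[rng.integers(0, n, m)], rowvar=False))
                   for _ in range(B1)])
    return np.mean(T1 > c_alpha_m)

def smallest_m(X, fit_fmin, sigma_hat, sigma_a, target=0.80, lo=50, hi=2000, **kw):
    """Interval halving for the smallest m with estimated power >= target."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_at_m(X, fit_fmin, sigma_hat, sigma_a, mid, **kw) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```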

2.4. Test for close fit

A covariance structure model is at best only an approximation to the real world; any interesting model will be rejected when the sample size is large enough. Based on the root-mean-square error of approximation (RMSEA) (Steiger & Lind, 1980), MacCallum et al. (1996) proposed testing for close fit rather than exact fit in (1). Using a non-central chi-square to describe the behaviour of $T_{ML}$, their approach is equivalent to testing whether the NCP is less than a prespecified value. The test for close fit can also be easily implemented with the bootstrap. Let $\Sigma_c$ be a covariance matrix with close fit such that

$$\delta_c = n \min_\theta D(\Sigma_c, \Sigma(\theta)), \qquad (10)$$

where $\delta_c$ is the 'non-centrality parameter' in equation (8) of MacCallum et al. (1996). Such a covariance matrix can be found in the form of $\Sigma_h = h S_x + (1-h)\Sigma(\hat\theta)$. When $D = D_{ML}$ is the Wishart likelihood discrepancy function, Yuan and Hayashi (2001) showed that $\min_\theta D(\Sigma_h, \Sigma(\theta)) = D(\Sigma_h, \Sigma(\hat\theta))$ is a strictly increasing function of $h \in [0, 1]$, and it is straightforward to find a $\Sigma_c$ once a $\delta_c$ is given. Let $y_i = \Sigma_c^{1/2} S_x^{-1/2} x_i$, $i = 1, \ldots, n$; then the covariance matrix of $y \sim \hat F_y$ is $\Sigma_c$. Let $T_b^*$, $b = 1, \ldots, B_0$, be the corresponding statistics of $T$ evaluated respectively at $B_0$ independent samples from $\hat F_y$. Then $\hat c_\alpha = T^*_{(B_0(1-\alpha))}$ is the critical value estimate of the test for close fit, and the model will be rejected when $T = T(x_1, \ldots, x_n) > \hat c_\alpha$. It is also straightforward to perform power analysis with the bootstrap when a less-close-fit covariance matrix describes the population.
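The only new ingredient relative to Section 2.1 is the construction of $\Sigma_c$. A small sketch (ours), relying on the monotonicity result quoted above, finds $h$ by bisection so that $n\,D_{ML}(\Sigma_h, \Sigma(\hat\theta)) = \delta_c$; once $\Sigma_c$ is in hand, the bootstrap test proceeds exactly as in Section 2.1 with $\hat\Sigma$ replaced by $\Sigma_c$.

```python
import numpy as np

def d_ml(A, B):
    """Wishart likelihood discrepancy D_ML(A, B) = tr(A B^{-1}) - log|A B^{-1}| - p."""
    p = A.shape[0]
    M = A @ np.linalg.inv(B)
    return np.trace(M) - np.log(np.linalg.det(M)) - p

def sigma_close_fit(Sx, sigma_hat, n, delta_c, tol=1e-8):
    """Sigma_c = h*Sx + (1-h)*Sigma(theta_hat) with n * D(Sigma_h, Sigma(theta_hat)) = delta_c.

    Bisection works because the discrepancy is strictly increasing in h on [0, 1]
    (Yuan & Hayashi, 2001); delta_c must not exceed the observed n * D(Sx, Sigma(theta_hat)).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        h = 0.5 * (lo + hi)
        Sh = h * Sx + (1.0 - h) * sigma_hat
        if n * d_ml(Sh, sigma_hat) < delta_c:
            lo = h
        else:
            hi = h
    h = 0.5 * (lo + hi)
    return h, h * Sx + (1.0 - h) * sigma_hat
```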

3. Pivotal property of $T_{ML}$, $T_{SB}$ and $T_B$

Although it is unclear which of the statistics $T_{ML}$, $T_{SB}$ and $T_B$ is best for bootstrapping with a specific data set, there is a general result for choosing a statistic (Beran, 1988; Hall, 1992). In the following, we will discuss this result and its relevance to CSA.

A statistic is pivotal if its distribution does not depend on any parameters in the underlying sampling distribution, and it is asymptotically pivotal if its asymptotic distribution does not depend on unknown parameters. Let a statistic $T = f(S_x)$ be a smooth function of the sample covariance matrix $S_x$ of $x_1, \ldots, x_n$. Suppose $T$ is asymptotically pivotal and an Edgeworth expansion applies to its distribution function under $H_0$ (Barndorff-Nielsen & Cox, 1984; Wakaki, Eguchi, & Fujikoshi, 1990):

$$P(T \le t \mid F_0) = G(t) + n^{-1} g(t) + O(n^{-3/2}), \qquad (11a)$$

where $G(t)$ is the asymptotic distribution function of $T$ and $g(t)$ is a smooth function that depends on some unknown population parameters. As discussed in Hall (1992) and Davison and Hinkley (1997, Section 2.6.1), (11a) generally holds for smooth functions of sample moments. Let $T^*$ be the corresponding statistic based on resampling from the corresponding $\hat F_0$. Its Edgeworth expansion is

$$P(T^* \le t \mid \hat F_0) = G(t) + n^{-1} \hat g(t) + O_p(n^{-3/2}). \qquad (11b)$$

Since $\hat g(t) = g(t) + O_p(n^{-1/2})$, in general

$$P(T^* \le t \mid \hat F_0) - P(T \le t \mid F_0) = O_p(n^{-3/2}). \qquad (12)$$


Suppose we know $G$; then asymptotic inference is based on comparing the observed statistic $T = T(x_1, \ldots, x_n)$ to the critical value $G^{-1}(1-\alpha)$. Equation (11a) implies that asymptotic inference can achieve a designated level $\alpha$ to the order of accuracy of $O(1/n)$. According to (12), bootstrap inference based on an asymptotically pivotal statistic can achieve the level $\alpha$ to a higher order of accuracy. When $T$ is not asymptotically pivotal, $G(t)$ in (11) depends on some unknown parameters. One has to estimate $G(t)$ by $\hat G(t)$ for asymptotic inference. Similarly, with the bootstrap, one has to implicitly estimate the unknown parameters in $G(t)$, and consequently it achieves the same order of accuracy to the level $\alpha$ as that based on $\hat G(t)$. So, for more accurate inference, we should choose a statistic that is as nearly pivotal as possible. Beran (1988) gave a nice discussion on using a pivotal statistic for the bootstrap and the order of accuracy for Type I errors.

When data are normally distributed, $(T_{ML} \mid H_0)$ is asymptotically distributed as $\chi^2_{p^*-q}$, where $p^* = p(p+1)/2$ and $q$ is the number of unknown parameters in $\Sigma(\theta)$. Consequently, $T_{ML}$ is asymptotically pivotal (Davison & Hinkley, 1997, p. 139). More general conditions also exist for $T_{ML}$ to be asymptotically pivotal with some specific models (Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Mooijaart & Bentler, 1991; Satorra & Bentler, 1990; Shapiro, 1987; Yuan & Bentler, 1999). Unfortunately, there is no effective way of verifying these conditions in practice. The statistic $T_{SB}$ is obtained by rescaling $T_{ML}$ using fourth-order sample moments (Satorra & Bentler, 1988). Within the class of elliptical or pseudo-elliptical distributions with finite fourth-order moments, $(T_{SB} \mid H_0)$ approaches $\chi^2_{p^*-q}$ (Yuan & Bentler, 1999). So $T_{SB}$ is asymptotically pivotal when the sampling distribution is elliptical or pseudo-elliptical; it is not asymptotically pivotal for other types of non-normal distribution. In contrast to $T_{ML}$ and $T_{SB}$, $T_B$ is asymptotically pivotal for any sampling distribution with finite fourth-order moments.

The above discussion may imply that $T_B$ is the preferred statistic for bootstrap inference in CSA. However, there exist conditions that may hide the pivotal property of $T_B$ with finite samples. In addition to the asymptotically pivotal property, Davison and Hinkley (1997, Section 2.5.1) discussed nearly pivotal properties of a statistic $T$, which require the distribution of $T$ to approximately follow the same distribution when sampling from $F_x$ or $\hat F_x$. Since the target function $G(t)$ for the three test statistics is the cdf of $\chi^2_{p^*-q}$, the better a statistic is approximated by a chi-square distribution, the more accurate its bootstrap inference is in controlling Type I errors. Because sample size is a serious issue with CSA, a statistic that is asymptotically pivotal may not be nearly pivotal, as will be shown in Section 5.

4. Heavy tails with practical data

Bootstrap inference based on $T_{ML}$, $T_B$ or $T_{SB}$ requires $F_x$ to have finite fourth-order moments. Of course, the sample fourth-order moments of any sample will always be finite. However, if some of the sample fourth-order moments (e.g., kurtoses) are quite large, the corresponding population fourth-order moments may actually be infinite. Even when all the population fourth-order moments are finite, inference based on the bootstrap will not be accurate when $F_x$ possesses heavy tails. Here we propose applying the bootstrap procedure to a sample in which cases contributing to heavy tails are properly downweighted. Following the study in Yuan, Chan and Bentler (2000), we only use the Huber-type weights (Huber, 1981; Tyler, 1983) in controlling those cases.


For a $p$-variate sample $x_1, \ldots, x_n$, let

$$d_i = d(x_i, \mu, \Sigma) = [(x_i - \mu)'\Sigma^{-1}(x_i - \mu)]^{1/2}$$

be the Mahalanobis distance and $u_1(t)$ and $u_2(t)$ be non-negative scalar functions. A robust M-estimator $(\hat\mu, \hat\Sigma)$ can be obtained by solving (Maronna, 1976)

$$\mu = \sum_{i=1}^n u_1(d_i)\, x_i \Big/ \sum_{i=1}^n u_1(d_i), \qquad (13a)$$

$$\Sigma = \sum_{i=1}^n u_2(d_i^2)(x_i - \mu)(x_i - \mu)' / n. \qquad (13b)$$

With decreasing functions $u_1(t)$ and $u_2(t)$, cases having greater $d_i$'s receive smaller weights and thus their effects are controlled. Let $\rho$ represent the proportion of outlying cases one wants to control and $r$ be a constant defined by $P(\chi^2_p > r^2) = \rho$. The Huber-type weights corresponding to this $\rho$ are

$$u_1(d) = \begin{cases} 1, & \text{if } d \le r, \\ r/d, & \text{if } d > r, \end{cases} \qquad (14)$$

and $u_2(d^2) = \{u_1(d)\}^2 / J$, where $J$ is a constant such that $E\{\chi^2_p\, u_2(\chi^2_p)\} = p$, which makes the estimate $\hat\Sigma$ unbiased if $F_x = N_p(\mu, \Sigma)$. We may regard $\rho$ as a tuning parameter which can be adjusted according to the tails of a specific data set. Applying different types of weights to several data sets, Yuan, Bentler and Chan (2001) found that the most efficient parameter estimates often go with Huber-type weights.
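As a sketch of (14) (our code, not the authors'), the tuning constants and weight functions can be computed as follows. The closed form for $J$ is an assumption on our part that follows from the identity $E[\chi^2_p\,1(\chi^2_p \le c)] = p\,P(\chi^2_{p+2} \le c)$, which gives $J = E[\min(\chi^2_p, r^2)]/p$.

```python
import numpy as np
from scipy.stats import chi2

def huber_constants(p, rho):
    """r and J for the Huber-type weights in (14): P(chi2_p > r^2) = rho and
    E{chi2_p * u2(chi2_p)} = p, i.e. J = E[min(chi2_p, r^2)] / p."""
    r2 = chi2.ppf(1.0 - rho, p)
    J = (p * chi2.cdf(r2, p + 2) + r2 * chi2.sf(r2, p)) / p
    return np.sqrt(r2), J

def u1(d, r):                       # weight for the location equation (13a)
    return np.where(d <= r, 1.0, r / d)

def u2(d, r, J):                    # weight for the scatter equation (13b)
    return u1(d, r) ** 2 / J
```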

Let $(\hat\mu, \hat\Sigma)$ be the converged solution to (13), $\hat u_{2i} = u_2\{d^2(x_i, \hat\mu, \hat\Sigma)\}$, and

$$x_i^{(\rho)} = \sqrt{\hat u_{2i}}\,(x_i - \hat\mu). \qquad (15)$$

Then we can rewrite (13b) as

$$\hat\Sigma = \sum_{i=1}^n x_i^{(\rho)} x_i^{(\rho)\prime} / n,$$

which is just the sample covariance matrix of the $x_i^{(\rho)}$. Yuan et al. (2000) proposed using (15) as a downweighting transformation formula. Working with several practical data sets, they found that the transformed sample $x_i^{(\rho)}$ is much better approximated by a normal distribution than the original sample $x_i$. As we shall see, when applying the bootstrap to a sample $x_i^{(\rho)}$, the test statistic $T_{ML}$ is approximately pivotal. For $x_i^{(\rho)}$ we have

$$E(x_{i1}^{(\rho)} x_{i2}^{(\rho)} x_{i3}^{(\rho)} x_{i4}^{(\rho)}) = E[u_2^2(d_i^2)(x_{i1} - \mu_1)(x_{i2} - \mu_2)(x_{i3} - \mu_3)(x_{i4} - \mu_4)] + o(1), \qquad (16)$$

where $x_{ij}^{(\rho)}$ is the $j$th element of $x_i^{(\rho)}$. Because the denominator of $u_2(d_i^2)$ in (14) contains $d_i^2$ when $d_i^2 > r^2$, (16) is bounded. Even when the fourth-order moments of $F_x$ do not exist, the corresponding fourth-order moments of the $x_i^{(\rho)}$ are still finite.
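Putting (13)-(15) together, the following sketch (ours; parameter names are illustrative) computes the Huber-type M-estimator by iterative reweighting and returns the downweighted sample $x_i^{(\rho)}$. Its sample covariance matrix reproduces $\hat\Sigma$ up to the $n$ versus $n-1$ divisor.

```python
import numpy as np
from scipy.stats import chi2

def downweight_sample(X, rho=0.05, tol=1e-8, max_iter=500):
    """Return the transformed sample x_i^{(rho)} of (15), built from the
    Huber-type M-estimator (13)-(14) solved by iterative reweighting."""
    n, p = X.shape
    r2 = chi2.ppf(1.0 - rho, p)
    r = np.sqrt(r2)
    J = (p * chi2.cdf(r2, p + 2) + r2 * chi2.sf(r2, p)) / p   # unbiasedness constant

    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(max_iter):
        dif = X - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', dif, np.linalg.inv(S), dif))
        w1 = np.where(d <= r, 1.0, r / d)                     # u1(d_i)
        w2 = w1 ** 2 / J                                      # u2(d_i^2)
        mu_new = (w1[:, None] * X).sum(axis=0) / w1.sum()     # equation (13a)
        dif = X - mu_new
        S_new = (w2[:, None] * dif).T @ dif / n               # equation (13b)
        converged = np.abs(S_new - S).max() < tol
        mu, S = mu_new, S_new
        if converged:
            break

    dif = X - mu
    d = np.sqrt(np.einsum('ij,jk,ik->i', dif, np.linalg.inv(S), dif))
    w2 = np.where(d <= r, 1.0, r / d) ** 2 / J
    return np.sqrt(w2)[:, None] * dif        # x_i^{(rho)} = sqrt(u_{2i}) (x_i - mu_hat)
```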

When interest lies in the structure of the population covariance matrix $\Sigma = \mathrm{Cov}(x_i)$, one needs to know whether transformation (15) changes the structure of $\Sigma$. Let $\Sigma^{(\rho)}$ be the population covariance matrix of $x_i^{(\rho)}$; then $\Sigma^{(\rho)} = \kappa\Sigma$ when the sampling distribution is elliptically symmetric. For almost all the commonly used models in the social sciences, modelling $\Sigma$ is equivalent to modelling $\Sigma^{(\rho)}$ (Browne, 1984). Yuan et al. (2000) further illustrated that outlying cases create significant multivariate skewness and that the sample covariance matrix $S_x$ is biased in such a situation. Analysis based on $x_i^{(\rho)}$ can successfully remove the bias in $S_x$ and recover the correct model structure. More details on the merits of robust approaches to CSA can be found in Yuan and Bentler (1998) and Yuan et al. (2000).

5. Applications

Our purpose in this section is to compare the performance of $T_{ML}$, $T_B$ and $T_{SB}$ using real data. We will show that, by combining the bootstrap and a downweighting transformation, a nearly optimal procedure may be found for analysing a given data set. If a test statistic $T$ is nearly pivotal, then its evaluations $T_b^*$ at the $B_0$ bootstrap replications should approximately follow a chi-square distribution. There are a variety of tools to evaluate this property; we favour the quantile-quantile (QQ) plot because of its visualization value. If $T$ follows $\chi^2_d$, the plot of the ordered $T_b^*$'s against the quantiles of $\chi^2_d$ should form an approximately straight line. A significant departure from this line indicates violations of the pivotal property. To save space, the QQ plots are presented in a document available on the Internet (http://www.nd.edu/~kyuan/papers/bsinfpowg.pdf). We choose $B_0 = B_1 = B = 1000$ in estimating $c_\alpha$ and $\beta_\alpha$, although a smaller $B$ may be enough in obtaining these estimates (Efron & Tibshirani, 1993).
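The QQ comparison itself needs no special software; a minimal sketch (ours) pairs the ordered bootstrap statistics with $\chi^2_d$ quantiles, which can then be passed to any plotting routine.

```python
import numpy as np
from scipy.stats import chi2

def qq_against_chi2(T_star, df):
    """Return (theoretical, empirical) quantile pairs for a QQ plot of the
    ordered bootstrap statistics against a chi-square(df) distribution."""
    T_star = np.sort(np.asarray(T_star))
    B = len(T_star)
    probs = (np.arange(1, B + 1) - 0.5) / B        # plotting positions
    return chi2.ppf(probs, df), T_star

# theo, emp = qq_against_chi2(T_star, df=24)
# A nearly pivotal statistic gives points close to the line emp = theo;
# a heavy right tail shows up as points rising above the line at the top end.
```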

Example 1. Holzinger and Swineford (1939) provide a data set consisting of 24 cognitive variables on 145 subjects. We use nine of the variables in our study: visual perception, cubes and lozenges, measuring a spatial ability factor; paragraph comprehension, sentence completion and word meaning, measuring a verbal ability factor; addition, counting dots and straight-curved capitals, measuring (via the imposition of a time limit) a speed factor. Let $x$ represent the nine variables; then the confirmatory factor model

$$x = \Lambda f + e, \qquad \mathrm{Cov}(x) = \Lambda\Phi\Lambda' + \Psi, \qquad (17a)$$

with

$$\Lambda = \begin{pmatrix} 1.0 & \lambda_{21} & \lambda_{31} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.0 & \lambda_{52} & \lambda_{62} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1.0 & \lambda_{83} & \lambda_{93} \end{pmatrix}', \qquad \Phi = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix}, \qquad (17b)$$

represents Holzinger and Swineford's hypothesis. We assume that the measurement errors are uncorrelated, with $\Psi = \mathrm{Cov}(e)$ being a diagonal matrix. There are $q = 21$ unknown parameters and the model degrees of freedom are 24.
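For readers who want to reproduce the setup, here is a small sketch (ours) of the model-implied covariance $\Sigma(\theta) = \Lambda\Phi\Lambda' + \Psi$ under the loading pattern in (17b). The free parameters are the six loadings, the six distinct elements of $\Phi$ and the nine unique variances, giving $q = 21$ and $45 - 21 = 24$ degrees of freedom; all argument names are illustrative.

```python
import numpy as np

def implied_cov(lam_free, phi, psi_diag):
    """Sigma(theta) = Lambda Phi Lambda' + Psi for the pattern in (17b).

    lam_free : (lam21, lam31, lam52, lam62, lam83, lam93) free loadings
    phi      : 3x3 symmetric factor covariance matrix
    psi_diag : 9 unique variances
    """
    l21, l31, l52, l62, l83, l93 = lam_free
    Lam = np.zeros((9, 3))
    Lam[0, 0], Lam[1, 0], Lam[2, 0] = 1.0, l21, l31    # spatial: visual, cubes, lozenges
    Lam[3, 1], Lam[4, 1], Lam[5, 1] = 1.0, l52, l62    # verbal: paragraph, sentence, word
    Lam[6, 2], Lam[7, 2], Lam[8, 2] = 1.0, l83, l93    # speed: addition, dots, capitals
    return Lam @ np.asarray(phi) @ Lam.T + np.diag(psi_diag)
```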

Mardia's (1970) multivariate kurtosis for the nine-variable model is 3.04, implying that the data may come from a distribution with heavier tails than those of a normal distribution. We therefore apply the downweighting transformation (15) in order for $T_{ML}$ to be approximately pivotal. Based on previous research (Yuan et al., 2000, 2001), we apply the three statistics to three samples: the raw sample $x_i$ and the transformed samples $x_i^{(.05)}$ and $x_i^{(.25)}$. The following bootstrap procedures include further transforming these samples to satisfy $H_0$ or $H_1$, as given in Section 2.


QQ plots for the three statistics on the raw data set suggest that no statistic is approximately pivotal; all have heavier right tails than that of $\chi^2_{24}$. This is expected for $T_{ML}$, because the sample $x_i$ exhibits heavier tails than those of a normal distribution. Previous studies suggest that a large sample size is needed for $T_B$ to behave like a chi-square random variable; a sample size of $n = 145$ may not be large enough, or there may be other unknown factors that prevent $T_B$ from behaving like a chi-square variable. Even though many simulation studies recommend $T_{SB}$, this statistic certainly has a heavier right tail than that of $\chi^2_{24}$ for this data set.

When applied to the transformed sample $x_i^{(.05)}$, all three statistics still have heavier right tails than that of $\chi^2_{24}$. When applied to $x_i^{(.25)}$, the QQ plots indicate that $T_{ML}$ is approximately described by $\chi^2_{24}$, but $T_B$ and $T_{SB}$ are not. Consequently, the suggested procedure for this data set is to apply $T_{ML}$ to the transformed sample $x_i^{(.25)}$. Bootstrapping with $T_{ML}$ on this sample not only leads to more efficient parameter estimates (Yuan et al., 2001) but also provides a more accurate Type I error estimate.

Table 1 gives the significance levels of model (17) evaluated by various procedures. Several differences exist among these procedures. First, all the bootstrap p-values ($p_B$) are greater than those ($p_{\chi^2}$) obtained by referring the test statistics to $\chi^2_{24}$; the only comparable pair of $p_B$ and $p_{\chi^2}$ is when $T_{ML}$ is applied to the sample $x_i^{(.25)}$. Second, the $T_B$ statistic for each sample is the largest among the three statistics, but the $p_B$ for $T_B$ is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of $T_B$ when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the $p_B$'s based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values $\hat c_{.05}$ in Table 2. All the $\hat c_{.05}$'s are greater than 36.415, the 95th percentile of $\chi^2_{24}$. When the heavier tails are downweighted in the sample, the $T^*$'s corresponding to each of the three statistics behave more like $\chi^2_{24}$, and the corresponding $\hat c_{.05}$ is nearer 36.415. Although $T_B$ is the largest among the three statistics on any of the samples, the corresponding $\hat c_{.05}$ for $T_B$ is also the largest, explaining why the associated $p_B$'s for $T_B$ can also be the largest. On the other hand, the traditional inference procedure is to refer $T_B$ to $\chi^2_{24}$ for significance. With such a fixed reference distribution that does not take the increased variability of $T_B$ into account, a larger $T_B$ generally corresponds to a smaller $p_{\chi^2}$.

Table 1. Statistics, bootstrap p-values (p_B) and p-values (p_chi2) referring to chi-square(24)

Sample            x_i                      x_i^(.05)                 x_i^(.25)
Statistic    T        p_B    p_chi2   T        p_B    p_chi2   T        p_B    p_chi2
T_ML         51.187   .017   .001     49.408   .010   .002     48.883   .004   .002
T_SB         48.696   .020   .002     50.318   .013   .001     53.674   .003   .000
T_B          56.726   .055   .000     62.095   .024   .000     66.910   .014   .000

The power properties of $T_{ML}$, $T_B$ and $T_{SB}$ are evaluated at two alternatives. Starting with model (17), the parameter $\lambda_{91}$ is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative $\Sigma_{a1}$ is the estimated covariance matrix obtained by fitting this model to the raw sample $x_i$ by minimizing $D_{ML}$. The second alternative $\Sigma_{a2}$ is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that, once a $\Sigma_a$ is given, the $\delta$ in (9) corresponding to $T_{ML}$ is fixed and does not depend on the sample. Because both $T_{SB}$ and $T_B$ involve fourth-order sample moments, their corresponding $\delta$'s change as the tails of the data set change. QQ plots for the ordered $T^*$'s against quantiles of $\chi^2_{24}(\delta)$ indicate that the distributions of $T_{ML}$ and $T_{SB}$ applied to $x_i$ are well approximated by the corresponding non-central chi-squares. Because $T_{ML}$ under $H_0$ applied to $x_i^{(.25)}$ is well approximated by $\chi^2_{24}$, we would expect its distributions under the two alternatives also to be well approximated by $\chi^2_{24}(22.60)$ and $\chi^2_{24}(51.19)$, respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from $N_p(\mu, \Sigma)$, the statistic $T_{ML}$ will not behave like a non-central chi-square unless $\delta$ is quite small and $n$ is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris, & de Pijper, 1991), the corresponding QQ plots are below the $x = y$ line.

Can we trust the results of power analysis in the traditional approach when $T_{ML}$ and $T_{SB}$ are well approximated by non-central chi-square distributions, as in this example? The answer is no. This is because the critical value 36.415, which does not take the actual variability of $T_{ML}$ or $T_{SB}$ into account, is not a good estimate of $c_\alpha$ in (3). As can be seen from Table 2, there exist quite substantial differences between 36.415 and the $\hat c_{.05}$'s. Actually, power analysis based on a non-central chi-square table is not reliable even for perfectly normal data when $\Sigma(\theta)$ is not near $\Sigma_a$. Table 4 contrasts the power estimates based on the bootstrap ($\hat\beta^*_\alpha$) with those based on the traditional approach ($\hat\beta_\alpha$) using the NCPs in Table 3. The $\hat\beta^*_\alpha$'s are uniformly smaller than the $\hat\beta_\alpha$'s, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of $(T \mid H_0)$ and that of $(T \mid H_1)$. Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under $H_{a2}$ in Table 4 are approximately the same.

using NCPs in Table 3 The ba s are uniformly smaller than the ba s especially when theNCPs are not huge When an NCP is large enough there is almost no overlap betweenthe distribution of (T | H0) and that of (T | H1) Any sensible inference procedure can tellthe difference between the two and consequently the power estimates under Ha2 inTable 4 are approximately the same


Table 2. Bootstrap critical values c_hat_.05

Statistic    Sample x_i    Sample x_i^(.05)    Sample x_i^(.25)
T_ML         42.348        39.991              37.022
T_SB         41.694        41.067              40.587
T_B          57.930        55.120              53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

Sample            x_i                  x_i^(.05)            x_i^(.25)
Statistic    H_a1      H_a2       H_a1      H_a2       H_a1      H_a2
T_ML         22.603    51.187     22.603    51.187     22.603    51.187
T_SB         21.763    48.695     23.107    51.901     24.875    55.948
T_B          20.289    56.726     20.130    63.187     20.683    69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but the powers for identifying an incorrect model are also higher for all the statistics. The powers of $T_{ML}$ and $T_{SB}$ are quite comparable, while the power of $T_B$ is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where $T_B$ has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using $T_{ML}$; the use of $T_{SB}$ or $T_B$ is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA $\le .05$), fair fit ($.05 <$ RMSEA $\le .08$), mediocre fit ($.08 <$ RMSEA $\le .10$) and poor fit ($.10 <$ RMSEA). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that RMSEA $= [\min_\theta D(\Sigma_c, \Sigma(\theta))/(p^* - q)]^{1/2} = \varepsilon$ corresponds to $\delta_c = n(p^* - q)\varepsilon^2$. The $\delta_c$'s corresponding to $\varepsilon = .05$, $.08$ and $.10$ are given in the second row of Table 5. The 95th percentiles of the corresponding $\chi^2_{24}(\delta_c)$ are in the third row of the table. These are, respectively, fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
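The fixed reference values in the second and third rows of Table 5 can be checked directly; the short sketch below (ours, using SciPy's noncentral chi-square) computes $\delta_c = n(p^*-q)\varepsilon^2$ with $n = 145$ and $p^* - q = 24$ and the 95th percentile of $\chi^2_{24}(\delta_c)$, which should agree with the tabled values up to rounding.

```python
from scipy.stats import ncx2

n, df = 145, 24
for eps in (0.05, 0.08, 0.10):
    delta_c = n * df * eps ** 2                 # 8.7, 22.272, 34.8
    crit = ncx2.ppf(0.95, df, delta_c)          # fixed critical value for this RMSEA
    print(f"RMSEA={eps:.2f}  delta_c={delta_c:.3f}  95th pct of chi2_24(delta_c)={crit:.3f}")
```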


Table 4. Bootstrap power estimates (beta*_hat, labelled 'bootstrap') and power estimates (beta_hat, labelled 'noncentral') referring to chi-square(24, delta)

Sample                       x_i              x_i^(.05)        x_i^(.25)
Statistic                    H_a1    H_a2     H_a1    H_a2     H_a1    H_a2
T_ML      bootstrap          .627    .993     .661    .997     .715    .998
          noncentral         .803    .998     .803    .998     .803    .998
T_SB      bootstrap          .613    .989     .661    .997     .717    .998
          noncentral         .782    .997     .814    .998     .851    .999
T_B       bootstrap          .374    .983     .454    .991     .501    .995
          noncentral         .743    .999     .739    1.000    .754    1.000

Table 5. Test for close fit

RMSEA                                          .05       .08       .10
delta_c                                        8.7       22.272    34.80
95th percentile of chi-square(24, delta_c)     48.919    66.926    82.764

Sample       T_ML
x_i          51.187    c_hat_alpha    53.013    69.928    84.287
                       p_chi2         .033      .315      .695
                       p_B            .073      .359      .695
x_i^(.05)    49.408    c_hat_alpha    49.548    55.595    59.620
                       p_chi2         .046      .368      .743
                       p_B            .053      .107      .163
x_i^(.25)    48.883    c_hat_alpha    46.305    51.741    55.899
                       p_chi2         .050      .384      .756
                       p_B            .034      .080      .131

Let $S_x$ be the sample covariance matrix of the $x_i$ and $\hat\theta$ be the corresponding maximum likelihood estimate of $\theta$. The values of $h$ in the solution for $\Sigma_c$ of the form $\Sigma_h = (1-h)\Sigma(\hat\theta) + h S_x$ corresponding to $\varepsilon = .05$, $.08$ and $.10$ are, respectively, 0.42161, 0.66981 and 0.83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples $x_i$, $x_i^{(.05)}$ and $x_i^{(.25)}$, the estimates $\hat c_\alpha$ and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values $p_{\chi^2}$ and the bootstrap p-values $p_B$, especially when RMSEA $= .10$ for the downweighted samples. It is obvious that $T_{ML}$ when RMSEA $= .10$ has a much shorter right tail than that of $\chi^2_{24}(\delta_c)$; this fact is also reflected in the corresponding $\hat c_\alpha$'s. Table 5 once again illustrates the fact that, given a close fit, the behaviour of $T_{ML}$ cannot be described by a non-central chi-square unless the corresponding $T_{ML}$ under $H_0$ in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in $x = (x_1, x_2)'$ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in $y = (y_1, \ldots, y_8)'$ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

$$x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon, \qquad (18a)$$

$$\eta = B\eta + \Gamma\xi + \zeta, \qquad (18b)$$

where

$$\Lambda_x = \begin{pmatrix} 1.0 \\ \lambda_1 \end{pmatrix}, \qquad \Lambda_y = \begin{pmatrix} 1 & \lambda_2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & \lambda_3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & \lambda_4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \lambda_5 \end{pmatrix}', \qquad (18c)$$

$$B = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \beta_{21} & 0 & 0 & 0 \\ \beta_{31} & \beta_{32} & 0 & 0 \\ 0 & \beta_{42} & \beta_{43} & 0 \end{pmatrix}, \qquad \Gamma = \begin{pmatrix} \gamma_{11} \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \Phi = \mathrm{Var}(\xi), \qquad (18d)$$

and $\varepsilon$, $\delta$ and $\zeta$ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample $x_i^{(.10)}$ leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples $x_i$ and $x_i^{(.10)}$. Our purpose is to explore the pivotal property of $T_{ML}$, $T_{SB}$ and $T_B$ on each sample. After noticing that a large portion of the QQ plot for $T_{ML}$ applied to $x_i^{(.10)}$ is below the $x = y$ line, our analysis also includes the sample $x_i^{(.05)}$.

The QQ plots of the three statistics applied to $x_i$ indicate that none of them is nearly pivotal. $T_{ML}$ and $T_{SB}$ are for the most part above the $x = y$ line; however, both their left tails are slightly below the $x = y$ line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of $T_{ML}$ and $T_{SB}$ on $x_i^{(.05)}$ and $x_i^{(.10)}$ suggest that their heavier right tails on $x_i$ are under control. However, the right tail of $T_B$ is still quite heavy when compared to $\chi^2_{29}$. The downweighting transformation with $\rho = .10$ not only controls the right tail of $T_{ML}$ but also makes most of the QQ plot fall below the $x = y$ line. Visually inspecting the QQ plots of $T_{ML}$ on $x_i$, $x_i^{(.05)}$ and $x_i^{(.10)}$ suggests that a downweighting transformation with $0 < \rho < .05$ may lead to a better procedure for analysing the alcohol and psychological symptom data set based on $T_{ML}$. Since any procedure is only an approximation to the real world, and $T_{ML}$ is nearly pivotal when applied to $x_i^{(.05)}$, we recommend the analysis using $T_{ML}$ on $x_i^{(.05)}$ for this data set. This leads to $T_{ML} = 42.43$ with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of $n = 88$ students on $p = 5$ topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

$$\Lambda = \begin{pmatrix} 1.0 & \lambda_{21} & 0 & 0 & 0 \\ 0 & 0 & 1.0 & \lambda_{42} & \lambda_{52} \end{pmatrix}', \qquad \Phi = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}, \qquad (19)$$

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how $T_{ML}$, $T_B$ and $T_{SB}$ perform on this data set.

QQ plots for the three statistics applied to $x_i$ indicate that all of them have heavier right tails than that of $\chi^2_4$. The QQ plot for $T_{ML}$ applied to $x_i^{(\rho)}$ continues to exhibit a heavier right tail until $\rho$ reaches .30; $T_{SB}$ and $T_B$ still possess quite heavy right tails even when $\rho = .30$.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the $x_i$'s without the 81st case. The QQ plots of $T_{ML}$, $T_{SB}$ and $T_B$ applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of $\chi^2_4$. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set, our recommended analysis is to use $T_{ML}$ applied to $x_i^{(.30)}$. With $T_{ML} = 1.89$ and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationship of the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as $\|\Delta\theta^{(j)}\| < \epsilon$, where $\epsilon$ is a small number and $\Delta\theta^{(j)} = \theta^{(j)} - \theta^{(j-1)}$, with $\theta^{(j)}$ being the $j$th-step solution. Let $\sigma(\theta) = \mathrm{vech}(\Sigma(\theta))$ be the vector formed by stacking the non-duplicated elements of $\Sigma(\theta)$, and let its sample counterpart be $s = \mathrm{vech}(S)$. Then

$$\Delta\theta^{(j+1)} = (\dot\sigma_j' W_j \dot\sigma_j)^{-1} \dot\sigma_j' W_j (s - \sigma_j),$$

where $\dot\sigma_j = [\partial\sigma(\theta)/\partial\theta \mid \theta^{(j)}]$, $\sigma_j = \sigma(\theta^{(j)})$, and $W_j$ is the corresponding weight matrix evaluated at $\theta^{(j)}$. So $\Delta\theta^{(j)}$ is proportional to $s - \sigma(\theta^{(j-1)})$, and it is impossible for $\|\Delta\theta^{(j)}\|$ to be smaller than $\epsilon$ if $\Sigma(\theta)$ is far from $E(S)$. Although a model is correct in a bootstrap population $\hat F_0$, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model, which should be treated as a significant replication or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a $T_{ML}$ this number is $p + 1$; the number is $p^* + 1$ for obtaining a $T_B$. Although a $T_{SB}$ can be obtained once a $T_{ML}$ is available, we need to have a positive definite sample covariance matrix $S_y$ of $y_i = \mathrm{vech}[(x_i - \bar x)(x_i - \bar x)']$ in order for $T_{SB}$ to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for $S_y$ to be positive definite is $p^* + 1$. We should treat the second case as a bad sample and ignore it in bootstrap replications.

In practice, instead of estimating $c_\alpha$, one generally reports the p-value of a statistic $T$. If all $B$ bootstrap replications result in convergent solutions,

$$p_B = \frac{B_0 + 1 - M}{B_0 + 1},$$

where $M$ is one plus the number of $T_b^*$'s that do not exceed $T$, so that $B_0 + 1 - M$ counts the replications with $T_b^* > T$. When non-convergences occur, $B_0$ should be defined as the number of converged samples plus the number of significant samples due to a bad model, and the count of replications with $T_b^* > T$ should include the non-converged samples due to the bad model (whose statistics are treated as infinite) plus the converged samples that result in $T_b^* > T$. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With $\hat c_\alpha = T^*_{(B_0(1-\alpha))}$ determining the numerator in (8), one needs at least $B_0(1-\alpha)$ converged samples when sampling from $\hat F_0$. The proposed procedure fails to generate a power estimate when the number of convergences under $H_0$ is below $B_0(1-\alpha)$.
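One reading of this rule, sketched in code (ours; the function name is illustrative): replications that fail to converge despite having enough distinct points are counted as exceedances, since their statistics are effectively infinite, while replications without enough distinct points are dropped before the computation.

```python
def bootstrap_p_value(T_obs, T_converged, n_bad_model):
    """p-value under one reading of the Section 6 rule.

    T_converged : statistics from replications that converged
    n_bad_model : replications with enough distinct points that still failed to
                  converge (treated as T*_b = infinity); 'bad samples' with too
                  few distinct points are excluded before calling this function.
    """
    B0 = len(T_converged) + n_bad_model
    exceed = n_bad_model + sum(t > T_obs for t in T_converged)
    return exceed / (B0 + 1)
```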

Let $\lambda_1(S_x) \ge \lambda_2(S_x) \ge \cdots \ge \lambda_p(S_x)$ be the eigenvalues of the sample covariance matrix $S_x$, and $\gamma_1(S_y) \ge \gamma_2(S_y) \ge \cdots \ge \gamma_{p^*}(S_y)$ be the eigenvalues of the sample covariance matrix $S_y$ of $y_i = \mathrm{vech}[(x_i - \bar x)(x_i - \bar x)']$. In computing the three examples, our criterion for a bad sample is $\lambda_p(S_x)/\lambda_1(S_x) \le 10^{-20}$ when obtaining $T_{ML}$, and $\gamma_{p^*}(S_y)/\gamma_1(S_y) \le 10^{-20}$ when obtaining $T_{SB}$ and $T_B$. If a sample is not bad by these criteria but the model still cannot reach convergence ($\|\Delta\theta\| < 10^{-5}$) within 100 iterations, we treat the corresponding statistic as infinite. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from $H_0$, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics $T_{ML}$, $T_{SB}$ and $T_B$ are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data


Table 6. Non-convergences due to significant samples

(a) Example 1

Sample            x_i                x_i^(.05)          x_i^(.25)
Statistic    H_0   H_a1   H_a2   H_0   H_a1   H_a2   H_0   H_a1   H_a2
T_ML         0     0      2      0     0      1      0     0      1
T_SB         0     0      2      0     0      1      0     0      1
T_B          0     10     21     0     10     27     0     7      20

(b) Example 2

Sample       x_i    x_i^(.05)    x_i^(.10)
Statistic    H_0    H_0          H_0
T_ML         9      1            1
T_SB         9      1            1
T_B          1      0            0

(c) Example 3

Sample       x_i    x_i^(.30)    x_i (81st case removed)
Statistic    H_0    H_0          H_0
T_ML         1      0            0
T_SB         1      0            0
T_B          0      0            0

set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, $T_{ML}$ will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample $x_i^{(\rho)}$ obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying $T_{ML}$ to a transformed sample $x_i^{(\rho)}$.

Asymptotics justify the three statistics from different perspectives, and $T_B$ is asymptotically distribution-free. Previous Monte Carlo studies mainly support $T_{SB}$. However, with proper downweighting, $T_{ML}$ is generally the one that is best described by a chi-square distribution. Nevertheless, the conclusion that $T_{ML}$ is the best statistic for bootstrap inference has to be preliminary; future studies may find $T_B$ or $T_{SB}$ more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.

Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.

Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.

Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.

Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.

Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.

Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.

Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen and J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.

Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.

Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.

Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.

Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.

Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, No. 48. Chicago: University of Chicago Press.

Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

Huber, P. J. (1981). Robust statistics. New York: Wiley.

Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.

Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.

Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.

Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.

Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.

Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.

Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.

Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.

Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.

Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.

Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.

Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.

Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. Communications in Statistics – Theory and Methods, 20, 3805–3821.

Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.

Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.

Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.

Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.

Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.

Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.

Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.

Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.

Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.

Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides and R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.

Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002


Page 2: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

can be well described by a chi-square distribution even for simulated normal data with small to medium sample sizes (Bentler & Yuan, 1999).

Bootstrap methods provide a competitive alternative for statistical inference under violations of standard regularity conditions (Efron & Tibshirani, 1993; Davison & Hinkley, 1997). Bootstrap testing in CSA was developed by Beran and Srivastava (1985). Bollen and Stine (1993) and Yung and Bentler (1994, 1996) further showed how to obtain correct Type I errors with T_ML and T_B. Zhang and Boos (1992) used the bootstrap to test homogeneity of covariance matrices from multiple samples. Simulation studies by Fouladi (1998) indicate that bootstrap inference with T_ML has the best overall performance in controlling Type I errors.

Although the bootstrap has shown promise for CSA, various unknowns related to this procedure still exist when it is applied to the three commonly used statistics T_ML, T_SB and T_B. For example, bootstrap theory requires the underlying sampling distribution to have finite fourth-order moments. With practical data possessing large empirical kurtosis, how can the bootstrap be safely applied in CSA? When using a bootstrap method, are T_ML, T_B and T_SB equivalent? If not, which statistic should one choose? When inference is based on asymptotics, the existing literature indicates that T_B has the best power (Fouladi, 2000; Yuan & Bentler, 1997). What are the power properties of the three statistics when they are used with the bootstrap? With a bootstrap method, how should one determine the sample size needed to achieve a certain power in CSA? How should one deal with non-convergence problems in bootstrap replications?

Our purpose in this paper is to address the above issues associated with CSA. In Section 2 we provide details of a power analysis via the bootstrap, including sample-size determination and a test for close fit. In Section 3 we compare the pivotal properties of the three statistics for bootstrap inference. We also discuss limitations of the pivotal properties to be realized with practical data. In Section 4 we propose a downweighting approach that can be applied with the bootstrap to distributions with heavy tails. As will be shown by means of examples in Section 5, for data having heavy tails, using the downweighting procedure with the bootstrap not only makes the bootstrap procedure valid but also leads to more accurate statistical inference. Examples in Section 5 also illustrate how to find a nearly optimal procedure when using the bootstrap for inference. Through the examples we also compare the power properties of the three test statistics. In Section 6 we provide a reasonable way of dealing with non-convergence in bootstrap replications. The paper concludes with a brief discussion.

2. Bootstrap approaches to inference and power

Beran (1986) studied the power properties of the bootstrap based on a general statistic whose distribution depends on nuisance parameters. Our development of inference and power analysis can be regarded as an application of Beran's general theory of resampling to CSA.

2.1. Model test

Let F_x be the cumulative distribution function (cdf) of a p-dimensional population from which a sample x_1, ..., x_n has been drawn. Let F̂_x be the corresponding empirical distribution function (edf) defined by the sample x_i's. Without loss of generality, we assume E(x_i) = μ and Cov(x_i) = Σ. For a covariance structure Σ(θ), an inference procedure is to find whether there exists a θ_0 such that the null hypothesis

    H_0: Σ = Σ(θ_0)    (1)

holds. A good testing procedure should accept H_0 when it is true. When an alternative hypothesis Σ = Σ_a is true, such that

    H_1: Σ_a ≠ Σ(θ) for any admissible θ,    (2)

it should reject (1).

In order to test (1) with a statistic T = T(x_1, ..., x_n), we need a critical value c_α such that

    P(T > c_α | H_0) = α.    (3)

Without knowing F_x, it is impossible for us to find c_α when the probability in (3) is measured by F_x. However, we can define a c_α such that

    P(T > c_α | F̂_0) = α,    (4)

where F̂_0 is an edf with a covariance matrix satisfying (1). In bootstrap testing for (1), c_α is estimated through resampling from F̂_0. Let

    y_i = Σ̂^{1/2} S_x^{-1/2} x_i,  i = 1, ..., n,    (5)

where Σ̂ = Σ(θ̂) for an admissible estimate θ̂ (Beran & Srivastava, 1985). It is obvious that the edfs F̂_x and F̂_y have the same type of distribution but differ in 'location and scale'. Actually, the only purpose of (5) is to create an empirical distribution F̂_0 = F̂_y having the same distributional type as that of F̂_x and with a covariance matrix satisfying H_0.

Let Y_b* = (y_1^(b), ..., y_n^(b)) be a random sample of size n from F̂_0 and T_b* be the corresponding statistic of T evaluated at this sample. With independent samples Y_b*, b = 1, ..., B_0, we rearrange the T_b* in order and denote them by

    T*_(1) ≤ T*_(2) ≤ ... ≤ T*_(B_0).

The bootstrap estimate ĉ_α of c_α is the [B_0(1 − α)]th quantile T*_([B_0(1−α)]), and we reject the null hypothesis (1) when T > ĉ_α. Let α* be the exact level of a test defined in (3) and α̂* be the corresponding level when c_α is replaced by ĉ_α. Under quite general conditions, Beran (1986) established the consistency of α̂* for α*. These conditions include the fourth-order moments of F_x being finite.
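To make the resampling scheme concrete, the following sketch (not from the original article) illustrates the test in this subsection. The function `statistic` is a placeholder for whichever of T_ML, T_SB or T_B the user computes from a sample, `sigma_hat` stands for the fitted structure Σ(θ̂), and the helper names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sym_sqrt(a):
    """Symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def sym_inv_sqrt(a):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def bootstrap_test(x, sigma_hat, statistic, b0=1000, alpha=0.05, seed=0):
    """Bootstrap test of H0: Sigma = Sigma(theta), following Section 2.1.

    x         : (n, p) data matrix
    sigma_hat : fitted model covariance matrix Sigma(theta_hat)
    statistic : user-supplied function returning T (e.g. T_ML) for a sample
    """
    n = x.shape[0]
    s_x = np.cov(x, rowvar=False)
    # Transformation (5): the empirical distribution of y satisfies H0.
    y = x @ sym_inv_sqrt(s_x) @ sym_sqrt(sigma_hat)
    t_obs = statistic(x)
    rng = np.random.default_rng(seed)
    t_boot = np.sort([statistic(y[rng.integers(0, n, n)]) for _ in range(b0)])
    c_alpha = np.quantile(t_boot, 1.0 - alpha)      # bootstrap critical value
    p_b = (np.sum(t_boot > t_obs) + 1) / (b0 + 1)   # bootstrap p-value
    return t_obs, c_alpha, p_b
```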

2.2. Power evaluation

The bootstrap can also be used to evaluate the power β_α of a statistic T, defined by

    β_α = P(T > c_α | H_1).    (6)

Let Σ_a be the covariance matrix in (2) and

    z_i = Σ_a^{1/2} S_x^{-1/2} x_i.    (7)

Then F_a = F̂_z represents the bootstrap population under H_1. For b = 1, 2, ..., B_1, let Z_b* = (z_1^(b), ..., z_n^(b)) be random samples of size n from F_a and T_b* be the corresponding statistic of T evaluated at Z_b*. Then the bootstrap estimate of β_α in (6) is

    β̂_α = #{T_b* > ĉ_α}/B_1,    (8)

where ĉ_α is the bootstrap estimate of the c_α in (4). Under fairly general conditions, Beran (1986) established the consistency of β̂_α for β_α as n approaches infinity. This consistency should be understood in the following sense: when Σ_a is the population covariance matrix of F_x that generates the observed sample x_1, ..., x_n, then β̂_α in (8) is consistent for the probability β_α in (6), and the latter is the exact power measured by the cdf F_x.

In contrast to (5), the Σ_a in (7) cannot be replaced by a consistent estimate. For example, we will not obtain a consistent estimate of power for T to reject the null hypothesis H_0 under the true covariance matrix Σ_a = Cov(x_i) when replacing Σ_a by the sample covariance matrix S_x of the x_i's. The inconsistency here can be well understood through T_ML. Actually, with Σ_a being O(1/√n) apart from Σ_0, the bootstrap approach to power is asymptotically equivalent to the approach developed in Satorra and Saris (1985) for normally distributed data (see also Steiger, Shapiro, & Browne, 1985). These authors used a non-central chi-square distribution to describe the behaviour of T_ML under H_1, with a non-centrality parameter (NCP)

    δ = n min_θ D(Σ_a, Σ(θ)),    (9)

where D = D_ML is the Wishart likelihood discrepancy function. When replacing Σ_a by a consistent estimate Σ̂_a, even though Σ̂_a = Σ_a + O_p(1/√n), the NCP estimate based on Σ̂_a can be O_p(1) apart from that based on Σ_a. This limitation of power evaluation also exists in other classical procedures for power analysis (e.g., Cohen, 1988). The fact is that, without knowledge of the true effect size, one cannot consistently estimate the power of a statistic in rejecting the null.
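For reference (the article does not display it at this point), the Wishart likelihood discrepancy function referred to in (9) has the standard normal-theory form

    D_ML(Σ_a, Σ(θ)) = tr[Σ_a Σ^{-1}(θ)] − log|Σ_a Σ^{-1}(θ)| − p,

which equals zero only when Σ_a = Σ(θ); the NCP δ in (9) is n times this function minimized over θ.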

Although the bootstrap approach to power analysis is equivalent to the approach proposed by Satorra and Saris (1985) for normally distributed data when Σ_a is O(1/√n) apart from Σ(θ), there are fundamental differences between the two approaches. With the bootstrap approach, there is neither a central chi-square distribution when H_0 holds nor a non-central chi-square distribution when H_1 holds. Instead, the distribution of T is judged by its empirical behaviour based on resampling from observed data. We may regard λ = min_θ D(Σ_a, Σ(θ)) as a measure of effect size. For a given sample size, the greater λ is, the higher the power for a bootstrap test to distinguish between H_0 and H_1. Because there is no chi-square distribution assumption with the bootstrap approach, power analysis cannot resort to a chi-square table and has to be evaluated separately for different samples. This tailor-made type of analysis can be regarded as an advantage of the bootstrap methodology.
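A corresponding sketch of the power evaluation in (6)-(8) is given below. It is an illustration under the same assumptions as the sketch in Section 2.1 (numpy and the sym_sqrt/sym_inv_sqrt helpers defined there are reused; `sigma_a` is the user-supplied alternative Σ_a), not the authors' code.

```python
def bootstrap_power(x, sigma_hat, sigma_a, statistic, b0=1000, b1=1000,
                    alpha=0.05, seed=0):
    """Bootstrap power estimate (8) at the alternative Sigma_a."""
    n = x.shape[0]
    root_inv_sx = sym_inv_sqrt(np.cov(x, rowvar=False))
    y = x @ root_inv_sx @ sym_sqrt(sigma_hat)   # covariance Sigma(theta_hat): H0
    z = x @ root_inv_sx @ sym_sqrt(sigma_a)     # covariance Sigma_a: H1, eq. (7)
    rng = np.random.default_rng(seed)
    t_h0 = [statistic(y[rng.integers(0, n, n)]) for _ in range(b0)]
    c_alpha = np.quantile(t_h0, 1.0 - alpha)    # critical value under H0
    t_h1 = [statistic(z[rng.integers(0, n, n)]) for _ in range(b1)]
    return np.mean(np.array(t_h1) > c_alpha)    # estimated power
```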

2.3. Determination of sample size

Let x_1, ..., x_n be a pilot sample. Under H_1 in (2), we want to find the smallest sample size for a statistic T to reject H_0 with probability β_0. For this purpose, we first draw B_0 independent samples Y_b*(m) = (y_1^(b), ..., y_m^(b)) of size m from F̂_0 to estimate the critical value ĉ_α(m), as in Section 2.1. We then draw B_1 independent samples Z_b*(m) of size m from F̂_z, where the covariance matrix of z ~ F̂_z is Σ_a. Evaluating T_b*(m) at each sample Z_b*(m), as in (8), the estimated power for sample size m is

    β̂_α(m) = #{T_b*(m) > ĉ_α(m)}/B_1.

A smaller sample size m_1 is needed if β̂_α(m) > β_0; otherwise a greater sample size m_2 is needed. Finding the minimum m such that β̂_α(m) ≥ β_0 may require a series of trials. The interval-halving procedure in the Appendix of MacCallum, Browne and Sugawara (1996), who studied sample-size determination using non-central chi-squares, can be equally applied here. More discussion on the bootstrap approach to sample size and power can be found in Beran (1986), Efron and Tibshirani (1993, Chapter 25) and Davison and Hinkley (1997, Section 4.6).
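The interval-halving idea can be sketched as follows. Here `power_at_m` is a hypothetical function that returns β̂_α(m) by carrying out the resampling recipe above for sample size m, and the bracketing values are illustrative; because β̂_α(m) is a Monte Carlo estimate, the assumed monotonicity holds only approximately.

```python
def smallest_sample_size(power_at_m, target=0.80, m_low=50, m_high=1600):
    """Interval halving for the smallest m with estimated power >= target.

    power_at_m : function m -> beta_alpha_hat(m), the bootstrap power for
                 samples of size m drawn as described in Section 2.3.
    Assumes the estimated power is (roughly) increasing in m and that the
    bracket [m_low, m_high] contains the answer.
    """
    if power_at_m(m_high) < target:
        raise ValueError("target power not reached; increase m_high")
    while m_high - m_low > 1:
        m_mid = (m_low + m_high) // 2
        if power_at_m(m_mid) >= target:
            m_high = m_mid    # a smaller sample size may still be enough
        else:
            m_low = m_mid     # a greater sample size is needed
    return m_high
```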

24 Test for close tA covariance structure model is at best only an approximation to the real world Anyinteresting model will be rejected when the sample size is large enough Based on theroot-mean-square error of approximation (RMSEA) (Steiger amp Lind 1980) MacCallumet al (1996) proposed testing for close t rather than exact t in (1) Using a non-centralchi-square to describe the behaviour of TML their approach is equivalent to testing if theNCP is less than a prespecied value The test for close t can also be easilyimplemented with the bootstrap Let S c be a covariance matrix with close t such that

dc = n minv

D(Sc S(v)) (10)

where dc is the lsquonon-centrality parameterrsquo in equation (8) of MacCallum et al (1996)Such a covariance matrix can be found in the form of Sh = hS x + (1 plusmn h)S(v) WhenD = DML is the Wishart likelihood discrepancy function Yuan and Hayashi(2001) showed that minv D(Sh S(v)) = D(Sh S(v)) is a strictly increasing function ofh [ [0 1] and it is straightforward to nd a S c once a dc is given Let y i = S 12

c Splusmn 12x x i

i = 1 n then the covariance matrix of y Fy is S c Let Tb b = 1 B0 be thecorresponding statistics of T evaluated respectivelyat B0 independent samples from Fy Then ca = T(B0 (1 plusmn a)) is the critical value estimate of the test for close t and the modelwill be rejected when T = T(x 1 x n) gt ca It is also straightforward to performpower analysis with the bootstrap when a less-close-t covariance matrix describes thepopulation
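A minimal sketch of how Σ_c might be located numerically is given below. It assumes a user-supplied routine `fit_dml` returning min_θ D_ML(Σ, Σ(θ)) for a given Σ, and relies on the monotonicity in h cited above; it is an illustration, not the authors' implementation.

```python
def sigma_close_fit(s_x, sigma_hat, fit_dml, delta_c, n, tol=1e-6):
    """Locate Sigma_c = h*S_x + (1 - h)*Sigma(theta_hat) with
    n * min_theta D_ML(Sigma_c, Sigma(theta)) = delta_c, by bisection on h.

    fit_dml(sigma) : returns min over theta of D_ML(sigma, Sigma(theta));
    a solution exists only if delta_c / n does not exceed fit_dml(s_x).
    """
    lo, hi = 0.0, 1.0
    target = delta_c / n
    while hi - lo > tol:
        h = 0.5 * (lo + hi)
        sigma_h = h * s_x + (1.0 - h) * sigma_hat
        if fit_dml(sigma_h) < target:
            lo = h    # move towards S_x: the discrepancy increases with h
        else:
            hi = h
    h = 0.5 * (lo + hi)
    return h * s_x + (1.0 - h) * sigma_hat, h
```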

3. Pivotal property of T_ML, T_SB and T_B

Although it is unclear which of the statistics T_ML, T_SB and T_B is best for bootstrapping with a specific data set, there is a general result for choosing a statistic (Beran, 1988; Hall, 1992). In the following we will discuss this result and its relevance to CSA.

A statistic is pivotal if its distribution does not depend on any parameters in the underlying sampling distribution, and it is asymptotically pivotal if its asymptotic distribution does not depend on unknown parameters. Let a statistic T = f(S_x) be a smooth function of the sample covariance matrix S_x of x_1, ..., x_n. Suppose T is asymptotically pivotal and an Edgeworth expansion applies to its distribution function under H_0 (Barndorff-Nielsen & Cox, 1984; Wakaki, Eguchi, & Fujikoshi, 1990):

    P(T ≤ t | F_0) = G(t) + n^{-1} g(t) + O(n^{-3/2}),    (11a)

where G(t) is the asymptotic distribution function of T and g(t) is a smooth function that depends on some unknown population parameters. As discussed in Hall (1992) and Davison and Hinkley (1997, Section 2.6.1), (11a) generally holds for smooth functions of sample moments. Let T* be the corresponding statistic based on resampling from the corresponding F̂_0. Its Edgeworth expansion is

    P(T* ≤ t | F̂_0) = G(t) + n^{-1} ĝ(t) + O_p(n^{-3/2}).    (11b)

Since ĝ(t) = g(t) + O_p(n^{-1/2}) in general,

    P(T ≤ t | F_0) − P(T* ≤ t | F̂_0) = O_p(n^{-3/2}).    (12)

Suppose we know G; then asymptotic inference is based on comparing the observed statistic T = T(x_1, ..., x_n) to the critical value G^{-1}(1 − α). Equation (11a) implies that asymptotic inference can achieve a designated level α to the order of accuracy of O_p(1/n). According to (12), bootstrap inference based on an asymptotically pivotal statistic can achieve the level α to a higher order of accuracy. When T is not asymptotically pivotal, G(t) in (11) depends on some unknown parameters. One has to estimate G(t) by Ĝ(t) for asymptotic inference. Similarly, with the bootstrap, one has to implicitly estimate the unknown parameters in G(t), and consequently it achieves the same order of accuracy to the level α as that based on Ĝ(t). So, for more accurate inference, we should choose a statistic that is as nearly pivotal as possible. Beran (1988) gave a nice discussion on using a pivotal statistic for the bootstrap and the order of accuracy for Type I errors.

When data are normally distributed, (T_ML | H_0) is asymptotically distributed as χ²_{p*−q}, where p* = p(p + 1)/2 and q is the number of unknown parameters in Σ(θ). Consequently, T_ML is asymptotically pivotal (Davison & Hinkley, 1997, p. 139). More general conditions also exist for T_ML to be asymptotically pivotal with some specific models (Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Mooijaart & Bentler, 1991; Satorra & Bentler, 1990; Shapiro, 1987; Yuan & Bentler, 1999). Unfortunately, there is no effective way of verifying these conditions in practice. The statistic T_SB is obtained by rescaling T_ML using fourth-order sample moments (Satorra & Bentler, 1988). Within the class of elliptical or pseudo-elliptical distributions with finite fourth-order moments, (T_SB | H_0) approaches χ²_{p*−q} (Yuan & Bentler, 1999). So T_SB is asymptotically pivotal when the sampling distribution is elliptical or pseudo-elliptical. It is not asymptotically pivotal for other types of non-normal distribution. In contrast to T_ML and T_SB, T_B is asymptotically pivotal for any sampling distribution with finite fourth-order moments.

The above discussion may imply that T_B is the preferred statistic for bootstrap inference in CSA. However, there exist conditions that may hide the pivotal property of T_B with finite samples. In addition to the asymptotically pivotal property, Davison and Hinkley (1997, Section 2.5.1) discussed nearly pivotal properties of a statistic T, which require the distribution of T to approximately follow the same distribution when sampling from F_x or F̂_x. Since the target function G(t) for the three test statistics is the cdf of χ²_{p*−q}, the better a statistic is approximated by a chi-square distribution, the more accurate its bootstrap inference is in controlling Type I errors. Because sample size is a serious issue with CSA, a statistic that is asymptotically pivotal may not be nearly pivotal, as will be shown in Section 5.

4. Heavy tails with practical data

Bootstrap inference based on T_ML, T_B or T_SB requires F_x to have finite fourth-order moments. Of course, the sample fourth-order moments of any sample will always be finite. However, if some of the fourth-order moments (e.g., kurtoses) are quite large, the corresponding population fourth-order moments may actually be infinite. Even when all the population fourth-order moments are finite, inference based on the bootstrap will not be accurate when F_x possesses heavy tails. Here we propose applying the bootstrap procedure to a sample in which cases contributing to heavy tails are properly downweighted. Following the study in Yuan, Chan and Bentler (2000), we only use the Huber-type weights (Huber, 1981; Tyler, 1983) in controlling those cases.

For a p-variate sample x_1, ..., x_n, let

    d_i = d(x_i, μ, Σ) = [(x_i − μ)′ Σ^{-1} (x_i − μ)]^{1/2}

be the Mahalanobis distance, and u_1(t) and u_2(t) be non-negative scalar functions. A robust M-estimator (μ̂, Σ̂) can be obtained by solving (Maronna, 1976)

    μ = Σ_{i=1}^{n} u_1(d_i) x_i / Σ_{i=1}^{n} u_1(d_i),    (13a)

    Σ = Σ_{i=1}^{n} u_2(d_i²)(x_i − μ)(x_i − μ)′ / n.    (13b)

With decreasing functions u_1(t) and u_2(t), cases having greater d_i's receive smaller weights, and thus their effects are controlled. Let ρ represent the proportion of outlying cases one wants to control and r be a constant defined by P(χ²_p > r²) = ρ. The Huber-type weights corresponding to this ρ are

    u_1(d) = 1 if d ≤ r,  r/d if d > r,    (14)

and u_2(d²) = {u_1(d)}²/J, where J is a constant such that E{χ²_p u_2(χ²_p)} = p, which makes the estimate Σ̂ unbiased if F_x = N_p(μ, Σ). We may regard ρ as a tuning parameter which can be adjusted according to the tails of a specific data set. Applying different types of weights to several data sets, Yuan, Bentler and Chan (2001) found that the most efficient parameter estimates often go with Huber-type weights.

Let (μ̂, Σ̂) be the converged solution to (13), u_{2i} = u_2{d²(x_i, μ̂, Σ̂)}, and

    x_i^(ρ) = √(u_{2i}) (x_i − μ̂).    (15)

Then we can rewrite (13b) as

    Σ̂ = Σ_{i=1}^{n} x_i^(ρ) x_i^(ρ)′ / n,

which is just the sample covariance matrix of the x_i^(ρ). Yuan et al. (2000) proposed using (15) as a downweighting transformation formula. Working with several practical data sets, they found that the transformed sample x_i^(ρ) is much better approximated by a normal distribution than the original sample x_i. As we shall see, when applying the bootstrap to a sample x_i^(ρ), the test statistic T_ML is approximately pivotal. For x_i^(ρ) we have

    E(x_{i1}^(ρ) x_{i2}^(ρ) x_{i3}^(ρ) x_{i4}^(ρ)) = E[u_2²(d_i²)(x_{i1} − μ_1)(x_{i2} − μ_2)(x_{i3} − μ_3)(x_{i4} − μ_4)] + o(1),    (16)

where x_{ij}^(ρ) is the jth element of x_i^(ρ). Because the denominator of u_2(d_i²) in (14) contains d_i² when d_i² > r², (16) is bounded. Even when the fourth-order moments of F_x do not exist, the corresponding fourth-order moments of the x_i^(ρ) are still finite.

When interest lies in the structure of the population covariance matrix Σ = Cov(x_i), one needs to know whether transformation (15) changes the structure of Σ. Let Σ^(ρ) be the population covariance matrix of x_i^(ρ); then Σ^(ρ) = κΣ when the sampling distribution is elliptically symmetric. For almost all the commonly used models in the social sciences, modelling Σ is equivalent to modelling Σ^(ρ) (Browne, 1984). Yuan et al. (2000) further illustrated that outlying cases create significant multivariate skewness and that the sample covariance matrix S_x is biased in such a situation. Analysis based on x_i^(ρ) can successfully remove the bias in S_x and recover the correct model structure. More details on the merits of robust approaches to CSA can be found in Yuan and Bentler (1998) and Yuan et al. (2000).
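The following sketch (an illustration assuming numpy and scipy, not the authors' code) implements the downweighting transformation (13)-(15) with the Huber-type weights (14); the expression for the constant J uses the standard chi-square identity E[χ²_p 1{χ²_p ≤ c}] = p P(χ²_{p+2} ≤ c).

```python
import numpy as np
from scipy.stats import chi2

def huber_downweight(x, rho=0.10, max_iter=200, tol=1e-8):
    """Downweighting transformation (13)-(15) with Huber-type weights (14).

    x   : (n, p) data matrix
    rho : proportion of outlying cases to be controlled
    Returns the transformed sample whose cross-product matrix (divided by n)
    equals the robust M-estimator Sigma_hat.
    """
    n, p = x.shape
    r2 = chi2.ppf(1.0 - rho, df=p)       # r defined by P(chi2_p > r^2) = rho
    r = np.sqrt(r2)
    # J makes Sigma_hat unbiased under normality: E{chi2_p u2(chi2_p)} = p.
    j_const = (p * chi2.cdf(r2, df=p + 2) + r2 * (1.0 - chi2.cdf(r2, df=p))) / p

    def weights(mu, sigma):
        dev = x - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', dev, np.linalg.inv(sigma), dev))
        u1 = np.ones(n)
        u1[d > r] = r / d[d > r]         # Huber-type weights (14)
        return u1, u1 ** 2 / j_const

    mu, sigma = x.mean(axis=0), np.cov(x, rowvar=False)
    for _ in range(max_iter):
        u1, u2 = weights(mu, sigma)
        mu_new = (u1[:, None] * x).sum(axis=0) / u1.sum()          # (13a)
        dev = x - mu_new
        sigma_new = (u2[:, None] * dev).T @ dev / n                # (13b)
        done = (np.abs(mu_new - mu).max() < tol and
                np.abs(sigma_new - sigma).max() < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    _, u2 = weights(mu, sigma)
    return np.sqrt(u2)[:, None] * (x - mu)                         # (15)
```

The returned sample plays the role of x_i^(ρ) in the examples of Section 5 (e.g., rho=0.05 for x_i^(.05)).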

5. Applications

Our purpose in this section is to compare the performance of T_ML, T_B and T_SB using real data. We will show that, by combining the bootstrap and a downweighting transformation, a nearly optimal procedure may be found for analysing a given data set. If a test statistic T is nearly pivotal, then its evaluations T_b* at the B_0 bootstrap replications should approximately follow a chi-square distribution. There are a variety of tools to evaluate this property. We favour the quantile-quantile (QQ) plot because of its visualization value. If T follows χ²_d, the plot of the ordered T_b*'s against the quantiles of χ²_d should form an approximately straight line. A significant departure from this line indicates violations of the pivotal property. To save space, the QQ plots are presented in a document available on the Internet (http://www.nd.edu/~kyuan/papers/bsinfpowg.pdf). We choose B_0 = B_1 = B = 1000 in estimating ĉ_α and β̂_α, although a smaller B may be enough in obtaining these estimates (Efron & Tibshirani, 1993).
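For readers who wish to reproduce this kind of diagnostic, a small sketch of the QQ plot against χ²_d quantiles follows (matplotlib and scipy are assumed; `t_boot` holds the B_0 bootstrap statistics T_b* computed as in Section 2.1):

```python
import numpy as np
from scipy.stats import chi2
import matplotlib.pyplot as plt

def qq_against_chi2(t_boot, df):
    """QQ plot of bootstrap statistics T_b* against chi2(df) quantiles."""
    t_boot = np.sort(np.asarray(t_boot, dtype=float))
    b0 = len(t_boot)
    q = chi2.ppf((np.arange(1, b0 + 1) - 0.5) / b0, df)  # theoretical quantiles
    plt.scatter(q, t_boot, s=8)
    lim = [0.0, max(q[-1], t_boot[-1])]
    plt.plot(lim, lim)                                   # the x = y line
    plt.xlabel("chi-square(%d) quantiles" % df)
    plt.ylabel("ordered bootstrap statistics")
    plt.show()
```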

Example 1. Holzinger and Swineford (1939) provide a data set consisting of 24 cognitive variables on 145 subjects. We use nine of the variables in our study: visual perception, cubes and lozenges, measuring a spatial ability factor; paragraph comprehension, sentence completion and word meaning, measuring a verbal ability factor; addition, counting dots and straight-curved capitals, measuring (via the imposition of a time limit) a speed factor. Let x represent the nine variables; then the confirmatory factor model

    x = Λξ + ε,  Cov(x) = ΛΦΛ′ + Ψ,    (17a)

with

    Λ′ = ( 1.0  λ_21  λ_31  0    0     0     0    0     0
           0    0     0     1.0  λ_52  λ_62  0    0     0
           0    0     0     0    0     0     1.0  λ_83  λ_93 ),

    Φ = ( φ_11  φ_12  φ_13
          φ_21  φ_22  φ_23
          φ_31  φ_32  φ_33 ),    (17b)

represents Holzinger and Swineford's hypothesis. We assume that the measurement errors are uncorrelated, with Ψ = Cov(ε) being a diagonal matrix. There are q = 21 unknown parameters and the model degrees of freedom are 24.

Mardia's (1970) multivariate kurtosis for the nine-variable model is 3.04, implying that the data may come from a distribution with heavier tails than those of a normal distribution. We therefore apply the downweighting transformation (15) in order for T_ML to be approximately pivotal. Based on previous research (Yuan et al., 2000, 2001), we apply the three statistics to three samples: the raw sample x_i and the transformed samples x_i^(.05) and x_i^(.25). The bootstrap procedures that follow include further transforming these samples to satisfy H_0 or H_1, as given in Section 2.

QQ plots for the three statistics on the raw data set suggest that no statistic is approximately pivotal. All have heavier right tails than that of χ²_24. This is expected for T_ML, because the sample x_i exhibits heavier tails than those of a normal distribution. Previous studies suggest that a large sample size is needed for T_B to behave like a chi-square random variable; a sample size of n = 145 may not be large enough, or there may be other unknown factors that prevent T_B from behaving like a chi-square variable. Even though many simulation studies recommend T_SB, this statistic certainly has a heavier right tail than that of χ²_24 for this data set.

When applied to the transformed sample x_i^(.05), all three statistics still have heavier right tails than that of χ²_24. When applied to x_i^(.25), the QQ plots indicate that T_ML is approximately described by χ²_24, but T_B and T_SB are not. Consequently, the suggested procedure for this data set is to apply T_ML to the transformed sample x_i^(.25). Bootstrapping with T_ML on this sample not only leads to more efficient parameter estimates (Yuan et al., 2001) but also provides a more accurate Type I error estimate.

Table 1 gives the significance levels of model (17) evaluated by various procedures. Several differences exist among these procedures. First, all the bootstrap p-values (p_B) are greater than those (p_χ²) obtained by referring the test statistics to χ²_24. The only comparable pair of p_B and p_χ² is when T_ML is applied to the sample x_i^(.25). Second, the T_B statistic for each sample is the largest among the three statistics, but the p_B for T_B is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of T_B when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the p_B's based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values ĉ_.05 in Table 2. All the ĉ_.05's are greater than 36.415, the 95th percentile of χ²_24. When the heavier tails are downweighted in the sample, the T_b*'s corresponding to each of the three statistics behave more like χ²_24 and the corresponding ĉ_.05 is nearer 36.415. Although T_B is the largest among the three statistics on any of the samples, the corresponding ĉ_.05 for T_B is also the largest, explaining why the associated p_B's for T_B can also be the largest. On the other hand, the traditional inference procedure is to refer T_B to χ²_24 for significance. With such a fixed reference distribution, which does not take the increased variability of T_B into account, a larger T_B generally corresponds to a smaller p_χ².

The power properties of T_ML, T_B and T_SB are evaluated at two alternatives. Starting with model (17), the parameter λ_91 is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative Σ_a1 is the estimated covariance matrix obtained by fitting this model to the raw sample x_i by minimizing D_ML.

Table 1. Statistics, bootstrap p-values (p_B) and p-values (p_χ²) referring to χ²_24

             Sample x_i               Sample x_i^(.05)         Sample x_i^(.25)
Statistic    T        p_B    p_χ²     T        p_B    p_χ²     T        p_B    p_χ²
T_ML         51.187   .017   .001     49.408   .010   .002     48.883   .004   .002
T_SB         48.696   .020   .002     50.318   .013   .001     53.674   .003   .000
T_B          56.726   .055   .000     62.095   .024   .000     66.910   .014   .000

The second alternative, Σ_a2, is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that, once a Σ_a is given, the δ in (9) corresponding to T_ML is fixed and does not depend on the sample. Because both T_SB and T_B involve fourth-order sample moments, their corresponding δ's change as the tails of the data set change. QQ plots for the ordered T_b*'s against quantiles of χ²_24(δ) indicate that the distributions of T_ML and T_SB applied to x_i are well approximated by the corresponding non-central chi-squares. Because T_ML under H_0 applied to x_i^(.25) is well approximated by χ²_24, we would expect its distributions under the two alternatives also to be well approximated by χ²_24(22.60) and χ²_24(51.19), respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from N_p(μ, Σ), the statistic T_ML will not behave like a non-central chi-square unless δ is quite small and n is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris, & de Pijper, 1991), the corresponding QQ plots are below the x = y line.

Can we trust the results of power analysis in the traditional approach when T_ML and T_SB are well approximated by non-central chi-square distributions in this example? The answer is no. This is because the critical value 36.415 (the 95th percentile of χ²_24), which does not take the actual variability of T_ML or T_SB into account, is not a good estimate of c_α in (3). As can be seen from Table 2, there exist quite substantial differences between 36.415 and the ĉ_.05's. Actually, power analysis based on a non-central chi-square table is not reliable for even perfectly normal data when Σ(θ) is not near Σ_a. Table 4 contrasts the power estimates based on the bootstrap (β̂_α) with those based on the traditional approach (β̄_α) using the NCPs in Table 3. The β̂_α's are uniformly smaller than the β̄_α's, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of (T | H_0) and that of (T | H_1). Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under H_a2 in Table 4 are approximately the same.

Table 2. Bootstrap critical values ĉ_.05

Statistic    Sample x_i    Sample x_i^(.05)    Sample x_i^(.25)
T_ML         42.348        39.991              37.022
T_SB         41.694        41.067              40.587
T_B          57.930        55.120              53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

             Sample x_i          Sample x_i^(.05)    Sample x_i^(.25)
Statistic    H_a1      H_a2      H_a1      H_a2      H_a1      H_a2
T_ML         22.603    51.187    22.603    51.187    22.603    51.187
T_SB         21.763    48.695    23.107    51.901    24.875    55.948
T_B          20.289    56.726    20.130    63.187    20.683    69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but the powers for identifying an incorrect model are also higher for all the statistics. The powers of T_ML and T_SB are quite comparable, while the power of T_B is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where T_B has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using T_ML; the use of T_SB or T_B is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (.10 < RMSEA). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that RMSEA = [min_θ D(Σ_c, Σ(θ))/(p* − q)]^{1/2} = ε corresponds to δ_c = n(p* − q)ε². The δ_c's corresponding to ε = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²_24(δ_c) are in the third row of the table. These are, respectively, the fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
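To make the correspondence concrete (these numbers simply restate the formula above with the values used in this example): with n = 145 and p* − q = 24 degrees of freedom, ε = .05 gives δ_c = 145 × 24 × .05² = 8.7, ε = .08 gives δ_c = 22.272, and ε = .10 gives δ_c = 34.80, which are the entries in the second row of Table 5.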


Table 4. Bootstrap power estimates (β̂_α) and power estimates (β̄_α) referring to χ²_24(δ)

                     Sample x_i        Sample x_i^(.05)    Sample x_i^(.25)
Statistic            H_a1     H_a2     H_a1     H_a2       H_a1     H_a2
T_ML       β̂_α      .627     .993     .661     .997       .715     .998
           β̄_α      .803     .998     .803     .998       .803     .998
T_SB       β̂_α      .613     .989     .661     .997       .717     .998
           β̄_α      .782     .997     .814     .998       .851     .999
T_B        β̂_α      .374     .983     .454     .991       .501     .995
           β̄_α      .743     .999     .739     1.000      .754     1.000

Table 5. Test for close fit

                                            RMSEA
                                            .05        .08        .10
                  δ_c                       8.7        22.272     34.80
                  95th pctile of χ²_24(δ_c) 48.919     66.926     82.764

Sample      T_ML
x_i         51.187      ĉ_α                 53.013     69.928     84.287
                        p_χ²                .033       .315       .695
                        p_B                 .073       .359       .695
x_i^(.05)   49.408      ĉ_α                 49.548     55.595     59.620
                        p_χ²                .046       .368       .743
                        p_B                 .053       .107       .163
x_i^(.25)   48.883      ĉ_α                 46.305     51.741     55.899
                        p_χ²                .050       .384       .756
                        p_B                 .034       .080       .131

Let S_x be the sample covariance matrix of the x_i and θ̂ be the corresponding maximum likelihood estimate of θ. The values of h in the solution for Σ_c in the form of Σ_h = (1 − h)Σ(θ̂) + hS_x corresponding to ε = .05, .08 and .10 are, respectively, .42161, .66981 and .83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉ_α and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values p_χ² and the bootstrap p-values p_B, especially when RMSEA = .10 for the downweighted samples. It is obvious that T_ML when RMSEA = .10 has a much shorter right tail than that of χ²_24(δ_c). This fact is also reflected in the corresponding ĉ_α's. Table 5 once again illustrates the fact that, given a close fit, the behaviour of T_ML cannot be described by a non-central chi-square unless the corresponding T_ML under H_0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x_1, x_2)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y_1, ..., y_8)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

    x = Λ_x ξ + δ,  y = Λ_y η + ε,    (18a)

    η = Bη + Γξ + ζ,    (18b)

where

    Λ_x = ( 1.0
            λ_1 ),

    Λ_y′ = ( 1  λ_2  0  0    0  0    0  0
             0  0    1  λ_3  0  0    0  0
             0  0    0  0    1  λ_4  0  0
             0  0    0  0    0  0    1  λ_5 ),    (18c)

    B = ( 0     0     0     0
          β_21  0     0     0
          β_31  β_32  0     0
          0     β_42  β_43  0 ),

    Γ = ( γ_11
          0
          0
          0 ),   Φ = Var(ξ),    (18d)

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10). Our purpose is to explore the pivotal property of T_ML, T_SB and T_B on each sample. After noticing that a large portion of the QQ plot for T_ML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).

The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. T_ML and T_SB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of T_ML and T_SB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of T_B is still quite heavy when compared to χ²_29. The downweighting transformation (15) with ρ = .10 not only controls the right tail of T_ML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of T_ML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation with 0 < ρ < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on T_ML. Since any procedure is only an approximation to the real world and T_ML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using T_ML on x_i^(.05) for this data set. This leads to T_ML = 42.43, with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

    Λ′ = ( 1.0  λ_21  0    0     0
           0    0     1.0  λ_42  λ_52 ),   Φ = ( φ_11  φ_12
                                                 φ_21  φ_22 ),    (19)

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how T_ML, T_B and T_SB perform on this data set.

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²_4. The QQ plot for T_ML applied to x_i^(ρ) continues to exhibit a heavier right tail until ρ reaches .30. T_SB and T_B still possess quite heavy right tails even when ρ = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the x_i's without the 81st case. The QQ plots of T_ML, T_SB and T_B applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²_4. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set our recommended analysis is to use T_ML applied to x_i^(.30). With T_ML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationship of the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ||Δθ^(j)|| < ε, where ε is a small number and Δθ^(j) = θ^(j) − θ^(j−1), with θ^(j) being the jth step solution. Let σ = vech(Σ) be the vector formed by stacking the non-duplicated elements of Σ, and let its sample counterpart be s = vech(S). Then

    Δθ^(j+1) = (σ̇_j′ W_j σ̇_j)^{-1} σ̇_j′ W_j (s − σ_j),

where σ̇_j = [∂σ(θ)/∂θ′] evaluated at θ^(j), σ_j = σ(θ^(j)), and W_j is the corresponding weight matrix evaluated at θ^(j). So Δθ^(j) is proportional to S − Σ(θ^(j−1)), and it is impossible for ||Δθ^(j)|| to be smaller than ε if Σ(θ) is far from E(S). Although a model is correct in a bootstrap population F̂_0, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication (a 'bad model'). The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a T_ML this number is p + 1; for obtaining a T_B the number is p* + 1. Although a T_SB can be obtained once a T_ML is available, we need to have a positive definite sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for T_SB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.

In practice, instead of estimating c_α, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions,

    p_B = (B_0 + 1 − M)/(B_0 + 1),

where M is the number of bootstrap statistics T_b* that do not exceed T. When non-convergences occur, B_0 should be defined as the number of converged samples plus the number of non-converged samples that are significant because of a bad model, and M should remain the number of converged samples that result in T_b* ≤ T, so that the significant non-converged replications count as exceedances of T. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉ_α = T*_([B_0(1−α)]) in determining the numerator in (8), one needs at least B_0(1 − α) converged samples when sampling from F̂_0. The proposed procedure fails to generate a power estimate when the number of convergences under H_0 is below B_0(1 − α).
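One way to implement the modified p-value is sketched below. This is our reading of the rule just described, not code from the article: converged replications contribute their statistics, non-converged replications attributable to a bad model enter only through the count B_0 (they behave as if T_b* = ∞), and discarded bad samples are simply left out.

```python
import numpy as np

def modified_p_value(t_obs, t_converged, n_bad_model):
    """Bootstrap p-value allowing for non-convergent replications.

    t_converged : statistics from converged replications
    n_bad_model : replications with enough distinct cases that still failed
                  to converge; treated as significant (T_b* = infinity)
    Replications discarded as 'bad samples' are simply not counted.
    """
    t_converged = np.asarray(t_converged, dtype=float)
    b0 = len(t_converged) + n_bad_model          # effective B_0
    m = int(np.sum(t_converged <= t_obs))        # converged T_b* not exceeding T
    return (b0 + 1 - m) / (b0 + 1)
```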

Let l_1(S_x) ≥ l_2(S_x) ≥ ... ≥ l_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and g_1(S_y) ≥ g_2(S_y) ≥ ... ≥ g_{p*}(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for bad samples are l_p(S_x)/l_1(S_x) ≤ 10^{-20} for obtaining T_ML, and g_{p*}(S_y)/g_1(S_y) ≤ 10^{-20} for obtaining T_SB and T_B. If these criteria are not met and the model still cannot reach convergence (||Δθ|| < 10^{-5}) in 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H_0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.

Table 6. Non-convergences due to significant samples

(a) Example 1

             Sample x_i            Sample x_i^(.05)      Sample x_i^(.25)
Statistic    H_0   H_a1   H_a2     H_0   H_a1   H_a2     H_0   H_a1   H_a2
T_ML         0     0      2        0     0      1        0     0      1
T_SB         0     0      2        0     0      1        0     0      1
T_B          0     10     21       0     10     27       0     7      20

(b) Example 2

             Sample x_i    Sample x_i^(.05)    Sample x_i^(.10)
Statistic    H_0           H_0                 H_0
T_ML         9             1                   1
T_SB         9             1                   1
T_B          1             0                   0

(c) Example 3

             Sample x_i    Sample x_i^(.30)    Sample x_i (81st case removed)
Statistic    H_0           H_0                 H_0
T_ML         1             0                   0
T_SB         1             0                   0
T_B          0             0                   0
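A sketch of the eigenvalue-ratio screening described above follows (numpy is assumed; the 10^{-20} threshold is the one used in the examples, and the function is an illustration rather than the authors' code):

```python
import numpy as np

def screen_replication(x, tol=1e-20):
    """Flag a bootstrap replication as a 'bad sample' before fitting,
    using the eigenvalue-ratio criteria described above."""
    s_x = np.cov(x, rowvar=False)
    lam = np.linalg.eigvalsh(s_x)                         # ascending eigenvalues
    xc = x - x.mean(axis=0)
    iu = np.triu_indices(x.shape[1])
    y = np.stack([np.outer(row, row)[iu] for row in xc])  # y_i = vech[...]
    gam = np.linalg.eigvalsh(np.cov(y, rowvar=False))
    bad_for_ml = lam[0] / lam[-1] <= tol                  # screening for T_ML
    bad_for_sb_b = gam[0] / gam[-1] <= tol                # screening for T_SB, T_B
    return bad_for_ml, bad_for_sb_b
```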

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(ρ) obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(ρ).

Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. However, with proper downweighting, T_ML is generally the one that is best described by a chi-square distribution. Nevertheless, the conclusion that T_ML is the best statistic for bootstrap inference has to be preliminary. Future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.
Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.
Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen and J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago Press: Supplementary Educational Monographs, No. 48.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805–3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.
Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides and R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002.

110 Ke-Hai Yuan and Kentaro Hayashi

Page 3: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

procedure is to nd whether there exists a v0 such that the null hypothesis

H0 S = S(v0) (1)

holds Agood testing procedure should accept H0 when it is true When an alternativehypothesis S = S a is true such that

H1 S a THORN S(v) for any admissible v (2)

it should reject (1)In order to test (1) with a statistic T = T(x 1 x n) we need a critical value ca

such that

P(T gt ca | H0) = a (3)

Without knowing Fx it is impossible for us to nd ca when the probability in (3) ismeasured by Fx However we can dene a ca such that

P(T gt ca | F0) = a (4)

where F0 is an edf with a covariance matrix satisfying (1) In bootstrap testing for (1) ca

is estimated through resampling from F0 Let

y i = S12 Splusmn 12x x i i = 1 n (5)

where S = S(v) for an admissible estimate v (Beran amp Srivastava 1985) It is obvious thatthe edfs Fx and Fy have the same type of distribution but differ in lsquolocation and scalersquoActually the only purpose of (5) is to create an empirical distribution F0 = Fy havingthe same distributional type as that of Fx and with a covariance matrix satisfying H0

Let b = (y (b)1 y (b)

n ) be a random sample of size n from F0 and Tb be thecorresponding statistic of T evaluated at this sample With independent samples b b = 1 B0 we rearrange the Tb in order and denote them by

T(1) T(2) T(B0 )

The bootstrap estimate ca of ca is the B0(1 plusmn a) th quantile T(B0 (1 plusmn a)) and we reject thenull hypothesis (1) when T gt ca Let a be the exact level of a test dened in (3) and a

be the corresponding level when ca is replaced by ca Under quite general conditionsBeran (1986) established the consistency of a for a These conditions include thefourth-order moments of Fx being nite

22 Power evaluationThe bootstrap can also be used to evaluate the power ba of a statistic T dened by

ba = P(T gt ca | H1) (6)

Let Sa be the covariance matrix in (2) and

z i = S 12a Splusmn 12

x x i (7)

Then Fa = Fz represents the bootstrap population under H1 For b = 1 2 B1 letZb = (z (b)

1 z (b)n ) be random samples of size n from Fa and Tb be the corresponding

statistic of T evaluated at Zb Then the bootstrap estimate of ba in (6) is

ba = fTb gt cag B1 (8)

where ca is the bootstrap estimate of the ca in (4) Under fairlygeneral conditions Beran(1986) established the consistencyof ba for ba as n approaches innity This consistency

Bootstrap approach to inference and power analysis 95

should be understood in the following sense when S a is the population covariancematrix of Fx that generates the observed sample x 1 x n then ba in (7) is consistentfor the probability ba in (6) and the latter is the exact power measured by the cdf Fx

In contrast to (5) the S a in (7) cannot be replaced by a consistent estimate Forexample we will not obtain a consistent estimate of power for T to reject the nullhypothesis H0 under the true covariance matrix S a = Cov(x i) when replacing S a by thesample covariance matrix S x of the x i s The inconsistency here can be well understoodthrough TML Actually with S a being O(1

n

p) apart from S0 the bootstrap approach to

power is asymptotically equivalent to the approach developed in Satorra and Saris(1985) for normally distributed data (see also Steiger Shapiro amp Browne 1985) Theseauthors used a non-central chi-square distribution to describe the behaviour of TMLunder H1 with a non-centrality parameter (NCP)

d = n minv

D(S a S(v)) (9)

where D = DML is the Wishart likelihood discrepancy function When replacing S a by aconsistent estimate S a even though S a = Sa + Op(1

n

p) the NCP estimate based on

Sa can be Op(1) apart from that based on S a This limitation of power evaluation alsoexists in other classical procedures for power analysis (eg Cohen 1988) The fact isthat without knowledge of the true effect size one cannot consistently estimate thepower of a statistic in rejecting the null

Although the bootstrap approach to power analysis is equivalent to the approachproposed by Satorra and Saris (1985) for normally distributed data when S a is O(1

n

p)

apart from S(v) there are fundamental differences between the two approaches Withthe bootstrap approach there is neither a central chi-square distribution when H0 holdsnor a non-central chi-square distribution when H1 holds Instead the distribution of T isjudged by its empirical behaviour based on resampling from observed data We mayregard l = minv D(Sa S(v)) as a measure of effect size For a given sample size thegreater l is the higher the power for a bootstrap test to distinguish between H0 and H1Because there is no chi-square distribution assumption with the bootstrap approachpower analysis cannot resort to a chi-square table and has to be evaluated separately fordifferent samples This tailor-made type of analysis can be regarded as an advantage ofthe bootstrap methodology

23 Determination of sample sizeLet x 1 x n be a pilot sample Under H1 in (2) we want to nd the smallest samplesize for a statistic T to reject H0 with probability b0 For this purpose we rst draw B0independent samples b (m) = (y (b)

1 y (b)m ) of size m from F0 to estimate the critical

value ca(m) as in Section 21 We then draw B1 independent samples Zb (m) Fz of sizem where the covariance matrix of z Fz is S a Evaluating Tb (m) at each sampleZb (m) as in (8) the estimated power for sample size m is

ba(m) = fTb (m) gt ca (m)g B1

Asmaller sample size m1 is needed if ba(m) gt b0 otherwise a greater sample size m 2 isneeded Finding the minimum m such that ba (m) $ b0 may require a series of trialsThe interval-halving procedure in the Appendix of MacCallum Browne and Sugawara(1996) who studied sample-size determination using non-central chi-squares can beequally applied here More discussion on the bootstrap approach to sample size and

96 Ke-Hai Yuan and Kentaro Hayashi

power can be found in Beran (1986) Efron and Tibshirani (1993 Chapter 25) andDavison and Hinkley (1997 Section 46)

24 Test for close tA covariance structure model is at best only an approximation to the real world Anyinteresting model will be rejected when the sample size is large enough Based on theroot-mean-square error of approximation (RMSEA) (Steiger amp Lind 1980) MacCallumet al (1996) proposed testing for close t rather than exact t in (1) Using a non-centralchi-square to describe the behaviour of TML their approach is equivalent to testing if theNCP is less than a prespecied value The test for close t can also be easilyimplemented with the bootstrap Let S c be a covariance matrix with close t such that

dc = n minv

D(Sc S(v)) (10)

where dc is the lsquonon-centrality parameterrsquo in equation (8) of MacCallum et al (1996)Such a covariance matrix can be found in the form of Sh = hS x + (1 plusmn h)S(v) WhenD = DML is the Wishart likelihood discrepancy function Yuan and Hayashi(2001) showed that minv D(Sh S(v)) = D(Sh S(v)) is a strictly increasing function ofh [ [0 1] and it is straightforward to nd a S c once a dc is given Let y i = S 12

c Splusmn 12x x i

i = 1 n then the covariance matrix of y Fy is S c Let Tb b = 1 B0 be thecorresponding statistics of T evaluated respectivelyat B0 independent samples from Fy Then ca = T(B0 (1 plusmn a)) is the critical value estimate of the test for close t and the modelwill be rejected when T = T(x 1 x n) gt ca It is also straightforward to performpower analysis with the bootstrap when a less-close-t covariance matrix describes thepopulation

3. Pivotal property of TML, TSB and TB

Although it is unclear which of the statistics TML, TSB and TB is best for bootstrapping with a specific data set, there is a general result for choosing a statistic (Beran, 1988; Hall, 1992). In the following we will discuss this result and its relevance to CSA.

A statistic is pivotal if its distribution does not depend on any parameters in the underlying sampling distribution, and it is asymptotically pivotal if its asymptotic distribution does not depend on unknown parameters. Let a statistic T = f(S_x) be a smooth function of the sample covariance matrix S_x of x_1, ..., x_n. Suppose T is asymptotically pivotal and an Edgeworth expansion applies to its distribution function under H0 (Barndorff-Nielsen & Cox, 1984; Wakaki, Eguchi & Fujikoshi, 1990):

    P(T ≤ t | F_0) = G(t) + n⁻¹ g(t) + O(n⁻³/²),    (11a)

where G(t) is the asymptotic distribution function of T and g(t) is a smooth function that depends on some unknown population parameters. As discussed in Hall (1992) and Davison and Hinkley (1997, Section 2.6.1), (11a) generally holds for smooth functions of sample moments. Let T* be the corresponding statistic based on resampling from the corresponding F̂_0. Its Edgeworth expansion is

    P(T* ≤ t | F̂_0) = G(t) + n⁻¹ ĝ(t) + O_p(n⁻³/²).    (11b)

Since ĝ(t) = g(t) + O_p(n⁻¹/²) in general,

    P(T* ≤ t | F̂_0) − P(T ≤ t | F_0) = O_p(n⁻³/²).    (12)


Suppose we know G; then asymptotic inference is based on comparing the observed statistic T = T(x_1, ..., x_n) to the critical value G⁻¹(1 − α). Equation (11a) implies that asymptotic inference can achieve a designated level α to the order of accuracy of O_p(1/n). According to (12), bootstrap inference based on an asymptotically pivotal statistic can achieve the level α to a higher order of accuracy. When T is not asymptotically pivotal, G(t) in (11) depends on some unknown parameters. One has to estimate G(t) by Ĝ(t) for asymptotic inference. Similarly, with the bootstrap, one has to implicitly estimate the unknown parameters in G(t), and consequently it achieves the same order of accuracy to the level α as that based on Ĝ(t). So, for more accurate inference, we should choose a statistic that is as nearly pivotal as possible. Beran (1988) gave a nice discussion on using a pivotal statistic for the bootstrap and the order of accuracy for Type I errors.

When data are normally distributed, (TML | H0) is asymptotically distributed as χ²_{p*−q}, where p* = p(p + 1)/2 and q is the number of unknown parameters in Σ(θ). Consequently, TML is asymptotically pivotal (Davison & Hinkley, 1997, p. 139). More general conditions also exist for TML to be asymptotically pivotal with some specific models (Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Mooijaart & Bentler, 1991; Satorra & Bentler, 1990; Shapiro, 1987; Yuan & Bentler, 1999). Unfortunately, there is no effective way of verifying these conditions in practice. The statistic TSB is obtained by rescaling TML using fourth-order sample moments (Satorra & Bentler, 1988). Within the class of elliptical or pseudo-elliptical distributions with finite fourth-order moments, (TSB | H0) approaches χ²_{p*−q} (Yuan & Bentler, 1999). So TSB is asymptotically pivotal when the sampling distribution is elliptical or pseudo-elliptical; it is not asymptotically pivotal for other types of non-normal distribution. In contrast to TML and TSB, TB is asymptotically pivotal for any sampling distribution with finite fourth-order moments.

The above discussion may imply that TB is the preferred statistic for bootstrap inference in CSA. However, there exist conditions that may hide the pivotal property of TB with finite samples. In addition to the asymptotically pivotal property, Davison and Hinkley (1997, Section 2.5.1) discussed nearly pivotal properties of a statistic T, which require the distribution of T to approximately follow the same distribution when sampling from F_x or F̂_x. Since the target function G(t) for the three test statistics is the cdf of χ²_{p*−q}, the better a statistic is approximated by a chi-square distribution, the more accurate its bootstrap inference is in controlling Type I errors. Because sample size is a serious issue with CSA, a statistic that is asymptotically pivotal may not be nearly pivotal, as will be shown in Section 5.

4. Heavy tails with practical data

Bootstrap inference based on TML, TB or TSB requires F_x to have finite fourth-order moments. Of course, the sample fourth-order moments of any sample will always be finite. However, if some of the sample fourth-order moments (e.g., kurtoses) are quite large, the corresponding population fourth-order moments may actually be infinite. Even when all the population fourth-order moments are finite, inference based on the bootstrap will not be accurate when F_x possesses heavy tails. Here we propose applying the bootstrap procedure to a sample in which cases contributing to heavy tails are properly downweighted. Following the study in Yuan, Chan and Bentler (2000), we only use the Huber-type weights (Huber, 1981; Tyler, 1983) in controlling those cases.


For a p-variate sample x_1, ..., x_n, let

    d_i = d(x_i, μ, Σ) = [(x_i − μ)′ Σ⁻¹ (x_i − μ)]^{1/2}

be the Mahalanobis distance, and let u_1(t) and u_2(t) be non-negative scalar functions. A robust M-estimator (μ̂, Σ̂) can be obtained by solving (Maronna, 1976)

    μ = ∑_{i=1}^n u_1(d_i) x_i / ∑_{i=1}^n u_1(d_i),    (13a)

    Σ = ∑_{i=1}^n u_2(d_i²)(x_i − μ)(x_i − μ)′ / n.    (13b)

With decreasing functions u_1(t) and u_2(t), cases having greater d_i's receive smaller weights, and thus their effects are controlled. Let ρ represent the proportion of outlying cases one wants to control, and let r be a constant defined by P(χ²_p > r²) = ρ. The Huber-type weights corresponding to this ρ are

    u_1(d) = 1 if d ≤ r,   u_1(d) = r/d if d > r,    (14)

and u_2(d²) = {u_1(d)}²/J, where J is a constant such that E{χ²_p u_2(χ²_p)} = p, which makes the estimate Σ̂ unbiased if F_x = N_p(μ, Σ). We may regard ρ as a tuning parameter which can be adjusted according to the tails of a specific data set. Applying different types of weights to several data sets, Yuan, Bentler and Chan (2001) found that the most efficient parameter estimates often go with Huber-type weights.

Let (μ̂, Σ̂) be the converged solution to (13), let û_{2i} = u_2{d²(x_i, μ̂, Σ̂)}, and let

    x_i^(ρ) = √(û_{2i}) (x_i − μ̂).    (15)

Then we can rewrite (13b) as

    Σ̂ = ∑_{i=1}^n x_i^(ρ) x_i^(ρ)′ / n,

which is just the sample covariance matrix of the x_i^(ρ). Yuan et al. (2000) proposed using (15) as a downweighting transformation formula. Working with several practical data sets, they found that the transformed sample x_i^(ρ) is much better approximated by a normal distribution than the original sample x_i. As we shall see, when applying the bootstrap to a sample x_i^(ρ), the test statistic TML is approximately pivotal. For x_i^(ρ) we have

    E(x_{i1}^(ρ) x_{i2}^(ρ) x_{i3}^(ρ) x_{i4}^(ρ)) = E[u_2²(d_i²)(x_{i1} − μ_1)(x_{i2} − μ_2)(x_{i3} − μ_3)(x_{i4} − μ_4)] + o(1),    (16)

where x_{ij}^(ρ) is the jth element of x_i^(ρ). Because the denominator of u_2(d_i²) in (14) contains d_i² when d_i² > r², (16) is bounded. Even when the fourth-order moments of F_x do not exist, the corresponding fourth-order moments of the x_i^(ρ) are still finite.
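The following is a minimal sketch of this estimator and transformation. The constant J is approximated by simulation, and a plain fixed-point iteration is used to solve (13); both choices are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch of the Huber-type M-estimator (13) and the
# downweighting transformation (15); not the authors' implementation.
import numpy as np
from scipy.stats import chi2

def huber_transform(x, rho=0.05, n_iter=200, tol=1e-8, seed=0):
    n, p = x.shape
    r = np.sqrt(chi2.ppf(1.0 - rho, df=p))        # P(chi^2_p > r^2) = rho
    # constant J such that E{chi^2_p * u_2(chi^2_p)} = p (Monte Carlo approximation)
    c = chi2.rvs(df=p, size=200_000, random_state=np.random.default_rng(seed))
    J = np.mean(c * np.minimum(1.0, r / np.sqrt(c)) ** 2) / p
    mu, Sigma = x.mean(axis=0), np.cov(x, rowvar=False)
    for _ in range(n_iter):                        # fixed-point iteration for (13a)-(13b)
        diff = x - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff))
        u1 = np.minimum(1.0, r / d)                # Huber weights (14)
        u2 = u1 ** 2 / J
        mu_new = (u1[:, None] * x).sum(axis=0) / u1.sum()
        xc = x - mu_new
        Sigma_new = (u2[:, None] * xc).T @ xc / n
        done = np.abs(mu_new - mu).max() < tol and np.abs(Sigma_new - Sigma).max() < tol
        mu, Sigma = mu_new, Sigma_new
        if done:
            break
    diff = x - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    u2 = np.minimum(1.0, r / np.sqrt(d2)) ** 2 / J
    return np.sqrt(u2)[:, None] * diff             # x_i^(rho) as in (15)
```

Bootstrapping the rows returned by such a routine (for example with ρ = .05 or .25) then proceeds exactly as in Section 2.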

When interest lies in the structure of the population covariance matrix Σ = Cov(x_i), one needs to know whether transformation (15) changes the structure of Σ. Let Σ^(ρ) be the population covariance matrix of x_i^(ρ); then Σ^(ρ) = κΣ when the sampling distribution is elliptically symmetric. For almost all the commonly used models in the social sciences, modelling Σ is equivalent to modelling Σ^(ρ) (Browne, 1984). Yuan et al. (2000) further illustrated that outlying cases create significant multivariate skewness and that the sample covariance matrix S_x is biased in such a situation. Analysis based on x_i^(ρ) can successfully remove the bias in S_x and recover the correct model structure. More details on the merits of robust approaches to CSA can be found in Yuan and Bentler (1998) and Yuan et al. (2000).

5. Applications

Our purpose in this section is to compare the performance of TML, TB and TSB using real data. We will show that, by combining the bootstrap and a downweighting transformation, a nearly optimal procedure may be found for analysing a given data set. If a test statistic T is nearly pivotal, then its evaluations T*_b at the B_0 bootstrap replications should approximately follow a chi-square distribution. There are a variety of tools to evaluate this property; we favour the quantile–quantile (QQ) plot because of its visualization value. If T follows χ²_d, the plot of the ordered T*_b's against the quantiles of χ²_d should form an approximately straight line along the x = y line. A significant departure from this line indicates violations of the pivotal property. To save space, the QQ plots are presented in a document available on the Internet (http://www.nd.edu/~kyuan/papers/bsinfpowg.pdf). We choose B_0 = B_1 = B = 1,000 in estimating ĉ_α and β̂_α, although a smaller B may be enough in obtaining these estimates (Efron & Tibshirani, 1993).
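For a given set of bootstrap statistics, such a plot can be produced along the following lines. This is a generic sketch; boot_stats and the reference degrees of freedom are whatever the statistic and model at hand imply.

```python
# Illustrative QQ plot of bootstrap statistics T*_b against chi-square
# quantiles; departures from the x = y line suggest the statistic is not
# nearly pivotal for the data set at hand.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

def qq_plot(boot_stats, df):
    t = np.sort(np.asarray(boot_stats, dtype=float))
    b = t.size
    q = chi2.ppf((np.arange(1, b + 1) - 0.5) / b, df)   # chi-square quantiles
    plt.scatter(q, t, s=5)
    lim = [0.0, max(q.max(), t.max())]
    plt.plot(lim, lim, 'k--')                            # x = y reference line
    plt.xlabel(f'Quantiles of chi-square with {df} df')
    plt.ylabel('Ordered bootstrap statistics')
    plt.show()
```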

Example 1. Holzinger and Swineford (1939) provide a data set consisting of 24 cognitive variables on 145 subjects. We use nine of the variables in our study: visual perception, cubes and lozenges, measuring a spatial ability factor; paragraph comprehension, sentence completion and word meaning, measuring a verbal ability factor; addition, counting dots and straight-curved capitals, measuring (via the imposition of a time limit) a speed factor. Let x represent the nine variables; then the confirmatory factor model

    x = Λf + ε,   Cov(x) = ΛΦΛ′ + Ψ,    (17a)

with

    Λ = ( 1.0  λ21  λ31   0    0    0    0    0    0
           0    0    0   1.0  λ52  λ62   0    0    0
           0    0    0    0    0    0   1.0  λ83  λ93 )′,

    Φ = ( φ11  φ12  φ13
          φ21  φ22  φ23
          φ31  φ32  φ33 ),    (17b)

represents Holzinger and Swineford's hypothesis. We assume that the measurement errors are uncorrelated, with Ψ = Cov(ε) being a diagonal matrix. There are q = 21 unknown parameters, and the model degrees of freedom are 24.
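As a quick arithmetic check of these counts (a back-of-the-envelope sketch, not part of the original analysis):

```python
# p* = p(p+1)/2 non-duplicated covariance elements minus q free parameters:
# 6 free loadings in Lambda, 6 elements of the symmetric Phi, and 9 error
# variances on the diagonal of Psi.
p = 9
p_star = p * (p + 1) // 2      # 45
q = 6 + 6 + 9                  # 21
print(p_star - q)              # 24 model degrees of freedom
```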

Mardia's (1970) multivariate kurtosis for the nine-variable model is 3.04, implying that the data may come from a distribution with heavier tails than those of a normal distribution. We therefore apply the downweighting transformation (15) in order for TML to be approximately pivotal. Based on previous research (Yuan et al., 2000, 2001), we apply the three statistics to three samples: the raw sample x_i and the transformed samples x_i^(.05) and x_i^(.25). The following bootstrap procedures include further transforming these samples to satisfy H0 or H1, as given in Section 2.


QQ plots for the three statistics on the raw data set suggest that no statistic is approximately pivotal. All have heavier right tails than that of χ²_24. This is expected for TML, because the sample x_i exhibits heavier tails than those of a normal distribution. Previous studies suggest that a large sample size is needed for TB to behave like a chi-square random variable. A sample size of n = 145 may not be large enough, or there may be other unknown factors that prevent TB from behaving like a chi-square variable. Even though many simulation studies recommend TSB, this statistic certainly has a heavier right tail than that of χ²_24 for this data set.

When applied to the transformed sample x_i^(.05), all three statistics still have heavier right tails than that of χ²_24. When applied to x_i^(.25), the QQ plots indicate that TML is approximately described by χ²_24, but TB and TSB are not. Consequently, the suggested procedure for this data set is to apply TML to the transformed sample x_i^(.25). Bootstrapping with TML on this sample not only leads to more efficient parameter estimates (Yuan et al., 2001) but also provides a more accurate Type I error estimate.

Table 1 gives the significance levels of model (17) evaluated by various procedures. Several differences exist among these procedures. First, all the bootstrap p-values (pB) are greater than those (pχ²) obtained by referring the test statistics to χ²_24. The only comparable pair of pB and pχ² is when TML is applied to the sample x_i^(.25). Second, the TB statistic for each sample is the largest among the three statistics, but the pB for TB is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of TB when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the pBs based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values ĉ_.05 in Table 2. All the ĉ_.05's are greater than χ²_24⁻¹(.95) = 36.415, the 95th percentile of χ²_24. When the heavier tails are downweighted in the sample, the T*'s corresponding to each of the three statistics behave more like χ²_24 and the corresponding ĉ_.05 is nearer 36.415. Although TB is the largest among the three statistics on any of the samples, the corresponding ĉ_.05 for TB is also the largest, explaining why the associated pBs for TB can also be the largest. On the other hand, the traditional inference procedure is to refer TB to χ²_24 for significance. With such a fixed reference distribution that does not take the increased variability of TB into account, a larger TB generally corresponds to a smaller pχ².

Table 1. Statistics, bootstrap p-values (pB) and p-values (pχ²) referring to χ²_24

                 Sample x_i               Sample x_i^(.05)           Sample x_i^(.25)
Statistic     T       pB     pχ²        T       pB     pχ²         T       pB     pχ²
TML         51.187   .017   .001      49.408   .010   .002       48.883   .004   .002
TSB         48.696   .020   .002      50.318   .013   .001       53.674   .003   .000
TB          56.726   .055   .000      62.095   .024   .000       66.910   .014   .000

The power properties of TML, TB and TSB are evaluated at two alternatives. Starting with model (17), the parameter λ91 is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative Σ_a1 is the estimated covariance matrix obtained by fitting this model to the raw sample x_i by minimizing D_ML. The second alternative Σ_a2 is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that, once a Σ_a is given, the δ in (9) corresponding to TML is fixed and does not depend on the sample. Because both TSB and TB involve fourth-order sample moments, their corresponding δs change as the tails of the data set change. QQ plots for the ordered T*'s against quantiles of the χ²_24(δ̂) indicate that the distributions of TML and TSB applied to x_i are well approximated by the corresponding non-central chi-squares. Because TML under H0 applied to x_i^(.25) is well approximated by χ²_24, we would expect its distributions under the two alternatives also to be well approximated by χ²_24(22.60) and χ²_24(51.19), respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from N_p(μ, Σ), the statistic TML will not behave like a non-central chi-square unless δ is quite small and n is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris & de Pijper, 1991), the corresponding QQ plots are below the x = y line.

Can we trust the results of power analysis in the traditional approach when TML and TSB are well approximated by non-central chi-square distributions, as in this example? The answer is no. This is because the critical value χ²_24⁻¹(.95) = 36.415, which does not take the actual variability of TML or TSB into account, is not a good estimate of c_α in (3). As can be seen from Table 2, there exist quite substantial differences between χ²_24⁻¹(.95) and the ĉ_.05's. Actually, power analysis based on a non-central chi-square table is not reliable for even perfectly normal data when Σ(θ) is not near Σ_a. Table 4 contrasts the power estimates based on the bootstrap (β̂_α) with those based on the traditional approach (β_α) using the NCPs in Table 3. The β̂_α's are uniformly smaller than the β_α's, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of (T | H0) and that of (T | H1). Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under H_a2 in Table 4 are approximately the same.


Table 2. Bootstrap critical values ĉ_.05

Statistic    Sample x_i    Sample x_i^(.05)    Sample x_i^(.25)
TML            42.348          39.991              37.022
TSB            41.694          41.067              40.587
TB             57.930          55.120              53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

                Sample x_i            Sample x_i^(.05)         Sample x_i^(.25)
Statistic     Ha1       Ha2          Ha1        Ha2           Ha1        Ha2
TML         22.603    51.187       22.603     51.187        22.603     51.187
TSB         21.763    48.695       23.107     51.901        24.875     55.948
TB          20.289    56.726       20.130     63.187        20.683     69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but also the powers for identifying an incorrect model are higher for all the statistics. The powers of TML and TSB are quite comparable, while the power of TB is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where TB has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using TML; the use of TSB or TB is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (RMSEA > .10). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that RMSEA = [min_θ D(Σ_c, Σ(θ))/(p* − q)]^{1/2} = e corresponds to δ_c = n(p* − q)e². The δ_c's corresponding to e = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²_24(δ_c) are in the third row of the table. These are, respectively, fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
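These cut-offs can be checked with a few lines of code (a rough sketch; the printed values should approximately reproduce the second and third rows of Table 5):

```python
# delta_c = n * (p* - q) * e^2 and the 95th percentile of a non-central
# chi-square with 24 degrees of freedom, for the three RMSEA values.
from scipy.stats import ncx2

n, df = 145, 24
for e in (0.05, 0.08, 0.10):
    delta_c = n * df * e ** 2
    print(round(delta_c, 3), round(ncx2.ppf(0.95, df, delta_c), 3))
```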


Table 4. Bootstrap power estimates (β̂_α) and power estimates (β_α) referring to χ²_24(δ̂)

                         Sample x_i          Sample x_i^(.05)       Sample x_i^(.25)
Statistic              Ha1      Ha2          Ha1       Ha2          Ha1       Ha2
TML       β̂_α         .627     .993         .661      .997         .715      .998
          β_α          .803     .998         .803      .998         .803      .998
TSB       β̂_α         .613     .989         .661      .997         .717      .998
          β_α          .782     .997         .814      .998         .851      .999
TB        β̂_α         .374     .983         .454      .991         .501      .995
          β_α          .743     .999         .739     1.000         .754     1.000

Table 5. Test for close fit

                                    RMSEA = .05    RMSEA = .08    RMSEA = .10
  δ_c                                    8.7           22.272         34.80
  95th percentile of χ²_24(δ_c)         48.919         66.926         82.764

Sample        TML
x_i          51.187    ĉ_α             53.013         69.928         84.287
                       pχ²               .033           .315           .695
                       pB                .073           .359           .695
x_i^(.05)    49.408    ĉ_α             49.548         55.595         59.620
                       pχ²               .046           .368           .743
                       pB                .053           .107           .163
x_i^(.25)    48.883    ĉ_α             46.305         51.741         55.899
                       pχ²               .050           .384           .756
                       pB                .034           .080           .131

Let S_x be the sample covariance matrix of the x_i and θ̂ be the corresponding maximum likelihood estimate of θ. The values of h in the solution for Σ_c in the form of Σ_h = (1 − h)Σ(θ̂) + hS_x corresponding to e = .05, .08 and .10 are, respectively, 0.42161, 0.66981 and 0.83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉ_α and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values pχ² and the bootstrap p-values pB, especially when RMSEA = .10 for the downweighted samples. It is obvious that TML when RMSEA = .10 has a much shorter right tail than that of χ²_24(δ_c). This fact is also reflected in the corresponding ĉ_α's. Table 5 once again illustrates the fact that, given a close fit, the behaviour of TML cannot be described by a non-central chi-square unless the corresponding TML under H0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x1, x2)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y1, ..., y8)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

    x = Λ_x ξ + δ,   y = Λ_y η + ε,    (18a)

    η = Bη + Γξ + ζ,    (18b)

where

    Λ_x = ( 1.0, λ1 )′,

    Λ_y = ( 1  λ2  0   0   0   0   0   0
            0   0  1  λ3   0   0   0   0
            0   0  0   0   1  λ4   0   0
            0   0  0   0   0   0   1  λ5 )′,    (18c)

    B = (  0    0    0    0
          β21   0    0    0
          β31  β32   0    0
           0   β42  β43   0 ),

    Γ = ( γ11, 0, 0, 0 )′,   Φ = Var(ξ),    (18d)

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.
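One plausible accounting of the free parameters consistent with the stated 29 degrees of freedom is sketched below; the breakdown itself is an assumption, since the text only reports the total.

```python
# p* = p(p+1)/2 non-duplicated covariance elements minus q free parameters:
# 5 loadings (lambda_1..lambda_5), 5 elements of B, gamma_11, Var(xi),
# 4 zeta variances, 2 delta variances and 8 epsilon variances.
p = 10
p_star = p * (p + 1) // 2              # 55
q = 5 + 5 + 1 + 1 + 4 + 2 + 8          # 26
print(p_star - q)                      # 29 model degrees of freedom
```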

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10). Our purpose is to explore the pivotal property of TML, TSB and TB on each sample. After noticing that a large portion of the QQ plot for TML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).

The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. TML and TSB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of TML and TSB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of TB is still quite heavy when compared to χ²_29. The downweighting transformation (15) with ρ = .10 not only controls the right tail of TML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of TML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation with 0 < ρ < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on TML. Since any procedure is only an approximation to the real world and TML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using TML on x_i^(.05) for this data set. This leads to TML = 42.43 with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

    Λ = ( 1.0  λ21   0    0    0
           0    0   1.0  λ42  λ52 )′,
    Φ = ( φ11  φ12
          φ21  φ22 ),    (19)

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how TML, TB and TSB perform on this data set.

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²_4. The QQ plot for TML applied to x_i^(ρ) continues to exhibit a heavier right tail until ρ reaches .30; TSB and TB still possess quite heavy right tails even when ρ = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the x_i's without the 81st case. The QQ plots of TML, TSB and TB applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²_4. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set our recommended analysis is to use TML applied to x_i^(.30). With TML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationship of the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ‖Δθ^(j)‖ < ε, where ε is a small number and Δθ^(j) = θ^(j) − θ^(j−1), with θ^(j) being the jth step solution. Let σ(θ) = vech(Σ(θ)) be the vector formed by stacking the non-duplicated elements of Σ(θ), and let its sample counterpart be s = vech(S). Then

    Δθ^(j+1) = (σ̇_j′ W_j σ̇_j)⁻¹ σ̇_j′ W_j (s − σ_j),

where σ̇_j = dσ(θ)/dθ′ evaluated at θ^(j), σ_j = σ(θ^(j)), and W_j is the corresponding weight matrix evaluated at θ^(j). So Δθ^(j) is proportional to s − σ(θ^(j−1)), and it is impossible for ‖Δθ^(j)‖ to be smaller than ε if Σ(θ) is far from E(S). Although a model is correct in a bootstrap population F̂_0, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a TML this number is p + 1; the number is p* + 1 for obtaining a TB. Although a TSB can be obtained once a TML is available, we need to have a positive definite sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for TSB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.

In practice, instead of estimating ĉ_α, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions,

    p̂_B = (B_0 + 1 − M) / (B_0 + 1),

where M = #{T*_b : T*_b ≤ T}, so that B_0 − M is the number of significant replications with T*_b > T. When non-convergences occur, B_0 should be defined as the number of converged samples plus the number of significant samples due to a bad model, and the count of significant replications B_0 − M should then comprise the non-converged samples due to the bad model together with the converged samples that result in T*_b > T. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉ_α = T*_(B_0(1−α)) in determining the numerator in (8), one needs at least B_0(1 − α) converged samples when sampling from F̂_0. The proposed procedure fails to generate a power estimate when the number of convergences under H0 is below B_0(1 − α).
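A minimal sketch of this p-value, under the assumption that non-converged bad-model replications are counted as significant (T*_b treated as infinite), is given below; the function and argument names are illustrative only.

```python
# Bootstrap p-value with the proposed non-convergence rule: `t_obs` is the
# statistic on the original sample, `boot_stats` holds T*_b for converged
# replications and `n_bad_model` counts replications with enough distinct
# points that nevertheless failed to converge (treated as significant).
import numpy as np

def bootstrap_p_value(t_obs, boot_stats, n_bad_model=0):
    boot_stats = np.asarray(boot_stats, dtype=float)
    b0 = boot_stats.size + n_bad_model                     # converged + bad-model replications
    n_signif = int(np.sum(boot_stats > t_obs)) + n_bad_model
    return (n_signif + 1) / (b0 + 1)                        # = (B0 + 1 - M)/(B0 + 1)
```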

Let λ_1(S_x) ≥ λ_2(S_x) ≥ ... ≥ λ_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and γ_1(S_y) ≥ γ_2(S_y) ≥ ... ≥ γ_{p*}(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for a bad sample are λ_p(S_x)/λ_1(S_x) ≤ 10⁻²⁰ when obtaining TML, and γ_{p*}(S_y)/γ_1(S_y) ≤ 10⁻²⁰ when obtaining TSB and TB. If a sample is not bad by these criteria and the model still cannot reach convergence (‖Δθ‖ < 10⁻⁵) within 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.

Table 6. Non-convergences due to significant samples

(a) Example 1

              Sample x_i           Sample x_i^(.05)        Sample x_i^(.25)
Statistic   H0   Ha1   Ha2        H0   Ha1   Ha2          H0   Ha1   Ha2
TML          0     0     2         0     0     1           0     0     1
TSB          0     0     2         0     0     1           0     0     1
TB           0    10    21         0    10    27           0     7    20

(b) Example 2

             Sample x_i    Sample x_i^(.05)    Sample x_i^(.10)
Statistic        H0              H0                  H0
TML               9               1                   1
TSB               9               1                   1
TB                1               0                   0

(c) Example 3

             Sample x_i    Sample x_i^(.30)    Sample x_i (81st case removed)
Statistic        H0              H0                  H0
TML               1               0                   0
TSB               1               0                   0
TB                0               0                   0

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics TML, TSB and TB are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, TML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(ρ) through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying TML to a transformed sample x_i^(ρ).

Asymptotics justify the three statistics from different perspectives, and TB is asymptotically distribution-free. Previous Monte Carlo studies mainly support TSB. However, with proper downweighting, TML is generally the one that is best described by a chi-square distribution. The conclusion that TML is the best statistic for bootstrap inference has to be preliminary, though: future studies may find TB or TSB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (e-mail: kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.
Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.
Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago Press: Supplementary Educational Monographs, No. 48.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805–3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.
Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002.

110 Ke-Hai Yuan and Kentaro Hayashi

Page 4: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

should be understood in the following sense when S a is the population covariancematrix of Fx that generates the observed sample x 1 x n then ba in (7) is consistentfor the probability ba in (6) and the latter is the exact power measured by the cdf Fx

In contrast to (5) the S a in (7) cannot be replaced by a consistent estimate Forexample we will not obtain a consistent estimate of power for T to reject the nullhypothesis H0 under the true covariance matrix S a = Cov(x i) when replacing S a by thesample covariance matrix S x of the x i s The inconsistency here can be well understoodthrough TML Actually with S a being O(1

n

p) apart from S0 the bootstrap approach to

power is asymptotically equivalent to the approach developed in Satorra and Saris(1985) for normally distributed data (see also Steiger Shapiro amp Browne 1985) Theseauthors used a non-central chi-square distribution to describe the behaviour of TMLunder H1 with a non-centrality parameter (NCP)

d = n minv

D(S a S(v)) (9)

where D = DML is the Wishart likelihood discrepancy function When replacing S a by aconsistent estimate S a even though S a = Sa + Op(1

n

p) the NCP estimate based on

Sa can be Op(1) apart from that based on S a This limitation of power evaluation alsoexists in other classical procedures for power analysis (eg Cohen 1988) The fact isthat without knowledge of the true effect size one cannot consistently estimate thepower of a statistic in rejecting the null

Although the bootstrap approach to power analysis is equivalent to the approachproposed by Satorra and Saris (1985) for normally distributed data when S a is O(1

n

p)

apart from S(v) there are fundamental differences between the two approaches Withthe bootstrap approach there is neither a central chi-square distribution when H0 holdsnor a non-central chi-square distribution when H1 holds Instead the distribution of T isjudged by its empirical behaviour based on resampling from observed data We mayregard l = minv D(Sa S(v)) as a measure of effect size For a given sample size thegreater l is the higher the power for a bootstrap test to distinguish between H0 and H1Because there is no chi-square distribution assumption with the bootstrap approachpower analysis cannot resort to a chi-square table and has to be evaluated separately fordifferent samples This tailor-made type of analysis can be regarded as an advantage ofthe bootstrap methodology

23 Determination of sample sizeLet x 1 x n be a pilot sample Under H1 in (2) we want to nd the smallest samplesize for a statistic T to reject H0 with probability b0 For this purpose we rst draw B0independent samples b (m) = (y (b)

1 y (b)m ) of size m from F0 to estimate the critical

value ca(m) as in Section 21 We then draw B1 independent samples Zb (m) Fz of sizem where the covariance matrix of z Fz is S a Evaluating Tb (m) at each sampleZb (m) as in (8) the estimated power for sample size m is

ba(m) = fTb (m) gt ca (m)g B1

Asmaller sample size m1 is needed if ba(m) gt b0 otherwise a greater sample size m 2 isneeded Finding the minimum m such that ba (m) $ b0 may require a series of trialsThe interval-halving procedure in the Appendix of MacCallum Browne and Sugawara(1996) who studied sample-size determination using non-central chi-squares can beequally applied here More discussion on the bootstrap approach to sample size and

96 Ke-Hai Yuan and Kentaro Hayashi

power can be found in Beran (1986) Efron and Tibshirani (1993 Chapter 25) andDavison and Hinkley (1997 Section 46)

24 Test for close tA covariance structure model is at best only an approximation to the real world Anyinteresting model will be rejected when the sample size is large enough Based on theroot-mean-square error of approximation (RMSEA) (Steiger amp Lind 1980) MacCallumet al (1996) proposed testing for close t rather than exact t in (1) Using a non-centralchi-square to describe the behaviour of TML their approach is equivalent to testing if theNCP is less than a prespecied value The test for close t can also be easilyimplemented with the bootstrap Let S c be a covariance matrix with close t such that

dc = n minv

D(Sc S(v)) (10)

where dc is the lsquonon-centrality parameterrsquo in equation (8) of MacCallum et al (1996)Such a covariance matrix can be found in the form of Sh = hS x + (1 plusmn h)S(v) WhenD = DML is the Wishart likelihood discrepancy function Yuan and Hayashi(2001) showed that minv D(Sh S(v)) = D(Sh S(v)) is a strictly increasing function ofh [ [0 1] and it is straightforward to nd a S c once a dc is given Let y i = S 12

c Splusmn 12x x i

i = 1 n then the covariance matrix of y Fy is S c Let Tb b = 1 B0 be thecorresponding statistics of T evaluated respectivelyat B0 independent samples from Fy Then ca = T(B0 (1 plusmn a)) is the critical value estimate of the test for close t and the modelwill be rejected when T = T(x 1 x n) gt ca It is also straightforward to performpower analysis with the bootstrap when a less-close-t covariance matrix describes thepopulation

3 Pivotal property of TML TSB and TB

Although it is unclear which of the statistics TML TSB and TB is best for bootstrappingwith a specic data set there is a general result for choosing a statistic (Beran 1988Hall 1992) In the following we will discuss this result and its relevance to CSA

A statistic is pivotal if its distribution does not depend on any parameters in theunderlying sampling distribution and it is asymptotically pivotal if its asymptoticdistribution does not depend on unknown parameters Let a statistic T = f (S x ) be asmooth function of the sample covariance matrix S x of x 1 x n Suppose T isasymptotically pivotal and an Edgeworth expansion applies to its distribution functionunder H0 (Barndorff-Nielsen amp Cox 1984 Wakaki Eguchi amp Fujikoshi 1990)

P(T t | F0) = G(t) + n plusmn 1g(t) + O(nplusmn 32) (11a)

where G(t) is the asymptotic distribution function of T and g(t ) is a smooth functionthat depends on some unknown population parameters As discussed in Hall (1992) andDavision and Hinkley (1997 Section 261) (11a) generally holds for smooth functionsof sample moments Let T be the corresponding statistic based on resampling from thecorresponding F0 Its Edgeworth expansion is

P(T lt t | F0) = G(t) + n plusmn 1 g(t ) + Op(nplusmn 32) (11b)

Since g(t) = g(t ) + Op(nplusmn 12) in general

P(T t | F0) plusmn P(T t | F0) = Op(nplusmn 32) (12)

Bootstrap approach to inference and power analysis 97

Suppose we know G then asymptotic inference is based on comparing the observedstatistic T = T(x 1 x n) to the critical value Gplusmn 1(1 plusmn a) Equation (11a) implies thatasymptotic inference can achieve a designated level a to the order of accuracy ofOp(1n) According to (12) bootstrap inference based on an asymptotically pivotalstatistic can achieve the level a to a higher order of accuracy When T is notasymptotically pivotal G(t) in (11) depends on some unknown parameters One hasto estimate G(t) by G(t) for asymptotic inference Similarly with the bootstrap one hasto implicitlyestimate the unknown parameters in G(t ) and consequently it achieves thesame order of accuracy to the level a as that based on G(t) So for more accurateinference we should choose a statistic that is as nearly pivotal as possible Beran (1988)gave a nice discussion on using a pivotal statistic for bootstrap and the order of accuracyfor Type I errors

When data are normallydistributed (TML| H0) is asymptoticallydistributed as x 2p plusmn q

where p = p( p + 1)2 and q is the number of unknown parameters in S(v) Conse-quently TML is asymptotically pivotal (Davison amp Hinkley 1997 p 139) More generalconditions also exist for TML to be asymptotically pivotal with some specic models(Amemiya amp Anderson 1990 Browne amp Shapiro 1988 Mooijaart amp Bentler 1991Satorra ampBentler 1990 Shapiro 1987 Yuan ampBentler 1999) Unfortunately there is noeffective way of verifying these conditions in practice The statistic TSB is obtained byrescaling TML using fourth-order sample moments (Satorra amp Bentler 1988) Within theclass of elliptical or pseudo-elliptical distributions with nite fourth-order moments(TSB | H0) approaches x 2

p plusmn q (Yuan amp Bentler 1999) So TSB is asymptotically pivotalwhen the sampling distribution is elliptical or pseudo-elliptical It is not asymptoti-cally pivotal for other types of non-normal distribution In contrast to TML and TSB TBis asymptotically pivotal for any sampling distribution with nite fourth-ordermoments

The above discussion may imply that TB is the preferred statistic for bootstrapinference in CSA However there exist conditions that may hide the pivotal property ofTB with nite samples In addition to the asymptotically pivotal property Davison andHinkley (1997 Section 251) discussed nearly pivotal properties of a statistic T whichrequires the distribution of T to approximately follow the same distribution whensampling from Fx or Fx Since the target function G(t) for the three test statistics is thecdf of x 2

p plusmn q the better a statistic is approximated by a chi-square distribution the moreaccurate its bootstrap inference is in controlling Type I errors Because sample size is aserious issue with CSA a statistic that is asymptoticallypivotal may not be nearlypivotalas will be shown in Section 5

4 Heavy tails with practical dataBootstrap inference based on TML TB or TSB requires Fx to have nite fourth-ordermoments Of course the sample fourth-order moments of any sample will always benite However if some of the fourth-order moments (eg kurtoses) are quite large thecorresponding population fourth-order moments may actuallybe innite Even when allthe population fourth-order moments are nite inference based on the bootstrap willnot be accurate when Fx possesses heavy tails Here we propose applying the bootstrapprocedure to a sample in which cases contributing to heavy tails are properly down-weighted Following the study in Yuan Chan and Bentler (2000) we only use theHuber-type weights (Huber 1981 Tyler 1983) in controlling those cases

98 Ke-Hai Yuan and Kentaro Hayashi

For a p-variate sample x 1 x n let

d i = d (x i m S) = [(x i plusmn m ) cent Splusmn 1(x i plusmn m )]12

be the Mahalanobis distance and u 1(t ) and u 2(t) be non-negative scalar functions Arobust M-estimator ( ˆm S) can be obtained by solving (Maronna 1976)

m =Xn

i = 1

u 1(di )x i

iquest Xn

i = 1

u1(di ) (13a)

S =Xn

i = 1

u 2(d 2i )(x i plusmn m )(x i plusmn m ) cent n (13b)

With decreasing functions u 1(t) and u 2(t) cases having greater di s receive smallerweights and thus their effects are controlled Let r represent the proportion ofoutlying cases one wants to control and r be a constant dened by P(x 2

p gt r 2) = rThe Huber-type weights corresponding to this r are

u 1(d ) =1 if d r

rd if d gt r

raquo(14)

and u 2(d 2) = fu 1(d )g 2J where J is a constant such that Efx 2p u 2(x 2

p)g = p whichmakes the estimate S unbiased if Fx = Np( m S) We may regard r as a tuning parameterwhich can be adjusted according to the tails of a specic data set Applying differenttypes of weights to several data sets Yuan Bentler and Chan (2001) found that the mostefcient parameter estimates often go with Huber-type weights

Let ( ˆm S) be the converged solution to (13) u2 i = u 2fd 2(x i ˆm S)g and

x (r)i = f

u 2 i

p(x i plusmn ˆm )g (15)

Then we can rewrite (13b) as

S =Xn

i = 1

x (r)i x (r)

i cent n

which is just the sample covariance matrix of the x (r)i Yuan et al (2000) proposed using

(15) as a downweighting transformation formula Working with several practical datasets they found that the transformed sample x (r)

i is much better approximated by anormal distribution than the original sample x i As we shall see when applying thebootstrap to a sample x (r)

i the test statistic TML is approximately pivotal For x (r)i we

have

E(x (r)i 1 x (r)

i 2 x (r)i 3 x (r)

i 4 ) = E[u 22(d 2

i )(x i 1 plusmn m1)(x i 2 plusmn m2)(x i 3 plusmn m 3)(x i 4 plusmn m4)] + o(1) (16)

where x (r)i j is the j th element of x (r)

i Because the denominator of u 2(d 2i ) in (14)

contains d 2i when d 2

i gt r 2 (16) is bounded Even when the fourth-order moments of Fx

do not exist the corresponding fourth-order moments of the x (r)i are still nite

When interest lies in the structure of the population covariance matrix S = Cov(x i)one needs to know whether transformation (15) changes the structure of S Let S (r) bethe population covariance matrix of x (r)

i then S (r) = kS when the sampling distribu-tion is elliptically symmetric For almost all the commonly used models in the socialsciences modelling S is equivalent to modelling S (r) (Browne 1984) Yuan et al (2000)further illustrated that outlying cases create signicant multivariate skewness and thatthe sample covariance matrix S x is biased in such a situation Analysis based on x (r)

i can

Bootstrap approach to inference and power analysis 99

successfully remove the bias in S x and recover the correct model structure More detailson the merits of robust approaches to CSAcan be found in Yuan and Bentler (1998) andYuan et al (2000)

5 ApplicationsOur purpose in this section is to compare the performance of TML TB and TSB using realdata We will show that by combining the bootstrap and a downweighting transforma-tion a nearly optimal procedure may be found for analysing a given data set If a teststatistic T is nearly pivotal then its evaluations Tb at the B0 bootstrap replicationsshould approximately follow a chi-square distribution There are a variety of tools toevaluate this property We favour the quantilendashquantile (QQ) plot because of itsvisualization value If T follows x 2

d the plot of ordered Tb s against the quantiles ofx 2

d should form an approximately straight line with slope2

p2 Asignicant departure

from this line indicates violations of the pivotal property To save space the QQplots arepresented in a document available on the Internet (httpwwwndedu kyuanpapersbsinfpowgpdf ) We choose B0 = B1 = B = 1000 in estimating ca and ba although a smaller B may be enough in obtaining these estimates (Efron amp Tibshirani1993)

Example 1 Holzinger and Swineford (1939) provide a data set consisting of 24cognitive variables on 145 subjects We use nine of the variables in our study visualperception cubes and lozenges measuring a spatial ability factor paragraph compre-hension sentence completion and word meaning measuring a verbal ability factoraddition counting dots and straight-curved capitals measuring (via the imposition of atime limit) a speed factor Let x represent the nine variables then the conrmatoryfactor model

x = L f + e Cov(x ) = LFL cent + W (17a)

with

L =

10 l 21 l 31 0 0 0 0 0 0

0 0 0 10 l 52 l 62 0 0 0

0 0 0 0 0 0 10 l 83 l 93

0

BB

1

CCA

cent

F =

f11 f12 f13

f 21 f22 f 23

f 31 f32 f 33

0

BB

1

CCA

(17b)

represents Holzinger and Swinefordrsquos hypothesis We assume that the measurementerrors are uncorrelated with W = Cov(e ) being a diagonal matrix There are q = 21unknown parameters and the model degrees of freedom are 24

Mardiarsquos (1970) multivariate kurtosis for the nine-variable model is 304 implyingthat the data may come from a distribution with heavier tails than those of a normaldistribution We therefore apply the downweighting transformation (15) in order forTML to be approximately pivotal Based on previous research (Yuan et al 20002001) we apply the three statistics to three samples the raw sample x i and thetransformed samples x (05)

i and x (25)i The following bootstrap procedures include

further transforming these samples to satisfy H0 or H1 as given in Section 2

100 Ke-Hai Yuan and Kentaro Hayashi

QQ plots for the three statistics on the raw data set suggest that no statistic isapproximately pivotal All have heavier right tails than that of x 2

24 This is expected forTML because the sample x i exhibits heavier tails than those of a normal distributionPrevious studies suggest that a large sample size is needed for TB to behave like achi-square random variable Asample size of n = 145 may not be large enough or theremay be other unknown factors that prevent TB from behaving like a chi-square variableEven though many simulation studies recommend TSB this statistic certainly has aheavier right tail than that of x 2

24 for this data setWhen applied to the transformed sample x (05)

i all the three statistics still haveheavier right tails than that of x 2

24 When applied to x (25)i the QQplots indicate that TML

is approximately described by x 224 but TB or TSB are not Consequently the suggested

procedure for this data set is to apply TML to the transformed sample x (25)i Boot-

strapping with TML on this sample not only leads to more efcient parameter estimates(Yuan et al 2001) but also provides a more accurate Type I error estimate

Table 1 gives the significance levels of model (17) evaluated by various procedures. Several differences exist among these procedures. First, all the bootstrap p-values (p_B) are greater than those (p_χ²) obtained by referring the test statistics to χ²₂₄. The only comparable pair of p_B and p_χ² is when T_ML is applied to the sample x_i^(.25). Second, the T_B statistic for each sample is the largest among the three statistics, but the p_B for T_B is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of T_B when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the p_Bs based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values ĉ.05 in Table 2. All the ĉ.05s are greater than 36.415, the 95th percentile of χ²₂₄. When the heavier tails are downweighted in the sample, the bootstrap statistics T*_b corresponding to each of the three statistics behave more like χ²₂₄ and the corresponding ĉ.05 is nearer 36.415. Although T_B is the largest among the three statistics on any of the samples, the corresponding ĉ.05 for T_B is also the largest, explaining why the associated p_Bs for T_B can also be the largest. On the other hand, the traditional inference procedure is to refer T_B to χ²₂₄ for significance. With such a fixed reference distribution that does not take the increased variability of T_B into account, a larger T_B generally corresponds to a smaller p_χ².

The power properties of T_ML, T_B and T_SB are evaluated at two alternatives. Starting with model (17), the parameter λ91 is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative, Σa1, is the estimated covariance matrix obtained by fitting this model


Table 1. Statistics, bootstrap p-values (p_B) and p-values (p_χ²) referring to χ²₂₄

            Sample x_i               Sample x_i^(.05)         Sample x_i^(.25)
Statistic   T        p_B    p_χ²     T        p_B    p_χ²     T        p_B    p_χ²
T_ML        51.187   .017   .001     49.408   .010   .002     48.883   .004   .002
T_SB        48.696   .020   .002     50.318   .013   .001     53.674   .003   .000
T_B         56.726   .055   .000     62.095   .024   .000     66.910   .014   .000

to the raw sample x_i by minimizing D_ML. The second alternative, Σa2, is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that, once a Σa is given, the δ in (9) corresponding to T_ML is fixed and does not depend on the sample. Because both T_SB and T_B involve fourth-order sample moments, their corresponding δs change as the tails of the data set change. QQ plots for the ordered T*_b against quantiles of χ²₂₄(δ) indicate that the distributions of T_ML and T_SB applied to x_i are well approximated by the corresponding non-central chi-squares. Because T_ML under H0 applied to x_i^(.25) is well approximated by χ²₂₄, we would expect its distributions under the two alternatives also to be well approximated by χ²₂₄(22.60) and χ²₂₄(51.19), respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from N_p(μ, Σ), the statistic T_ML will not behave like a non-central chi-square unless δ is quite small and n is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris, & de Pijper, 1991), the corresponding QQ plots are below the x = y line.

Can we trust the results of power analysis in the traditional approach when T_ML and T_SB are well approximated by non-central chi-square distributions, as in this example? The answer is no. This is because the critical value 36.415, the 95th percentile of χ²₂₄, which does not take the actual variability of T_ML or T_SB into account, is not a good estimate of cα in (3). As can be seen from Table 2, there exist quite substantial differences between this percentile and the ĉ.05s. Actually, power analysis based on a non-central chi-square table is not reliable even for perfectly normal data when Σ(θ) is not near Σa. Table 4 contrasts the power estimates based on the bootstrap (β̂α) with those based on the traditional approach (βα) using the NCPs in Table 3. The β̂αs are uniformly smaller than the βαs, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of (T | H0) and that of (T | H1). Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under Ha2 in Table 4 are approximately the same.


Table 2. Bootstrap critical values ĉ.05

Statistic   Sample x_i   Sample x_i^(.05)   Sample x_i^(.25)
T_ML        42.348       39.991             37.022
T_SB        41.694       41.067             40.587
T_B         57.930       55.120             53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

            Sample x_i          Sample x_i^(.05)      Sample x_i^(.25)
Statistic   Ha1      Ha2        Ha1      Ha2          Ha1      Ha2
T_ML        22.603   51.187     22.603   51.187       22.603   51.187
T_SB        21.763   48.695     23.107   51.901       24.875   55.948
T_B         20.289   56.726     20.130   63.187       20.683   69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but the powers for identifying an incorrect model are also higher for all the statistics. The powers of T_ML and T_SB are quite comparable, while the power of T_B is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where T_B has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).
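A schematic of the bootstrap power calculation behind Table 4 is sketched below; fit_statistic stands for whichever of T_ML, T_SB or T_B is being evaluated, and the helper names are ours. The H0 and Ha populations are imposed on the sample by the linear transformation described in Section 2, so that the transformed data have exactly the covariance matrix Σ(θ̂) or Σa.

import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm

def impose_covariance(X, Sigma_target):
    """Transform the sample so that its (biased) sample covariance matrix
    equals Sigma_target, as in the transformation used in Section 2."""
    S = np.cov(X, rowvar=False, bias=True)
    A = np.real(sqrtm(Sigma_target)) @ np.real(sqrtm(inv(S)))
    return (X - X.mean(axis=0)) @ A.T

def bootstrap_power(X, Sigma_H0, Sigma_Ha, fit_statistic, B=1000, alpha=.05, seed=0):
    """Estimate power: c_alpha is the (1 - alpha) quantile of the statistic over
    resamples from the H0-transformed data; power is the proportion of
    resamples from the Ha-transformed data whose statistic exceeds c_alpha."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    X0 = impose_covariance(X, Sigma_H0)
    Xa = impose_covariance(X, Sigma_Ha)
    T0 = np.array([fit_statistic(X0[rng.integers(0, n, n)]) for _ in range(B)])
    Ta = np.array([fit_statistic(Xa[rng.integers(0, n, n)]) for _ in range(B)])
    c_alpha = np.quantile(T0, 1.0 - alpha)      # bootstrap critical value
    return float(np.mean(Ta > c_alpha)), float(c_alpha)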

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using T_ML; the use of T_SB or T_B is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (.10 < RMSEA). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that RMSEA = [min_θ D(Σc, Σ(θ))/(p* − q)]^(1/2) = ε corresponds to δc = n(p* − q)ε². The δc s corresponding to ε = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²₂₄(δc) are in the third row of the table. These are, respectively, the fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
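The second and third rows of Table 5 can be reproduced with a few lines of Python using scipy's non-central chi-square, taking n = 145 and p* − q = 24 for model (17); the printed values should agree with the table up to rounding.

from scipy.stats import ncx2

n, df = 145, 24                           # sample size and p* - q for model (17)
for eps in (.05, .08, .10):
    delta_c = n * df * eps**2             # RMSEA = eps  <=>  delta_c = n(p* - q)eps^2
    crit = ncx2.ppf(.95, df, delta_c)     # 95th percentile of chi2_24(delta_c)
    print(f'RMSEA = {eps:.2f}: delta_c = {delta_c:.3f}, fixed critical value = {crit:.3f}')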


Table 4. Bootstrap power estimates (β̂α) and power estimates (βα) referring to χ²₂₄(δ)

                  Sample x_i          Sample x_i^(.05)      Sample x_i^(.25)
Statistic         Ha1      Ha2        Ha1      Ha2          Ha1      Ha2
T_ML    β̂α       .627     .993       .661     .997         .715     .998
        βα        .803     .998       .803     .998         .803     .998
T_SB    β̂α       .613     .989       .661     .997         .717     .998
        βα        .782     .997       .814     .998         .851     .999
T_B     β̂α       .374     .983       .454     .991         .501     .995
        βα        .743     .999       .739     1.000        .754     1.000

Table 5. Test for close fit

RMSEA                                .05       .08       .10
δc                                   8.7       22.272    34.80
95th percentile of χ²₂₄(δc)          48.919    66.926    82.764

Sample       T_ML
x_i          51.187    ĉα        53.013    69.928    84.287
                       p_χ²      .033      .315      .695
                       p_B       .073      .359      .695
x_i^(.05)    49.408    ĉα        49.548    55.595    59.620
                       p_χ²      .046      .368      .743
                       p_B       .053      .107      .163
x_i^(.25)    48.883    ĉα        46.305    51.741    55.899
                       p_χ²      .050      .384      .756
                       p_B       .034      .080      .131

Let S_x be the sample covariance matrix of the x_i and θ̂ be the corresponding maximum likelihood estimate of θ. The values of h in the solution for Σc of the form Σ_h = (1 − h)Σ(θ̂) + hS_x corresponding to ε = .05, .08 and .10 are, respectively, 0.42161, 0.66981 and 0.83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉα and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values p_χ² and the bootstrap p-values p_B, especially when RMSEA = .10 for the downweighted samples. It is obvious that T_ML when RMSEA = .10 has a much shorter right tail than that of χ²₂₄(δc). This fact is also reflected in the corresponding ĉαs. Table 5 once again illustrates the fact that, given a close fit, the behaviour of T_ML cannot be described by a non-central chi-square unless the corresponding T_ML under H0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.
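Because the minimized discrepancy n·min_θ D(Σ_h, Σ(θ)) increases in h (Section 2.4), the h values quoted above can be found by simple bisection. In the sketch below, min_fit is a stand-in for whatever routine computes min_θ D_ML for a given covariance matrix; the function and its name are illustrative only.

def find_h(S_x, Sigma_hat, delta_c, n, min_fit, tol=1e-6):
    """Bisection for h in [0, 1] such that n * min_fit(Sigma_h) = delta_c,
    where Sigma_h = (1 - h) * Sigma_hat + h * S_x and min_fit(Sigma) returns
    the minimum over theta of D_ML(Sigma, Sigma(theta))."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        h = 0.5 * (lo + hi)
        Sigma_h = (1.0 - h) * Sigma_hat + h * S_x
        if n * min_fit(Sigma_h) < delta_c:
            lo = h                      # not enough misfit yet; move towards S_x
        else:
            hi = h
    return 0.5 * (lo + hi)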

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x1, x2)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y1, ..., y8)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

x = Λx ξ + δ,    y = Λy η + ε,                                           (18a)
η = Bη + Γξ + ζ,                                                         (18b)

where

Λx = (1.0, λ1)′,      Λy′ = ( 1  λ2  0  0   0  0   0  0
                              0  0   1  λ3  0  0   0  0
                              0  0   0  0   1  λ4  0  0
                              0  0   0  0   0  0   1  λ5 ),              (18c)

B = (  0    0    0    0
      β21   0    0    0
      β31  β32   0    0
       0   β42  β43   0 ),      Γ = (γ11, 0, 0, 0)′,      Φ = Var(ξ),    (18d)

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10). Our purpose is to


explore the pivotal property of T_ML, T_SB and T_B on each sample. After noticing that a large portion of the QQ plot for T_ML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).
The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. T_ML and T_SB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of T_ML and T_SB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of T_B is still quite heavy when compared to χ²₂₉. The downweighting transformation H(.10) not only controls the right tail of T_ML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of T_ML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation H(ρ) with 0 < ρ < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on T_ML. Since any procedure is only an approximation to the real world, and T_ML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using T_ML on x_i^(.05) for this data set. This leads to T_ML = 42.43 with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

Λ′ = ( 1.0  λ21   0    0    0
        0    0   1.0  λ42  λ52 ),      Φ = ( φ11  φ12
                                             φ21  φ22 )                  (19)

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how T_ML, T_B and T_SB perform on this data set.

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²₄. The QQ plot for T_ML applied to x_i^(ρ) continues to exhibit a heavier right tail until ρ reaches .30; T_SB and T_B still possess quite heavy right tails even when ρ = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995), so it is enticing to apply the three statistics to the x_i without the 81st case. The QQ plots of T_ML, T_SB and T_B applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²₄. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set our recommended analysis is to use T_ML applied to x_i^(.30). With


T_ML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationships among the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ‖Δθ^(j)‖ < ε, where ε is a small number and Δθ^(j) = θ^(j) − θ^(j−1), with θ^(j) being the jth step solution. Let σ = vech(Σ) be the vector formed by stacking the non-duplicated elements of Σ, and let its sample counterpart be s = vech(S). Then Δθ^(j+1) = (σ̇′_j W_j σ̇_j)⁻¹ σ̇′_j W_j (s − σ_j), where σ̇_j = [∂σ(θ)/∂θ′ at θ^(j)], σ_j = σ(θ^(j)), and W_j is the corresponding weight matrix evaluated at θ^(j). So Δθ^(j) is proportional to S − Σ(θ^(j−1)), and it is impossible for ‖Δθ^(j)‖ to be smaller than ε if Σ(θ) is far from E(S). Although a model is correct in a bootstrap population F̂0, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication, or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a T_ML this number is p + 1; the number is p* + 1 for obtaining a T_B. Although a T_SB can be obtained once a T_ML is available, we need a positive definite sample covariance matrix S_y of the y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for T_SB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.

In practice, instead of estimating cα, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions,

p̂_B = (M + 1)/(B0 + 1),

where M = #{T*_b : T*_b > T}. When non-convergences occur, B0 should be defined as the number of converged samples plus the number of significant samples due to a bad model, and M should be defined as the number of non-converged samples due to the bad model plus the number of converged samples that result in T*_b > T. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉα = T*_(B0(1−α)), the B0(1−α)th ordered statistic, used in determining the numerator in (8), one needs at least B0(1 − α) converged samples when sampling from F̂0. The proposed procedure fails to generate a power estimate when the number of convergences under H0 is below B0(1 − α).
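In code, the counting rule just described amounts to the following; the function name is ours, and replications without enough distinct observations are assumed to have been discarded before the call.

def bootstrap_p_value(T, T_boot, n_bad_model=0):
    """Bootstrap p-value under the modified counting rule: T_boot holds the
    statistics from converged replications, and n_bad_model is the number of
    replications that had enough distinct points but did not converge
    (treated as T*_b = +infinity, i.e. automatically significant)."""
    B0 = len(T_boot) + n_bad_model
    M = n_bad_model + sum(1 for Tb in T_boot if Tb > T)
    return (M + 1) / (B0 + 1)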

Let λ1(S_x) ≥ λ2(S_x) ≥ ... ≥ λp(S_x) be the eigenvalues of the sample covariance matrix S_x, and γ1(S_y) ≥ γ2(S_y) ≥ ... ≥ γp*(S_y) be the eigenvalues of the sample


covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for bad samples are λp(S_x)/λ1(S_x) ≤ 10⁻²⁰ for obtaining T_ML and γp*(S_y)/γ1(S_y) ≤ 10⁻²⁰ for obtaining T_SB and T_B. If a sample is not bad by these criteria and the model still cannot reach convergence (‖Δθ^(j)‖ < 10⁻⁵) within 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties established under either approach may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data


Table 6. Non-convergences due to significant samples

(a) Example 1

            Sample x_i           Sample x_i^(.05)       Sample x_i^(.25)
Statistic   H0   Ha1   Ha2       H0   Ha1   Ha2         H0   Ha1   Ha2
T_ML        0    0     2         0    0     1           0    0     1
T_SB        0    0     2         0    0     1           0    0     1
T_B         0    10    21        0    10    27          0    7     20

(b) Example 2

            Sample x_i   Sample x_i^(.05)   Sample x_i^(.10)
Statistic   H0           H0                 H0
T_ML        9            1                  1
T_SB        9            1                  1
T_B         1            0                  0

(c) Example 3

            Sample x_i   Sample x_i^(.30)   Sample x_i (81st case removed)
Statistic   H0           H0                 H0
T_ML        1            0                  0
T_SB        1            0                  0
T_B         0            0                  0

set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a sample x_i^(ρ) transformed through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(ρ).
Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. With proper downweighting, however, T_ML is generally the one that is best described by a chi-square distribution. Nevertheless, the conclusion that T_ML is the best statistic for bootstrap inference has to be preliminary: future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.
Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.
Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.


Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago Press, Supplementary Educational Monographs, No. 48.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economic Statistics Section (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.


Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics – Theory and Methods, 20, 3805–3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y.-F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.
Yung, Y.-F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002


Page 5: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

power can be found in Beran (1986) Efron and Tibshirani (1993 Chapter 25) andDavison and Hinkley (1997 Section 46)

24 Test for close tA covariance structure model is at best only an approximation to the real world Anyinteresting model will be rejected when the sample size is large enough Based on theroot-mean-square error of approximation (RMSEA) (Steiger amp Lind 1980) MacCallumet al (1996) proposed testing for close t rather than exact t in (1) Using a non-centralchi-square to describe the behaviour of TML their approach is equivalent to testing if theNCP is less than a prespecied value The test for close t can also be easilyimplemented with the bootstrap Let S c be a covariance matrix with close t such that

dc = n minv

D(Sc S(v)) (10)

where dc is the lsquonon-centrality parameterrsquo in equation (8) of MacCallum et al (1996)Such a covariance matrix can be found in the form of Sh = hS x + (1 plusmn h)S(v) WhenD = DML is the Wishart likelihood discrepancy function Yuan and Hayashi(2001) showed that minv D(Sh S(v)) = D(Sh S(v)) is a strictly increasing function ofh [ [0 1] and it is straightforward to nd a S c once a dc is given Let y i = S 12

c Splusmn 12x x i

i = 1 n then the covariance matrix of y Fy is S c Let Tb b = 1 B0 be thecorresponding statistics of T evaluated respectivelyat B0 independent samples from Fy Then ca = T(B0 (1 plusmn a)) is the critical value estimate of the test for close t and the modelwill be rejected when T = T(x 1 x n) gt ca It is also straightforward to performpower analysis with the bootstrap when a less-close-t covariance matrix describes thepopulation

3 Pivotal property of TML TSB and TB

Although it is unclear which of the statistics TML TSB and TB is best for bootstrappingwith a specic data set there is a general result for choosing a statistic (Beran 1988Hall 1992) In the following we will discuss this result and its relevance to CSA

A statistic is pivotal if its distribution does not depend on any parameters in theunderlying sampling distribution and it is asymptotically pivotal if its asymptoticdistribution does not depend on unknown parameters Let a statistic T = f (S x ) be asmooth function of the sample covariance matrix S x of x 1 x n Suppose T isasymptotically pivotal and an Edgeworth expansion applies to its distribution functionunder H0 (Barndorff-Nielsen amp Cox 1984 Wakaki Eguchi amp Fujikoshi 1990)

P(T t | F0) = G(t) + n plusmn 1g(t) + O(nplusmn 32) (11a)

where G(t) is the asymptotic distribution function of T and g(t ) is a smooth functionthat depends on some unknown population parameters As discussed in Hall (1992) andDavision and Hinkley (1997 Section 261) (11a) generally holds for smooth functionsof sample moments Let T be the corresponding statistic based on resampling from thecorresponding F0 Its Edgeworth expansion is

P(T lt t | F0) = G(t) + n plusmn 1 g(t ) + Op(nplusmn 32) (11b)

Since g(t) = g(t ) + Op(nplusmn 12) in general

P(T t | F0) plusmn P(T t | F0) = Op(nplusmn 32) (12)

Bootstrap approach to inference and power analysis 97

Suppose we know G then asymptotic inference is based on comparing the observedstatistic T = T(x 1 x n) to the critical value Gplusmn 1(1 plusmn a) Equation (11a) implies thatasymptotic inference can achieve a designated level a to the order of accuracy ofOp(1n) According to (12) bootstrap inference based on an asymptotically pivotalstatistic can achieve the level a to a higher order of accuracy When T is notasymptotically pivotal G(t) in (11) depends on some unknown parameters One hasto estimate G(t) by G(t) for asymptotic inference Similarly with the bootstrap one hasto implicitlyestimate the unknown parameters in G(t ) and consequently it achieves thesame order of accuracy to the level a as that based on G(t) So for more accurateinference we should choose a statistic that is as nearly pivotal as possible Beran (1988)gave a nice discussion on using a pivotal statistic for bootstrap and the order of accuracyfor Type I errors

When data are normallydistributed (TML| H0) is asymptoticallydistributed as x 2p plusmn q

where p = p( p + 1)2 and q is the number of unknown parameters in S(v) Conse-quently TML is asymptotically pivotal (Davison amp Hinkley 1997 p 139) More generalconditions also exist for TML to be asymptotically pivotal with some specic models(Amemiya amp Anderson 1990 Browne amp Shapiro 1988 Mooijaart amp Bentler 1991Satorra ampBentler 1990 Shapiro 1987 Yuan ampBentler 1999) Unfortunately there is noeffective way of verifying these conditions in practice The statistic TSB is obtained byrescaling TML using fourth-order sample moments (Satorra amp Bentler 1988) Within theclass of elliptical or pseudo-elliptical distributions with nite fourth-order moments(TSB | H0) approaches x 2

p plusmn q (Yuan amp Bentler 1999) So TSB is asymptotically pivotalwhen the sampling distribution is elliptical or pseudo-elliptical It is not asymptoti-cally pivotal for other types of non-normal distribution In contrast to TML and TSB TBis asymptotically pivotal for any sampling distribution with nite fourth-ordermoments

The above discussion may imply that TB is the preferred statistic for bootstrapinference in CSA However there exist conditions that may hide the pivotal property ofTB with nite samples In addition to the asymptotically pivotal property Davison andHinkley (1997 Section 251) discussed nearly pivotal properties of a statistic T whichrequires the distribution of T to approximately follow the same distribution whensampling from Fx or Fx Since the target function G(t) for the three test statistics is thecdf of x 2

p plusmn q the better a statistic is approximated by a chi-square distribution the moreaccurate its bootstrap inference is in controlling Type I errors Because sample size is aserious issue with CSA a statistic that is asymptoticallypivotal may not be nearlypivotalas will be shown in Section 5

4 Heavy tails with practical dataBootstrap inference based on TML TB or TSB requires Fx to have nite fourth-ordermoments Of course the sample fourth-order moments of any sample will always benite However if some of the fourth-order moments (eg kurtoses) are quite large thecorresponding population fourth-order moments may actuallybe innite Even when allthe population fourth-order moments are nite inference based on the bootstrap willnot be accurate when Fx possesses heavy tails Here we propose applying the bootstrapprocedure to a sample in which cases contributing to heavy tails are properly down-weighted Following the study in Yuan Chan and Bentler (2000) we only use theHuber-type weights (Huber 1981 Tyler 1983) in controlling those cases

98 Ke-Hai Yuan and Kentaro Hayashi

For a p-variate sample x 1 x n let

d i = d (x i m S) = [(x i plusmn m ) cent Splusmn 1(x i plusmn m )]12

be the Mahalanobis distance and u 1(t ) and u 2(t) be non-negative scalar functions Arobust M-estimator ( ˆm S) can be obtained by solving (Maronna 1976)

m =Xn

i = 1

u 1(di )x i

iquest Xn

i = 1

u1(di ) (13a)

S =Xn

i = 1

u 2(d 2i )(x i plusmn m )(x i plusmn m ) cent n (13b)

With decreasing functions u 1(t) and u 2(t) cases having greater di s receive smallerweights and thus their effects are controlled Let r represent the proportion ofoutlying cases one wants to control and r be a constant dened by P(x 2

p gt r 2) = rThe Huber-type weights corresponding to this r are

u 1(d ) =1 if d r

rd if d gt r

raquo(14)

and u 2(d 2) = fu 1(d )g 2J where J is a constant such that Efx 2p u 2(x 2

p)g = p whichmakes the estimate S unbiased if Fx = Np( m S) We may regard r as a tuning parameterwhich can be adjusted according to the tails of a specic data set Applying differenttypes of weights to several data sets Yuan Bentler and Chan (2001) found that the mostefcient parameter estimates often go with Huber-type weights

Let ( ˆm S) be the converged solution to (13) u2 i = u 2fd 2(x i ˆm S)g and

x (r)i = f

u 2 i

p(x i plusmn ˆm )g (15)

Then we can rewrite (13b) as

S =Xn

i = 1

x (r)i x (r)

i cent n

which is just the sample covariance matrix of the x (r)i Yuan et al (2000) proposed using

(15) as a downweighting transformation formula Working with several practical datasets they found that the transformed sample x (r)

i is much better approximated by anormal distribution than the original sample x i As we shall see when applying thebootstrap to a sample x (r)

i the test statistic TML is approximately pivotal For x (r)i we

have

E(x (r)i 1 x (r)

i 2 x (r)i 3 x (r)

i 4 ) = E[u 22(d 2

i )(x i 1 plusmn m1)(x i 2 plusmn m2)(x i 3 plusmn m 3)(x i 4 plusmn m4)] + o(1) (16)

where x (r)i j is the j th element of x (r)

i Because the denominator of u 2(d 2i ) in (14)

contains d 2i when d 2

i gt r 2 (16) is bounded Even when the fourth-order moments of Fx

do not exist the corresponding fourth-order moments of the x (r)i are still nite

When interest lies in the structure of the population covariance matrix S = Cov(x i)one needs to know whether transformation (15) changes the structure of S Let S (r) bethe population covariance matrix of x (r)

i then S (r) = kS when the sampling distribu-tion is elliptically symmetric For almost all the commonly used models in the socialsciences modelling S is equivalent to modelling S (r) (Browne 1984) Yuan et al (2000)further illustrated that outlying cases create signicant multivariate skewness and thatthe sample covariance matrix S x is biased in such a situation Analysis based on x (r)

i can

Bootstrap approach to inference and power analysis 99

successfully remove the bias in S x and recover the correct model structure More detailson the merits of robust approaches to CSAcan be found in Yuan and Bentler (1998) andYuan et al (2000)

5 ApplicationsOur purpose in this section is to compare the performance of TML TB and TSB using realdata We will show that by combining the bootstrap and a downweighting transforma-tion a nearly optimal procedure may be found for analysing a given data set If a teststatistic T is nearly pivotal then its evaluations Tb at the B0 bootstrap replicationsshould approximately follow a chi-square distribution There are a variety of tools toevaluate this property We favour the quantilendashquantile (QQ) plot because of itsvisualization value If T follows x 2

d the plot of ordered Tb s against the quantiles ofx 2

d should form an approximately straight line with slope2

p2 Asignicant departure

from this line indicates violations of the pivotal property To save space the QQplots arepresented in a document available on the Internet (httpwwwndedu kyuanpapersbsinfpowgpdf ) We choose B0 = B1 = B = 1000 in estimating ca and ba although a smaller B may be enough in obtaining these estimates (Efron amp Tibshirani1993)

Example 1 Holzinger and Swineford (1939) provide a data set consisting of 24cognitive variables on 145 subjects We use nine of the variables in our study visualperception cubes and lozenges measuring a spatial ability factor paragraph compre-hension sentence completion and word meaning measuring a verbal ability factoraddition counting dots and straight-curved capitals measuring (via the imposition of atime limit) a speed factor Let x represent the nine variables then the conrmatoryfactor model

x = L f + e Cov(x ) = LFL cent + W (17a)

with

L =

10 l 21 l 31 0 0 0 0 0 0

0 0 0 10 l 52 l 62 0 0 0

0 0 0 0 0 0 10 l 83 l 93

0

BB

1

CCA

cent

F =

f11 f12 f13

f 21 f22 f 23

f 31 f32 f 33

0

BB

1

CCA

(17b)

represents Holzinger and Swinefordrsquos hypothesis We assume that the measurementerrors are uncorrelated with W = Cov(e ) being a diagonal matrix There are q = 21unknown parameters and the model degrees of freedom are 24

Mardiarsquos (1970) multivariate kurtosis for the nine-variable model is 304 implyingthat the data may come from a distribution with heavier tails than those of a normaldistribution We therefore apply the downweighting transformation (15) in order forTML to be approximately pivotal Based on previous research (Yuan et al 20002001) we apply the three statistics to three samples the raw sample x i and thetransformed samples x (05)

i and x (25)i The following bootstrap procedures include

further transforming these samples to satisfy H0 or H1 as given in Section 2

100 Ke-Hai Yuan and Kentaro Hayashi

QQ plots for the three statistics on the raw data set suggest that no statistic isapproximately pivotal All have heavier right tails than that of x 2

24 This is expected forTML because the sample x i exhibits heavier tails than those of a normal distributionPrevious studies suggest that a large sample size is needed for TB to behave like achi-square random variable Asample size of n = 145 may not be large enough or theremay be other unknown factors that prevent TB from behaving like a chi-square variableEven though many simulation studies recommend TSB this statistic certainly has aheavier right tail than that of x 2

24 for this data setWhen applied to the transformed sample x (05)

i all the three statistics still haveheavier right tails than that of x 2

24 When applied to x (25)i the QQplots indicate that TML

is approximately described by x 224 but TB or TSB are not Consequently the suggested

procedure for this data set is to apply TML to the transformed sample x (25)i Boot-

strapping with TML on this sample not only leads to more efcient parameter estimates(Yuan et al 2001) but also provides a more accurate Type I error estimate

Table 1 gives the signicance levels of model (17) evaluated by various proceduresSeveral differences exist among these procedures First all the bootstrap p-values ( pB)

are greater than those ( px 2 ) obtained by referring the test statistics to x 224 The only

comparable pair of pB and px 2 is when TML is applied to the sample x (25)i Second the TB

statistic for each sample is the largest among the three statistics but the pB for TB is alsothe largest for any of the three samples This is in contrast to conclusions regarding theperformance of TB when referring to a chi-square distribution (Hu et al 1992 Fouladi2000 Yuan amp Bentler 1997) Third all the pBs based on any statistic with any of thesamples are quite comparable implying the robustness of bootstrap inference Theabove phenomena can be further explained by the critical values c 05 in Table 2 All thec 05 s are greater than x 2

24(95)plusmn 1 = 36415 the 95th percentile of x 224 When the heavier

tails are downweighted in the sample the T s corresponding to each of the threestatistics behave more like x 2

24 and the corresponding c 05 is nearer 36415 Although TBis the largest among the three statistics on anyof the samples the corresponding c 05 forTB is also the largest explaining whythe associated pBs for TB can also be the largest Onthe other hand the traditional inference procedure is to refer TB to x 2

24 for signicanceWith such a xed reference distribution that does not take the increased variabilityof TBinto account a larger TB generally corresponds to a smaller px 2

The power properties of TML TB and TSB are evaluated at two alternatives Startingwith model (17) the parameter l 91 is the only signicant path identied by theLagrange multiplier test in EQS (Bentler 1995) Adding this extra parameter to (17b)the rst alternative Sa1 is the estimated covariance matrix obtained by tting this model

Bootstrap approach to inference and power analysis 101

Table 1 Statistics bootstrap p-values ( pB) and p-values ( px 2 ) referring to x 224

Sample x i Sample x (05)i Sample x (25)

i

Statistic pB px 2 pB p x 2 pB p x 2

TML 51187 017 001 49408 010 002 48883 004 002TSB 48696 020 002 50318 013 001 53674 003 000TB 56726 055 000 62095 024 000 66910 014 000

to the raw sample x i by minimizing DML The second alternative Sa 2 is the samplecovariance matrix of the raw sample With these two alternatives the NCPs in (9) for thethree statistics are given in Table 3 Notice that once a Sa is given the d in (9)corresponding to TML is xed and does not depend on the sample Because both TSB andTB involve fourth-order sample moments their corresponding ds change as the tails ofthe data set change QQplots for the ordered T s against quantiles of the x 2

24(d) indicatethat the distributions of TML and TSB applied to x i are well approximated by thecorresponding non-central chi-squares Because TML under H0 applied to x (25)

i is wellapproximated by x 2

24 we would expect its distributions under the two alternatives alsoto be well approximated by x 2

24(2260) and x 224(5119) respectively This expectation is

not fullled as judged from the corresponding QQplots The reason for this is that evenfor perfectly normal data simulated from Np( m S) the statistic TML will not behave like anon-central chi-square unless d is quite small and n is large Because the NCP tends to beoverestimated when it is large (Satorra Saris amp de Pijper 1991) the corresponding QQplots are below the x = y line

Can we trust the results of power analysis in the traditional approach when TML andTSB are well approximated by non-central chi-square distributions in this example Theanswer is no This is because the critical value x 2

24(95)plusmn 1 = 36415 which does not takethe actual variabilityof TMLor TSB into account is not a good estimate of ca in (3) As canbe seen from Table 2 there exist quite substantial differences between x 2

24(95)plusmn 1 andthe c05 s Actually power analysis based on a non-central chi-square table is not reliablefor even perfectly normal data when S(v) is not near S a Table 4 contrasts the powerestimates based on the bootstrap (ba ) with those based on the traditional approach (ba)

using NCPs in Table 3 The ba s are uniformly smaller than the ba s especially when theNCPs are not huge When an NCP is large enough there is almost no overlap betweenthe distribution of (T | H0) and that of (T | H1) Any sensible inference procedure can tellthe difference between the two and consequently the power estimates under Ha2 inTable 4 are approximately the same

102 Ke-Hai Yuan and Kentaro Hayashi

Table 2 Bootstrap critical values c 05

Statistic Sample x i Sample x (05)i Sample x (25)

i

TML 42348 39991 37022TSB 41694 41067 40587TB 57930 55120 53925

Table 3 Non-centrality parameter estimates associated with each statistic and sample

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML 22603 51187 22603 51187 22603 51187TSB 21763 48695 23107 51901 24875 55948TB 20289 56726 20130 63187 20683 69516

Table 4 illustrates the power properties of each statistic when the data changeWhen the outlying cases are downweighted not only do the parameter estimatesbecome more efcient (Yuan et al 2001) but also the powers for identifying anincorrect model are higher for all the statistics The powers of TML and TSB arequite comparable while the power of TB is much lower This is in contrast to theconclusion for power analysis based on the non-central chi-square where TB hasthe highest power in detecting a wrong model (Fouladi 2000 Yuan amp Bentler1997)

Next we focus on testing model (17) for close t We will only present results of thebootstrap using TML The use of TSB or TB is essentially the same MacCallum et al(1996) recommended testing close t (RMSEA 05) fair t (05 lt RMSEA 08)mediocre t (08 RMSEA 10) and poor t (10 lt RMSEA) We will conduct abootstrap procedure for these tests With (10) it is easy to see thatRMSEA = [minv D(S c S(v))( p plusmn q)]12 = e corresponds to dc = n ( p plusmn q)e2 Thed c s corresponding to e = 05 08 and 10 are given in the second row of Table 5The 95th percentiles of the corresponding x 2

24(dc ) are in the third row of the tableThese are respectively xed critical values for testing close t fair t and mediocre tin the approach of MacCallum et al (1996)

Bootstrap approach to inference and power analysis 103

Table 4 Bootstrap power estimates (ba ) and power estimates (ba) referring to x 24(d)

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML ba 627 993 661 997 715 998ba 803 998 803 998 803 998

TSB ba 613 989 661 997 717 998ba 782 997 814 998 851 999

TB ba 374 983 454 991 501 995ba 743 999 739 1000 754 1000

Table 5 Test for close t

RMSEA 05 08 10dc 87 22272 3480

Sample TML x 224(dc 95)plusmn 1 48919 66926 82764

x i 51187 ca 53013 69928 84287p x 2 033 315 695pB 073 359 695

x (05)i 49408 ca 49548 55595 59620

p x 2 046 368 743pB 053 107 163

x (25)i 48883 ca 46305 51741 55899

p x 2 050 384 756pB 034 080 131

Let S x be the sample covariance matrix of x i and v be the corresponding maximumlikelihood estimate of v The values of h in the solution for Sc in the form ofS

h= (1 plusmn h)S(v) + hS x corresponding to e = 05 08 and 10 are respectively

0421 61 0669 81 and 0830 83 Applying the bootstrap procedure outlined in Section24 to the samples x i x (05)

i and x (25)i the estimates ca and p-values for each of the

samples are reported in Table 5 The results indicate that there are substantialdifferences between the traditional p-values px 2 and the bootstrap p-values pBespecially when RMSEA = 10 for the downweighted samples It is obvious that TMLwhen RMSEA = 10 has a much shorter right tail than that of x 2

24(dc) This fact is alsoreected in the corresponding ca s Table 5 once again illustrates the fact that given aclose t the behaviour of TML cannot be described by a non-central chi-square unlessthe corresponding TML under H0 in (1) has a heavy right tail As when testing for exactt using a chi-square table to judge the signicance of a statistic in testing model (17) forclose t is misleading

Example 2 Neumann (1994) presented an alcohol and psychological symptomdata set consisting of 10 variables and 335 cases The two variables in x = (x 1 x 2) cent arerespectively family history of psychopathology and family history of alcoholism whichare indicators for a latent construct of family history The eight variables iny = (y1 y8) cent are respectively the age of rst problem with alcohol age of rstdetoxication from alcohol alcohol severity score alcohol use inventory SCL-90psychological inventory the sum of the Minnesota Multiphasic Personality Inventoryscores the lowest level of psychosocial functioning during the past year and the highestlevel of psychosocial functioning during the past year With two indicators for eachlatent construct these eight variables respectively measure age of onset alcoholsymptoms psychopathology symptoms and global functioning Neumannrsquos (1994)theoretical model for this data set is

x = Lx y + d y = L y h + e (18a)

h = B h + Gy + z (18b)

where

Lx =10

l 1

Aacute Ly =

1 l 2 0 0 0 0 0 0

0 0 1 l 3 0 0 0 0

0 0 0 0 1 l 4 0 0

0 0 0 0 0 0 1 l 5

0

BBBBB

1

CCCCCA

cent

(18c)

B =

0 0 0 0b 21 0 0 0

b 31 b 32 0 00 b42 b43 0

0

BBB

1

CCCA G =

g11

0

00

0

BBB

1

CCCA F = Var(y) (18d )

and e d and z are vectors of errors whose elements are all uncorrelated The modeldegrees of freedom are 29

With Mardiarsquos multivariate kurtosis equal to 1476 the data set may come from adistribution with heavy tails Abootstrap procedure is more appropriate after the heavytails are properly downweighted Actually Yuan et al (2001) found that the samplex (10)

i leads to the most efcient parameter estimates in (18) among various proceduresHere we apply the three statistics to the two samples x i and x (10)

i Our purpose is to

104 Ke-Hai Yuan and Kentaro Hayashi

explore the pivotal property of TML TSB and TB on each sample After noticing that alarge portion of the QQ plot for TML applying to x (10)

i is below the x = y line ouranalysis also includes the sample x (05)

i The QQplots of the three statistics applied to x i indicate that none of them is nearly

pivotal TML and TSB are for the most part above the x = y line however both their lefttails are slightlybelow the x = y line This implies that some bootstrap samples t model(18) extremely well a phenomenon which can be caused by too many data points nearthe centre of the distribution Adownweighting procedure only affects data points thatcause a test statistic to possess a heavy right tail It is not clear to us how to deal with asample when a test statistic has a light left tail The QQplots of TML and TSB on x (05)

i andx (10)

i suggest that their heavier right tails on x i are under control However the right tailof TB is still quite heavy when compared to x 2

29 The downweighting transformationH(10) not only just controls the right tail of TML but also makes most of the QQplot fallbelow the x = y line Visually inspecting the QQ plots of TML on x i x (05)

i and x (10)i

suggests that a downweighting transformation by H(r) with 0 lt r lt 05 may lead toa better procedure for analysing the alcohol and psychological symptom data setbased on TML Since any procedure is only an approximation to the real world andTML is nearly pivotal when applying to x (05)

i we recommend the analysis using TMLon x (05)

i for this data set This leads to TML = 4243 with a bootstrap p-value of 040implying that the model (18) marginally ts the alcohol and psychological symptomdata set

Both the raw data sets in Examples 1 and 2 have signicant multivariate kurtoses andthe downweighting transformation (15) achieves approximately pivotal behaviour ofthe statistics We maywonder how these test statistics behave when applied to a data setthat does not have a signicant multivariate kurtosis This is demonstrated in thefollowing example

Example 3 Table 121 of Mardia Kent and Bibby (1979) contains test scores ofn = 88 students on p = 5 topics mechanics vectors algebra analysis and statisticsThe rst two topics were tested with closed-book exams and the last three withopen-book exams Since these two examination methods may tap different abilities atwo-factor model as in (17a) with

L =10 l 21 0 0 00 0 10 l 42 l 52

sup3 acutecent F =

f11 f12

f 21 f 22

sup3 acute(19)

was proposed and conrmed by Tanaka Watadani and Moon (1991) Mardiarsquos multi-variate kurtosis for this data set equal to 0057 is not signicant It would be interestingto see how TML TB and TSB perform on this data set

QQ plots for the three statistics applied to x i indicate that all of them have heavierright tails than that of x 2

4 The QQ plot for TML applied to x (r)i continues to exhibit a

heavier right tail until r reaches 30 TSB and TB still possess quite heavy right tails evenwhen r = 30

This data set has been used as an example for inuence analysis Previous studiesindicate that case number 81 is the most inuential point (Lee amp Wang 1996 Fung ampKwan 1995) So it is enticing to apply the three statistics to the x i s without the 81stcase The QQplots of TML TSB and TB applied to the remaining 87 cases indicate that allthe three statistics still have heavier right tails than that of x 2

4 Actually compared tothose based on all 88 cases the right tails of the three statistics on the 87 cases are evenheavier For this data set our recommended analysis is to use TML applied to x (30)

i With

Bootstrap approach to inference and power analysis 105

TML = 189 and a bootstrap p-value of 71 model (19) is more than good enough inexplaining the relationship of the ve variables

6 Non-convergence with Bootstrap ReplicationsNon-convergence issues exist with resampling and caution is needed for bootstrapinference (Ichikawa amp Konishi 1995) This generally happens when the sample size isnot large enough and especially when a model structure is wrong This issue wasdiscussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariancestructure model Here we propose a reasonable way of handling non-convergencewith bootstrap replications

With an iterative algorithm like Newtonrsquos convergence is generally dened ask Dv ( j ) k lt e where e is a small number and Dv ( j ) = v ( j ) plusmn v ( j plusmn 1) with v ( j ) being thej th step solution Let s = vech(S) be the factor formed by stacking the non-duplicatedelements of S and let its sample counterpart be s = vech(S) Then Dv ( j + 1) =

( Ccedils centj Wj Ccedilsj)plusmn 1 Ccedils centj Wj (s plusmn sj) where Ccedilsj = [ds(v)dv | v ( j )] sj = s(v ( j )) and Wj is the

corresponding weight matrix evaluated at v ( j ) So Dv ( j ) is proportional toS plusmn S(v ( j plusmn 1)) and it is impossible for k Dv ( j ) k to be smaller than e if S(v) is far fromE(S) Although a model is correct in a bootstrap population F0 some bootstrapreplications may still lie far away from the model especially when sampling errors arelarge for small samples For these samples even if we can obtain a solution using somenon-iterative or direct search method the corresponding statistic will be signicantBased on this fact we should distinguish two kinds of non-convergence in bootstrap orgeneral Monte Carlo studies The rst is where a sample contains enough distinct pointsand still cannot reach convergence with a model which should be treated as asignicant replication or a lsquobad modelrsquo The second is where a sample does not containenough distinct observations to t a model For obtaining a TML this number is p + 1 thenumber is p + 1 for obtaining a TB Although a TSB can be obtained once a TML isavailable we need to have a positive denite sample covariance matrix S y ofy i = vech[(x plusmn x )(x plusmn x ) cent ] in order for TSB to make sense (Bentler amp Yuan 1999)and the minimum number of distinct data points for S y to be positive denite is p + 1We should treat the second case as a bad sample and ignore it in bootstrap replications

In practice instead of estimating ca one generally reports the p-value of a statistic TIf all B bootstrap replications result in convergent solutions

pB =B0 + 1 plusmn M

B0 + 1

where M = fTb | Tb gt Tg When non-convergences occur B0 should be dened as thenumber of converged samples plus the number of signicant samples due to a badmodel M should be dened as the number of non-converged samples due to the badmodel plus the number of converged samples that result in Tb gt T With this modi-cation there is no problem with hypothesis testing A similar modication applies toformula (8) for power evaluation With ca = T(B0 (1 plusmn a)) in determining the numeratorin (8) one needs at least B0(1 plusmn a) converged samples when sampling from F0 The proposed procedure fails to generate a power estimate when the number ofconvergences under H0 is below B0(1 plusmn a)

Let λ_1(S_x) ≥ λ_2(S_x) ≥ ... ≥ λ_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and γ_1(S_y) ≥ γ_2(S_y) ≥ ... ≥ γ_p*(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criterion for a bad sample is λ_p(S_x)/λ_1(S_x) ≤ 10^(−20) for obtaining T_ML, and γ_p*(S_y)/γ_1(S_y) ≤ 10^(−20) for obtaining T_SB and T_B. If a sample passes this screening but the model still cannot reach convergence (||Δθ^(j)|| < 10^(−5)) within 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H_0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.
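Before turning to the results in Table 6, here is a sketch of such a screening rule. The function name and the use of divide-by-n covariance estimates are our own choices; for T_ML only the first check is needed.

```python
import numpy as np

def vech(A):
    """Stack the non-duplicated (lower-triangular) elements of a symmetric matrix."""
    return A[np.tril_indices(A.shape[0])]

def is_bad_sample(X, tol=1e-20):
    """Return True when a bootstrap replication X (n x p) is a 'bad sample':
    lambda_p(S_x)/lambda_1(S_x) <= tol, or, for T_SB and T_B,
    gamma_min(S_y)/gamma_1(S_y) <= tol, with S_y the sample covariance of
    y_i = vech[(x_i - xbar)(x_i - xbar)']."""
    Xc = X - X.mean(axis=0)
    S_x = Xc.T @ Xc / X.shape[0]
    lam = np.linalg.eigvalsh(S_x)            # ascending eigenvalues
    if lam[0] <= tol * lam[-1]:
        return True
    Y = np.array([vech(np.outer(row, row)) for row in Xc])
    S_y = np.cov(Y, rowvar=False, bias=True)
    gam = np.linalg.eigvalsh(S_y)
    return gam[0] <= tol * gam[-1]
```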

Table 6. Non-convergences due to significant samples

(a) Example 1

              Sample x_i            Sample x_i^(.05)       Sample x_i^(.25)
Statistic   H_0   H_a1   H_a2     H_0   H_a1   H_a2      H_0   H_a1   H_a2
T_ML          0      0      2       0      0      1        0      0      1
T_SB          0      0      2       0      0      1        0      0      1
T_B           0     10     21       0     10     27        0      7     20

(b) Example 2

            Sample x_i    Sample x_i^(.05)    Sample x_i^(.10)
Statistic       H_0             H_0                 H_0
T_ML              9               1                   1
T_SB              9               1                   1
T_B               1               0                   0

(c) Example 3

            Sample x_i    Sample x_i^(.30)    Sample x_i (81st case removed)
Statistic       H_0             H_0                 H_0
T_ML              1               0                   0
T_SB              1               0                   0
T_B               0               0                   0

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(ρ) obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails, but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(ρ).

Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. With proper downweighting, however, T_ML is generally the one that is best described by a chi-square distribution. The conclusion that T_ML is the best statistic for bootstrap inference has to remain preliminary; future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453-1463.
Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483-495.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181-197.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151-173.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687-697.
Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95-115.
Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111-135). Newbury Park, CA: Sage.
Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193-208.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356-410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607-614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution (Supplementary Educational Monographs No. 48). Chicago: University of Chicago Press.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351-362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77-93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93-108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51-67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159-171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308-313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235-249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83-90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173-185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39-62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253-264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805-3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411-420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313-325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767-774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363-396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831-853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31-50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63-84.
Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195-226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425-429.

Received 9 November 2000; revised version received 22 March 2002

Page 7: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

For a p-variate sample x 1 x n let

d i = d (x i m S) = [(x i plusmn m ) cent Splusmn 1(x i plusmn m )]12

be the Mahalanobis distance and u 1(t ) and u 2(t) be non-negative scalar functions Arobust M-estimator ( ˆm S) can be obtained by solving (Maronna 1976)

m =Xn

i = 1

u 1(di )x i

iquest Xn

i = 1

u1(di ) (13a)

S =Xn

i = 1

u 2(d 2i )(x i plusmn m )(x i plusmn m ) cent n (13b)

With decreasing functions u 1(t) and u 2(t) cases having greater di s receive smallerweights and thus their effects are controlled Let r represent the proportion ofoutlying cases one wants to control and r be a constant dened by P(x 2

p gt r 2) = rThe Huber-type weights corresponding to this r are

u 1(d ) =1 if d r

rd if d gt r

raquo(14)

and u 2(d 2) = fu 1(d )g 2J where J is a constant such that Efx 2p u 2(x 2

p)g = p whichmakes the estimate S unbiased if Fx = Np( m S) We may regard r as a tuning parameterwhich can be adjusted according to the tails of a specic data set Applying differenttypes of weights to several data sets Yuan Bentler and Chan (2001) found that the mostefcient parameter estimates often go with Huber-type weights

Let ( ˆm S) be the converged solution to (13) u2 i = u 2fd 2(x i ˆm S)g and

x (r)i = f

u 2 i

p(x i plusmn ˆm )g (15)

Then we can rewrite (13b) as

S =Xn

i = 1

x (r)i x (r)

i cent n

which is just the sample covariance matrix of the x (r)i Yuan et al (2000) proposed using

(15) as a downweighting transformation formula Working with several practical datasets they found that the transformed sample x (r)

i is much better approximated by anormal distribution than the original sample x i As we shall see when applying thebootstrap to a sample x (r)

i the test statistic TML is approximately pivotal For x (r)i we

have

E(x (r)i 1 x (r)

i 2 x (r)i 3 x (r)

i 4 ) = E[u 22(d 2

i )(x i 1 plusmn m1)(x i 2 plusmn m2)(x i 3 plusmn m 3)(x i 4 plusmn m4)] + o(1) (16)

where x (r)i j is the j th element of x (r)

i Because the denominator of u 2(d 2i ) in (14)

contains d 2i when d 2

i gt r 2 (16) is bounded Even when the fourth-order moments of Fx

do not exist the corresponding fourth-order moments of the x (r)i are still nite

When interest lies in the structure of the population covariance matrix S = Cov(x i)one needs to know whether transformation (15) changes the structure of S Let S (r) bethe population covariance matrix of x (r)

i then S (r) = kS when the sampling distribu-tion is elliptically symmetric For almost all the commonly used models in the socialsciences modelling S is equivalent to modelling S (r) (Browne 1984) Yuan et al (2000)further illustrated that outlying cases create signicant multivariate skewness and thatthe sample covariance matrix S x is biased in such a situation Analysis based on x (r)

i can

Bootstrap approach to inference and power analysis 99

successfully remove the bias in S x and recover the correct model structure More detailson the merits of robust approaches to CSAcan be found in Yuan and Bentler (1998) andYuan et al (2000)

5 ApplicationsOur purpose in this section is to compare the performance of TML TB and TSB using realdata We will show that by combining the bootstrap and a downweighting transforma-tion a nearly optimal procedure may be found for analysing a given data set If a teststatistic T is nearly pivotal then its evaluations Tb at the B0 bootstrap replicationsshould approximately follow a chi-square distribution There are a variety of tools toevaluate this property We favour the quantilendashquantile (QQ) plot because of itsvisualization value If T follows x 2

d the plot of ordered Tb s against the quantiles ofx 2

d should form an approximately straight line with slope2

p2 Asignicant departure

from this line indicates violations of the pivotal property To save space the QQplots arepresented in a document available on the Internet (httpwwwndedu kyuanpapersbsinfpowgpdf ) We choose B0 = B1 = B = 1000 in estimating ca and ba although a smaller B may be enough in obtaining these estimates (Efron amp Tibshirani1993)

Example 1 Holzinger and Swineford (1939) provide a data set consisting of 24cognitive variables on 145 subjects We use nine of the variables in our study visualperception cubes and lozenges measuring a spatial ability factor paragraph compre-hension sentence completion and word meaning measuring a verbal ability factoraddition counting dots and straight-curved capitals measuring (via the imposition of atime limit) a speed factor Let x represent the nine variables then the conrmatoryfactor model

x = L f + e Cov(x ) = LFL cent + W (17a)

with

L =

10 l 21 l 31 0 0 0 0 0 0

0 0 0 10 l 52 l 62 0 0 0

0 0 0 0 0 0 10 l 83 l 93

0

BB

1

CCA

cent

F =

f11 f12 f13

f 21 f22 f 23

f 31 f32 f 33

0

BB

1

CCA

(17b)

represents Holzinger and Swinefordrsquos hypothesis We assume that the measurementerrors are uncorrelated with W = Cov(e ) being a diagonal matrix There are q = 21unknown parameters and the model degrees of freedom are 24

Mardiarsquos (1970) multivariate kurtosis for the nine-variable model is 304 implyingthat the data may come from a distribution with heavier tails than those of a normaldistribution We therefore apply the downweighting transformation (15) in order forTML to be approximately pivotal Based on previous research (Yuan et al 20002001) we apply the three statistics to three samples the raw sample x i and thetransformed samples x (05)

i and x (25)i The following bootstrap procedures include

further transforming these samples to satisfy H0 or H1 as given in Section 2

100 Ke-Hai Yuan and Kentaro Hayashi

QQ plots for the three statistics on the raw data set suggest that no statistic isapproximately pivotal All have heavier right tails than that of x 2

24 This is expected forTML because the sample x i exhibits heavier tails than those of a normal distributionPrevious studies suggest that a large sample size is needed for TB to behave like achi-square random variable Asample size of n = 145 may not be large enough or theremay be other unknown factors that prevent TB from behaving like a chi-square variableEven though many simulation studies recommend TSB this statistic certainly has aheavier right tail than that of x 2

24 for this data setWhen applied to the transformed sample x (05)

i all the three statistics still haveheavier right tails than that of x 2

24 When applied to x (25)i the QQplots indicate that TML

is approximately described by x 224 but TB or TSB are not Consequently the suggested

procedure for this data set is to apply TML to the transformed sample x (25)i Boot-

strapping with TML on this sample not only leads to more efcient parameter estimates(Yuan et al 2001) but also provides a more accurate Type I error estimate

Table 1 gives the significance levels of model (17) evaluated by the various procedures. Several differences exist among these procedures. First, all the bootstrap p-values (p_B) are greater than those (p_χ²) obtained by referring the test statistics to χ²_24. The only comparable pair of p_B and p_χ² is when T_ML is applied to the sample x_i^(.25). Second, the T_B statistic for each sample is the largest among the three statistics, but the p_B for T_B is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of T_B when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the p_Bs based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values ĉ_.05 in Table 2. All the ĉ_.05s are greater than 36.415, the 95th percentile of χ²_24. When the heavier tails are downweighted in the sample, the bootstrapped statistics corresponding to each of the three test statistics behave more like χ²_24, and the corresponding ĉ_.05 is nearer 36.415. Although T_B is the largest among the three statistics on any of the samples, the corresponding ĉ_.05 for T_B is also the largest, explaining why the associated p_Bs for T_B can also be the largest. On the other hand, the traditional inference procedure is to refer T_B to χ²_24 for significance. With such a fixed reference distribution, which does not take the increased variability of T_B into account, a larger T_B generally corresponds to a smaller p_χ².

Table 1. Statistics, bootstrap p-values (p_B) and p-values (p_χ²) referring to χ²_24

                Sample x_i              Sample x_i^(.05)        Sample x_i^(.25)
Statistic    T       p_B    p_χ²     T       p_B    p_χ²     T       p_B    p_χ²
T_ML         51.187  .017   .001     49.408  .010   .002     48.883  .004   .002
T_SB         48.696  .020   .002     50.318  .013   .001     53.674  .003   .000
T_B          56.726  .055   .000     62.095  .024   .000     66.910  .014   .000

The power properties of T_ML, T_B and T_SB are evaluated at two alternatives. Starting with model (17), the parameter λ_91 is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative Σ_a1 is the estimated covariance matrix obtained by fitting this model to the raw sample x_i by minimizing D_ML.

The second alternative Σ_a2 is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that once a Σ_a is given, the δ in (9) corresponding to T_ML is fixed and does not depend on the sample. Because both T_SB and T_B involve fourth-order sample moments, their corresponding δs change as the tails of the data set change. QQ plots of the ordered bootstrap statistics against quantiles of χ²_24(δ) indicate that the distributions of T_ML and T_SB applied to x_i are well approximated by the corresponding non-central chi-squares. Because T_ML under H_0 applied to x_i^(.25) is well approximated by χ²_24, we would expect its distributions under the two alternatives also to be well approximated by χ²_24(22.60) and χ²_24(51.19), respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from N_p(μ, Σ), the statistic T_ML will not behave like a non-central chi-square unless δ is quite small and n is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris & de Pijper, 1991), the corresponding QQ plots are below the x = y line.

Can we trust the results of power analysis in the traditional approach when T_ML and T_SB are well approximated by non-central chi-square distributions, as in this example? The answer is no. This is because the critical value 36.415, the 95th percentile of χ²_24, which does not take the actual variability of T_ML or T_SB into account, is not a good estimate of c_α in (3). As can be seen from Table 2, there exist quite substantial differences between 36.415 and the ĉ_.05s. Actually, power analysis based on a non-central chi-square table is not reliable even for perfectly normal data when Σ(v) is not near Σ_a. Table 4 contrasts the power estimates based on the bootstrap ($\hat\beta_\alpha$) with those based on the traditional approach ($\beta_\alpha$), using the NCPs in Table 3. The $\hat\beta_\alpha$s are uniformly smaller than the $\beta_\alpha$s, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of (T | H_0) and that of (T | H_1). Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under H_a2 in Table 4 are approximately the same.
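For reference, the traditional power values in Table 4 follow from a non-central chi-square with the Table 3 NCPs and the fixed critical value 36.415, whereas the bootstrap estimates use ĉ_α from Table 2 together with resampling under the alternative. The sketch below reproduces the traditional calculation for T_ML under H_a1; the second computed value is only a rough non-central chi-square approximation based on the bootstrap critical value, not the resampling-based estimate itself.

```python
from scipy.stats import ncx2, chi2

df, alpha = 24, 0.05
delta = 22.603                            # NCP for T_ML under H_a1 (Table 3)
crit_chi2 = chi2.ppf(1 - alpha, df)       # 36.415, the fixed critical value
crit_boot = 42.348                        # bootstrap c_.05 for T_ML on the raw sample (Table 2)

power_traditional = ncx2.sf(crit_chi2, df, delta)   # about .80, as in Table 4
power_boot_approx = ncx2.sf(crit_boot, df, delta)   # smaller, closer to the bootstrap estimate
print(round(power_traditional, 3), round(power_boot_approx, 3))
```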


Table 2. Bootstrap critical values ĉ_.05

Statistic    Sample x_i    Sample x_i^(.05)    Sample x_i^(.25)
T_ML         42.348        39.991              37.022
T_SB         41.694        41.067              40.587
T_B          57.930        55.120              53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

                Sample x_i           Sample x_i^(.05)      Sample x_i^(.25)
Statistic    H_a1      H_a2       H_a1      H_a2        H_a1      H_a2
T_ML         22.603    51.187     22.603    51.187      22.603    51.187
T_SB         21.763    48.695     23.107    51.901      24.875    55.948
T_B          20.289    56.726     20.130    63.187      20.683    69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but the powers for identifying an incorrect model are also higher for all the statistics. The powers of T_ML and T_SB are quite comparable, while the power of T_B is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where T_B has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using T_ML; the use of T_SB or T_B is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (.10 < RMSEA). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that

$$\mathrm{RMSEA} = \left[\min_v D(\Sigma_c, \Sigma(v))/(p^* - q)\right]^{1/2} = \varepsilon$$

corresponds to δ_c = n(p* − q)ε². The δ_cs corresponding to ε = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²_24(δ_c) are in the third row of the table. These are, respectively, fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
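The second and third rows of Table 5 follow directly from this relation; with n = 145 and 24 degrees of freedom:

```python
from scipy.stats import ncx2

n, df = 145, 24
for eps in (0.05, 0.08, 0.10):
    delta_c = n * df * eps**2              # 8.7, 22.272, 34.8 (second row of Table 5)
    crit = ncx2.ppf(0.95, df, delta_c)     # 48.919, 66.926, 82.764 (third row of Table 5)
    print(eps, round(delta_c, 3), round(crit, 3))
```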


Table 4. Bootstrap power estimates (β̂_α) and power estimates (β_α) referring to χ²_24(δ)

                       Sample x_i         Sample x_i^(.05)    Sample x_i^(.25)
Statistic           H_a1     H_a2       H_a1     H_a2       H_a1     H_a2
T_ML      β̂_α      .627     .993       .661     .997       .715     .998
          β_α       .803     .998       .803     .998       .803     .998
T_SB      β̂_α      .613     .989       .661     .997       .717     .998
          β_α       .782     .997       .814     .998       .851     .999
T_B       β̂_α      .374     .983       .454     .991       .501     .995
          β_α       .743     .999       .739     1.000      .754     1.000

Table 5. Test for close fit

RMSEA                                   .05        .08        .10
δ_c                                     8.70       22.272     34.80
95th percentile of χ²_24(δ_c)           48.919     66.926     82.764

Sample        T_ML
x_i           51.187     ĉ_α           53.013     69.928     84.287
                         p_χ²          .033       .315       .695
                         p_B           .073       .359       .695
x_i^(.05)     49.408     ĉ_α           49.548     55.595     59.620
                         p_χ²          .046       .368       .743
                         p_B           .053       .107       .163
x_i^(.25)     48.883     ĉ_α           46.305     51.741     55.899
                         p_χ²          .050       .384       .756
                         p_B           .034       .080       .131

Let S_x be the sample covariance matrix of the x_i and $\hat v$ be the corresponding maximum likelihood estimate of v. The values of h in the solution for Σ_c of the form Σ_h = (1 − h)Σ($\hat v$) + hS_x corresponding to ε = .05, .08 and .10 are, respectively, .42161, .66981 and .83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉ_α and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values p_χ² and the bootstrap p-values p_B, especially when RMSEA = .10 for the downweighted samples. It is obvious that T_ML when RMSEA = .10 has a much shorter right tail than that of χ²_24(δ_c). This fact is also reflected in the corresponding ĉ_αs. Table 5 once again illustrates the fact that, given a close fit, the behaviour of T_ML cannot be described by a non-central chi-square unless the corresponding T_ML under H_0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.
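A sketch of how the interpolation weight h can be found numerically is given below. The helper `fit_ml(sigma)`, assumed to return min_v D_ML(sigma, Σ(v)) for model (17), would come from one's SEM software; the bracketing interval presumes that the RMSEA of S_x itself exceeds the target ε, as is the case here.

```python
from scipy.optimize import brentq

def h_for_rmsea(eps, sigma_model, s_x, fit_ml, df=24):
    """Find h so that Sigma_h = (1 - h) * Sigma(v_hat) + h * S_x has
    population RMSEA equal to eps.  fit_ml(sigma) must return the minimized
    ML discrepancy for the hypothesized model against sigma."""
    target = df * eps**2                  # RMSEA = sqrt(D/df)  =>  D = df * eps^2
    def g(h):
        sigma_h = (1 - h) * sigma_model + h * s_x
        return fit_ml(sigma_h) - target
    return brentq(g, 0.0, 1.0)            # e.g. .42161, .66981, .83083 for eps = .05, .08, .10
```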

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x_1, x_2)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y_1, ..., y_8)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

$$x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon, \tag{18a}$$

$$\eta = B\eta + \Gamma\xi + \zeta, \tag{18b}$$

where

$$
\Lambda_x = \begin{pmatrix} 1.0 \\ \lambda_1 \end{pmatrix}, \qquad
\Lambda_y = \begin{pmatrix}
1 & \lambda_2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & \lambda_3 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & \lambda_4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \lambda_5
\end{pmatrix}',
\tag{18c}
$$

$$
B = \begin{pmatrix}
0 & 0 & 0 & 0\\
\beta_{21} & 0 & 0 & 0\\
\beta_{31} & \beta_{32} & 0 & 0\\
0 & \beta_{42} & \beta_{43} & 0
\end{pmatrix}, \qquad
\Gamma = \begin{pmatrix} \gamma_{11} \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
\Phi = \mathrm{Var}(\xi), \tag{18d}
$$

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10). Our purpose is to explore the pivotal property of T_ML, T_SB and T_B on each sample.


After noticing that a large portion of the QQ plot for T_ML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).

The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. T_ML and T_SB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of T_ML and T_SB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of T_B is still quite heavy when compared to χ²_29. The downweighting transformation H(.10) not only controls the right tail of T_ML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of T_ML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation by H(r) with 0 < r < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on T_ML. Since any procedure is only an approximation to the real world, and T_ML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using T_ML on x_i^(.05) for this data set. This leads to T_ML = 42.43 with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

$$
\Lambda = \begin{pmatrix}
1.0 & \lambda_{21} & 0 & 0 & 0\\
0 & 0 & 1.0 & \lambda_{42} & \lambda_{52}
\end{pmatrix}', \qquad
\Phi = \begin{pmatrix}
\phi_{11} & \phi_{12}\\
\phi_{21} & \phi_{22}
\end{pmatrix}
\tag{19}
$$

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how T_ML, T_B and T_SB perform on this data set.

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²_4. The QQ plot for T_ML applied to x_i^(r) continues to exhibit a heavier right tail until r reaches .30; T_SB and T_B still possess quite heavy right tails even when r = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the x_i without the 81st case. The QQ plots of T_ML, T_SB and T_B applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²_4. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set, our recommended analysis is to use T_ML applied to x_i^(.30). With T_ML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationships among the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ‖Δv^(j)‖ < ε, where ε is a small number and Δv^(j) = v^(j) − v^(j−1), with v^(j) being the jth step solution. Let σ(v) = vech(Σ(v)) be the vector formed by stacking the non-duplicated elements of Σ(v), and let its sample counterpart be s = vech(S). Then

$$\Delta v^{(j+1)} = (\dot\sigma_j' W_j \dot\sigma_j)^{-1} \dot\sigma_j' W_j (s - \sigma_j),$$

where $\dot\sigma_j = \partial\sigma(v)/\partial v$ evaluated at v^(j), σ_j = σ(v^(j)), and W_j is the corresponding weight matrix evaluated at v^(j). So Δv^(j) is proportional to s − σ(v^(j−1)), and it is impossible for ‖Δv^(j)‖ to be smaller than ε if Σ(v) is far from E(S). Although a model is correct in a bootstrap population F_0, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication, or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a T_ML this number is p + 1; the number is p* + 1 for obtaining a T_B. Although a T_SB can be obtained once a T_ML is available, we need to have a positive definite sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for T_SB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.
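As a sketch, one update of the kind described above can be written as follows; `sigma_of`, `jacobian` and `weight_matrix` stand in for the model-specific functions σ(v), σ̇(v) and W(v) and are placeholders, not part of the paper.

```python
import numpy as np

def newton_step(v, s, sigma_of, jacobian, weight_matrix):
    """One iteration v^(j) -> v^(j+1); convergence is declared when the
    norm of the step falls below a small epsilon (e.g. 1e-5)."""
    sig_dot = jacobian(v)                    # p* x q matrix, the derivative of sigma(v)
    w = weight_matrix(v)                     # p* x p* weight matrix W_j
    a = sig_dot.T @ w @ sig_dot
    delta = np.linalg.solve(a, sig_dot.T @ w @ (s - sigma_of(v)))
    return v + delta, np.linalg.norm(delta)
```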

In practice, instead of estimating c_α, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions,

$$p_B = \frac{B_0 + 1 - M}{B_0 + 1},$$

where M is the number of replications with T*_b < T. When non-convergences occur, B_0 should be defined as the number of converged samples plus the number of significant samples due to a bad model, and M should remain the number of converged samples that result in T*_b < T, so that the non-converged samples due to the bad model are counted among the significant replications. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉ_α = T*_(B_0(1−α)) in determining the numerator in (8), one needs at least B_0(1 − α) converged samples when sampling from F_0. The proposed procedure fails to generate a power estimate when the number of convergences under H_0 is below B_0(1 − α).
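Under this counting rule, the p-value computation might look like the following sketch, in which each replication is recorded as a statistic value, as infinity (non-convergence with enough distinct points, treated as a significant replication), or as None (a bad sample, which is simply ignored).

```python
import numpy as np

def bootstrap_p_value(t_obs, t_boot):
    """t_boot: one entry per replication -- a float (converged),
    np.inf (non-convergence treated as a significant replication), or
    None (bad sample: too few distinct points, discarded)."""
    kept = [t for t in t_boot if t is not None]
    b0 = len(kept)                            # converged plus significant replications
    m = sum(t < t_obs for t in kept)          # replications below the observed statistic
    return (b0 + 1 - m) / (b0 + 1)
```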

Let λ_1(S_x) ≥ λ_2(S_x) ≥ ... ≥ λ_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and g_1(S_y) ≥ g_2(S_y) ≥ ... ≥ g_p*(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for bad samples are λ_p(S_x)/λ_1(S_x) ≤ 10⁻²⁰ for obtaining T_ML, and g_p*(S_y)/g_1(S_y) ≤ 10⁻²⁰ for obtaining T_SB and T_B. If these criteria are satisfied and the model still cannot reach convergence (‖Δv‖ < 10⁻⁵) in 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H_0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.
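The screening rule used for the three examples can be written compactly as below; the 100-iteration rule and the treatment of the statistic as infinity are handled by the fitting routine and are not shown here.

```python
import numpy as np

def vech_outer(x):
    """y_i = vech[(x_i - xbar)(x_i - xbar)'] for every case."""
    xc = x - x.mean(axis=0)
    idx = np.triu_indices(x.shape[1])
    return np.stack([np.outer(r, r)[idx] for r in xc])

def usable_for(statistic, x):
    """Eigenvalue-ratio criteria for deciding whether a bootstrap sample is usable."""
    if statistic == 'T_ML':
        lam = np.linalg.eigvalsh(np.cov(x, rowvar=False))            # eigenvalues of S_x
    else:                                                            # T_SB and T_B need S_y
        lam = np.linalg.eigvalsh(np.cov(vech_outer(x), rowvar=False))
    return lam.min() / lam.max() > 1e-20
```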

Table 6. Non-convergences due to significant samples

(a) Example 1

                Sample x_i          Sample x_i^(.05)     Sample x_i^(.25)
Statistic    H_0   H_a1   H_a2    H_0   H_a1   H_a2    H_0   H_a1   H_a2
T_ML         0     0      2       0     0      1       0     0      1
T_SB         0     0      2       0     0      1       0     0      1
T_B          0     10     21      0     10     27      0     7      20

(b) Example 2

Statistic    Sample x_i (H_0)    Sample x_i^(.05) (H_0)    Sample x_i^(.10) (H_0)
T_ML         9                   1                         1
T_SB         9                   1                         1
T_B          1                   0                         0

(c) Example 3

Statistic    Sample x_i (H_0)    Sample x_i^(.30) (H_0)    Sample x_i, 81st case removed (H_0)
T_ML         1                   0                         0
T_SB         1                   0                         0
T_B          0                   0                         0

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution. By applying them to a specific data set,

through resampling, the properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(r) obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails, but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(r).

Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. However, with proper downweighting, T_ML is generally the one that is best described by a chi-square distribution. Nevertheless, the conclusion that T_ML is the best statistic for bootstrap inference has to be preliminary; future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.

Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.

Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.

Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.

Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.

Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.

Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.

Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.

Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.

Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.

Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.

Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.

Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago Press, Supplementary Educational Monographs, No. 48.

Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

Huber, P. J. (1981). Robust statistics. New York: Wiley.

Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.

Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.

Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.

Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.

Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.

Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.

Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.

Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.

Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.

Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.

Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.

Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.

Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis: With an application to confirmatory factor analysis. Communications in Statistics – Theory and Methods, 20, 3805–3821.

Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.

Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.

Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.

Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.

Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.

Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.

Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.

Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.

Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.

Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.

Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002


Page 8: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

successfully remove the bias in S x and recover the correct model structure More detailson the merits of robust approaches to CSAcan be found in Yuan and Bentler (1998) andYuan et al (2000)

5 ApplicationsOur purpose in this section is to compare the performance of TML TB and TSB using realdata We will show that by combining the bootstrap and a downweighting transforma-tion a nearly optimal procedure may be found for analysing a given data set If a teststatistic T is nearly pivotal then its evaluations Tb at the B0 bootstrap replicationsshould approximately follow a chi-square distribution There are a variety of tools toevaluate this property We favour the quantilendashquantile (QQ) plot because of itsvisualization value If T follows x 2

d the plot of ordered Tb s against the quantiles ofx 2

d should form an approximately straight line with slope2

p2 Asignicant departure

from this line indicates violations of the pivotal property To save space the QQplots arepresented in a document available on the Internet (httpwwwndedu kyuanpapersbsinfpowgpdf ) We choose B0 = B1 = B = 1000 in estimating ca and ba although a smaller B may be enough in obtaining these estimates (Efron amp Tibshirani1993)

Example 1 Holzinger and Swineford (1939) provide a data set consisting of 24cognitive variables on 145 subjects We use nine of the variables in our study visualperception cubes and lozenges measuring a spatial ability factor paragraph compre-hension sentence completion and word meaning measuring a verbal ability factoraddition counting dots and straight-curved capitals measuring (via the imposition of atime limit) a speed factor Let x represent the nine variables then the conrmatoryfactor model

x = L f + e Cov(x ) = LFL cent + W (17a)

with

L =

10 l 21 l 31 0 0 0 0 0 0

0 0 0 10 l 52 l 62 0 0 0

0 0 0 0 0 0 10 l 83 l 93

0

BB

1

CCA

cent

F =

f11 f12 f13

f 21 f22 f 23

f 31 f32 f 33

0

BB

1

CCA

(17b)

represents Holzinger and Swinefordrsquos hypothesis We assume that the measurementerrors are uncorrelated with W = Cov(e ) being a diagonal matrix There are q = 21unknown parameters and the model degrees of freedom are 24

Mardiarsquos (1970) multivariate kurtosis for the nine-variable model is 304 implyingthat the data may come from a distribution with heavier tails than those of a normaldistribution We therefore apply the downweighting transformation (15) in order forTML to be approximately pivotal Based on previous research (Yuan et al 20002001) we apply the three statistics to three samples the raw sample x i and thetransformed samples x (05)

i and x (25)i The following bootstrap procedures include

further transforming these samples to satisfy H0 or H1 as given in Section 2

100 Ke-Hai Yuan and Kentaro Hayashi

QQ plots for the three statistics on the raw data set suggest that no statistic isapproximately pivotal All have heavier right tails than that of x 2

24 This is expected forTML because the sample x i exhibits heavier tails than those of a normal distributionPrevious studies suggest that a large sample size is needed for TB to behave like achi-square random variable Asample size of n = 145 may not be large enough or theremay be other unknown factors that prevent TB from behaving like a chi-square variableEven though many simulation studies recommend TSB this statistic certainly has aheavier right tail than that of x 2

24 for this data setWhen applied to the transformed sample x (05)

i all the three statistics still haveheavier right tails than that of x 2

24 When applied to x (25)i the QQplots indicate that TML

is approximately described by x 224 but TB or TSB are not Consequently the suggested

procedure for this data set is to apply TML to the transformed sample x (25)i Boot-

strapping with TML on this sample not only leads to more efcient parameter estimates(Yuan et al 2001) but also provides a more accurate Type I error estimate

Table 1 gives the signicance levels of model (17) evaluated by various proceduresSeveral differences exist among these procedures First all the bootstrap p-values ( pB)

are greater than those ( px 2 ) obtained by referring the test statistics to x 224 The only

comparable pair of pB and px 2 is when TML is applied to the sample x (25)i Second the TB

statistic for each sample is the largest among the three statistics but the pB for TB is alsothe largest for any of the three samples This is in contrast to conclusions regarding theperformance of TB when referring to a chi-square distribution (Hu et al 1992 Fouladi2000 Yuan amp Bentler 1997) Third all the pBs based on any statistic with any of thesamples are quite comparable implying the robustness of bootstrap inference Theabove phenomena can be further explained by the critical values c 05 in Table 2 All thec 05 s are greater than x 2

24(95)plusmn 1 = 36415 the 95th percentile of x 224 When the heavier

tails are downweighted in the sample the T s corresponding to each of the threestatistics behave more like x 2

24 and the corresponding c 05 is nearer 36415 Although TBis the largest among the three statistics on anyof the samples the corresponding c 05 forTB is also the largest explaining whythe associated pBs for TB can also be the largest Onthe other hand the traditional inference procedure is to refer TB to x 2

24 for signicanceWith such a xed reference distribution that does not take the increased variabilityof TBinto account a larger TB generally corresponds to a smaller px 2

The power properties of TML TB and TSB are evaluated at two alternatives Startingwith model (17) the parameter l 91 is the only signicant path identied by theLagrange multiplier test in EQS (Bentler 1995) Adding this extra parameter to (17b)the rst alternative Sa1 is the estimated covariance matrix obtained by tting this model

Bootstrap approach to inference and power analysis 101

Table 1 Statistics bootstrap p-values ( pB) and p-values ( px 2 ) referring to x 224

Sample x i Sample x (05)i Sample x (25)

i

Statistic pB px 2 pB p x 2 pB p x 2

TML 51187 017 001 49408 010 002 48883 004 002TSB 48696 020 002 50318 013 001 53674 003 000TB 56726 055 000 62095 024 000 66910 014 000

to the raw sample x i by minimizing DML The second alternative Sa 2 is the samplecovariance matrix of the raw sample With these two alternatives the NCPs in (9) for thethree statistics are given in Table 3 Notice that once a Sa is given the d in (9)corresponding to TML is xed and does not depend on the sample Because both TSB andTB involve fourth-order sample moments their corresponding ds change as the tails ofthe data set change QQplots for the ordered T s against quantiles of the x 2

24(d) indicatethat the distributions of TML and TSB applied to x i are well approximated by thecorresponding non-central chi-squares Because TML under H0 applied to x (25)

i is wellapproximated by x 2

24 we would expect its distributions under the two alternatives alsoto be well approximated by x 2

24(2260) and x 224(5119) respectively This expectation is

not fullled as judged from the corresponding QQplots The reason for this is that evenfor perfectly normal data simulated from Np( m S) the statistic TML will not behave like anon-central chi-square unless d is quite small and n is large Because the NCP tends to beoverestimated when it is large (Satorra Saris amp de Pijper 1991) the corresponding QQplots are below the x = y line

Can we trust the results of power analysis in the traditional approach when TML andTSB are well approximated by non-central chi-square distributions in this example Theanswer is no This is because the critical value x 2

24(95)plusmn 1 = 36415 which does not takethe actual variabilityof TMLor TSB into account is not a good estimate of ca in (3) As canbe seen from Table 2 there exist quite substantial differences between x 2

24(95)plusmn 1 andthe c05 s Actually power analysis based on a non-central chi-square table is not reliablefor even perfectly normal data when S(v) is not near S a Table 4 contrasts the powerestimates based on the bootstrap (ba ) with those based on the traditional approach (ba)

using NCPs in Table 3 The ba s are uniformly smaller than the ba s especially when theNCPs are not huge When an NCP is large enough there is almost no overlap betweenthe distribution of (T | H0) and that of (T | H1) Any sensible inference procedure can tellthe difference between the two and consequently the power estimates under Ha2 inTable 4 are approximately the same

102 Ke-Hai Yuan and Kentaro Hayashi

Table 2 Bootstrap critical values c 05

Statistic Sample x i Sample x (05)i Sample x (25)

i

TML 42348 39991 37022TSB 41694 41067 40587TB 57930 55120 53925

Table 3 Non-centrality parameter estimates associated with each statistic and sample

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML 22603 51187 22603 51187 22603 51187TSB 21763 48695 23107 51901 24875 55948TB 20289 56726 20130 63187 20683 69516

Table 4 illustrates the power properties of each statistic when the data changeWhen the outlying cases are downweighted not only do the parameter estimatesbecome more efcient (Yuan et al 2001) but also the powers for identifying anincorrect model are higher for all the statistics The powers of TML and TSB arequite comparable while the power of TB is much lower This is in contrast to theconclusion for power analysis based on the non-central chi-square where TB hasthe highest power in detecting a wrong model (Fouladi 2000 Yuan amp Bentler1997)

Next we focus on testing model (17) for close t We will only present results of thebootstrap using TML The use of TSB or TB is essentially the same MacCallum et al(1996) recommended testing close t (RMSEA 05) fair t (05 lt RMSEA 08)mediocre t (08 RMSEA 10) and poor t (10 lt RMSEA) We will conduct abootstrap procedure for these tests With (10) it is easy to see thatRMSEA = [minv D(S c S(v))( p plusmn q)]12 = e corresponds to dc = n ( p plusmn q)e2 Thed c s corresponding to e = 05 08 and 10 are given in the second row of Table 5The 95th percentiles of the corresponding x 2

24(dc ) are in the third row of the tableThese are respectively xed critical values for testing close t fair t and mediocre tin the approach of MacCallum et al (1996)

Bootstrap approach to inference and power analysis 103

Table 4 Bootstrap power estimates (ba ) and power estimates (ba) referring to x 24(d)

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML ba 627 993 661 997 715 998ba 803 998 803 998 803 998

TSB ba 613 989 661 997 717 998ba 782 997 814 998 851 999

TB ba 374 983 454 991 501 995ba 743 999 739 1000 754 1000

Table 5 Test for close t

RMSEA 05 08 10dc 87 22272 3480

Sample TML x 224(dc 95)plusmn 1 48919 66926 82764

x i 51187 ca 53013 69928 84287p x 2 033 315 695pB 073 359 695

x (05)i 49408 ca 49548 55595 59620

p x 2 046 368 743pB 053 107 163

x (25)i 48883 ca 46305 51741 55899

p x 2 050 384 756pB 034 080 131

Let S x be the sample covariance matrix of x i and v be the corresponding maximumlikelihood estimate of v The values of h in the solution for Sc in the form ofS

h= (1 plusmn h)S(v) + hS x corresponding to e = 05 08 and 10 are respectively

0421 61 0669 81 and 0830 83 Applying the bootstrap procedure outlined in Section24 to the samples x i x (05)

i and x (25)i the estimates ca and p-values for each of the

samples are reported in Table 5 The results indicate that there are substantialdifferences between the traditional p-values px 2 and the bootstrap p-values pBespecially when RMSEA = 10 for the downweighted samples It is obvious that TMLwhen RMSEA = 10 has a much shorter right tail than that of x 2

24(dc) This fact is alsoreected in the corresponding ca s Table 5 once again illustrates the fact that given aclose t the behaviour of TML cannot be described by a non-central chi-square unlessthe corresponding TML under H0 in (1) has a heavy right tail As when testing for exactt using a chi-square table to judge the signicance of a statistic in testing model (17) forclose t is misleading

Example 2 Neumann (1994) presented an alcohol and psychological symptomdata set consisting of 10 variables and 335 cases The two variables in x = (x 1 x 2) cent arerespectively family history of psychopathology and family history of alcoholism whichare indicators for a latent construct of family history The eight variables iny = (y1 y8) cent are respectively the age of rst problem with alcohol age of rstdetoxication from alcohol alcohol severity score alcohol use inventory SCL-90psychological inventory the sum of the Minnesota Multiphasic Personality Inventoryscores the lowest level of psychosocial functioning during the past year and the highestlevel of psychosocial functioning during the past year With two indicators for eachlatent construct these eight variables respectively measure age of onset alcoholsymptoms psychopathology symptoms and global functioning Neumannrsquos (1994)theoretical model for this data set is

x = Lx y + d y = L y h + e (18a)

h = B h + Gy + z (18b)

where

Lx =10

l 1

Aacute Ly =

1 l 2 0 0 0 0 0 0

0 0 1 l 3 0 0 0 0

0 0 0 0 1 l 4 0 0

0 0 0 0 0 0 1 l 5

0

BBBBB

1

CCCCCA

cent

(18c)

B =

0 0 0 0b 21 0 0 0

b 31 b 32 0 00 b42 b43 0

0

BBB

1

CCCA G =

g11

0

00

0

BBB

1

CCCA F = Var(y) (18d )

and e d and z are vectors of errors whose elements are all uncorrelated The modeldegrees of freedom are 29

With Mardiarsquos multivariate kurtosis equal to 1476 the data set may come from adistribution with heavy tails Abootstrap procedure is more appropriate after the heavytails are properly downweighted Actually Yuan et al (2001) found that the samplex (10)

i leads to the most efcient parameter estimates in (18) among various proceduresHere we apply the three statistics to the two samples x i and x (10)

i Our purpose is to

104 Ke-Hai Yuan and Kentaro Hayashi

explore the pivotal property of TML TSB and TB on each sample After noticing that alarge portion of the QQ plot for TML applying to x (10)

i is below the x = y line ouranalysis also includes the sample x (05)

i The QQplots of the three statistics applied to x i indicate that none of them is nearly

pivotal TML and TSB are for the most part above the x = y line however both their lefttails are slightlybelow the x = y line This implies that some bootstrap samples t model(18) extremely well a phenomenon which can be caused by too many data points nearthe centre of the distribution Adownweighting procedure only affects data points thatcause a test statistic to possess a heavy right tail It is not clear to us how to deal with asample when a test statistic has a light left tail The QQplots of TML and TSB on x (05)

i andx (10)

i suggest that their heavier right tails on x i are under control However the right tailof TB is still quite heavy when compared to x 2

29 The downweighting transformationH(10) not only just controls the right tail of TML but also makes most of the QQplot fallbelow the x = y line Visually inspecting the QQ plots of TML on x i x (05)

i and x (10)i

suggests that a downweighting transformation by H(r) with 0 lt r lt 05 may lead toa better procedure for analysing the alcohol and psychological symptom data setbased on TML Since any procedure is only an approximation to the real world andTML is nearly pivotal when applying to x (05)

i we recommend the analysis using TMLon x (05)

i for this data set This leads to TML = 4243 with a bootstrap p-value of 040implying that the model (18) marginally ts the alcohol and psychological symptomdata set

Both the raw data sets in Examples 1 and 2 have signicant multivariate kurtoses andthe downweighting transformation (15) achieves approximately pivotal behaviour ofthe statistics We maywonder how these test statistics behave when applied to a data setthat does not have a signicant multivariate kurtosis This is demonstrated in thefollowing example

Example 3 Table 121 of Mardia Kent and Bibby (1979) contains test scores ofn = 88 students on p = 5 topics mechanics vectors algebra analysis and statisticsThe rst two topics were tested with closed-book exams and the last three withopen-book exams Since these two examination methods may tap different abilities atwo-factor model as in (17a) with

L =10 l 21 0 0 00 0 10 l 42 l 52

sup3 acutecent F =

f11 f12

f 21 f 22

sup3 acute(19)

was proposed and conrmed by Tanaka Watadani and Moon (1991) Mardiarsquos multi-variate kurtosis for this data set equal to 0057 is not signicant It would be interestingto see how TML TB and TSB perform on this data set

QQ plots for the three statistics applied to x i indicate that all of them have heavierright tails than that of x 2

4 The QQ plot for TML applied to x (r)i continues to exhibit a

heavier right tail until r reaches 30 TSB and TB still possess quite heavy right tails evenwhen r = 30

This data set has been used as an example for inuence analysis Previous studiesindicate that case number 81 is the most inuential point (Lee amp Wang 1996 Fung ampKwan 1995) So it is enticing to apply the three statistics to the x i s without the 81stcase The QQplots of TML TSB and TB applied to the remaining 87 cases indicate that allthe three statistics still have heavier right tails than that of x 2

4 Actually compared tothose based on all 88 cases the right tails of the three statistics on the 87 cases are evenheavier For this data set our recommended analysis is to use TML applied to x (30)

i With

Bootstrap approach to inference and power analysis 105

TML = 189 and a bootstrap p-value of 71 model (19) is more than good enough inexplaining the relationship of the ve variables

6 Non-convergence with Bootstrap ReplicationsNon-convergence issues exist with resampling and caution is needed for bootstrapinference (Ichikawa amp Konishi 1995) This generally happens when the sample size isnot large enough and especially when a model structure is wrong This issue wasdiscussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariancestructure model Here we propose a reasonable way of handling non-convergencewith bootstrap replications

With an iterative algorithm like Newtonrsquos convergence is generally dened ask Dv ( j ) k lt e where e is a small number and Dv ( j ) = v ( j ) plusmn v ( j plusmn 1) with v ( j ) being thej th step solution Let s = vech(S) be the factor formed by stacking the non-duplicatedelements of S and let its sample counterpart be s = vech(S) Then Dv ( j + 1) =

( Ccedils centj Wj Ccedilsj)plusmn 1 Ccedils centj Wj (s plusmn sj) where Ccedilsj = [ds(v)dv | v ( j )] sj = s(v ( j )) and Wj is the

corresponding weight matrix evaluated at v ( j ) So Dv ( j ) is proportional toS plusmn S(v ( j plusmn 1)) and it is impossible for k Dv ( j ) k to be smaller than e if S(v) is far fromE(S) Although a model is correct in a bootstrap population F0 some bootstrapreplications may still lie far away from the model especially when sampling errors arelarge for small samples For these samples even if we can obtain a solution using somenon-iterative or direct search method the corresponding statistic will be signicantBased on this fact we should distinguish two kinds of non-convergence in bootstrap orgeneral Monte Carlo studies The rst is where a sample contains enough distinct pointsand still cannot reach convergence with a model which should be treated as asignicant replication or a lsquobad modelrsquo The second is where a sample does not containenough distinct observations to t a model For obtaining a TML this number is p + 1 thenumber is p + 1 for obtaining a TB Although a TSB can be obtained once a TML isavailable we need to have a positive denite sample covariance matrix S y ofy i = vech[(x plusmn x )(x plusmn x ) cent ] in order for TSB to make sense (Bentler amp Yuan 1999)and the minimum number of distinct data points for S y to be positive denite is p + 1We should treat the second case as a bad sample and ignore it in bootstrap replications

In practice instead of estimating ca one generally reports the p-value of a statistic TIf all B bootstrap replications result in convergent solutions

pB =B0 + 1 plusmn M

B0 + 1

where M = fTb | Tb gt Tg When non-convergences occur B0 should be dened as thenumber of converged samples plus the number of signicant samples due to a badmodel M should be dened as the number of non-converged samples due to the badmodel plus the number of converged samples that result in Tb gt T With this modi-cation there is no problem with hypothesis testing A similar modication applies toformula (8) for power evaluation With ca = T(B0 (1 plusmn a)) in determining the numeratorin (8) one needs at least B0(1 plusmn a) converged samples when sampling from F0 The proposed procedure fails to generate a power estimate when the number ofconvergences under H0 is below B0(1 plusmn a)

Let l 1(S x ) $ l 2(S x ) $ $ l p(S x ) be the eigenvalues of the sample covariancematrix S x and g1(S y ) $ g 2(S y ) $ $ gp (S y ) be the eigenvalues of the sample

106 Ke-Hai Yuan and Kentaro Hayashi

covariance matrix S y of y i = vech[(x plusmn x )(x plusmn x ) cent ] In computing the three examplesour criteria for bad samples are l p(S x )l 1(S x ) 10plusmn 20 for obtaining TML andgp (S y )g1(S y ) 10plusmn 20 for obtaining TSB and TB If these criteria are satised and themodel still cannot reach convergence (k Dv k lt 10plusmn 5) in 100 iterations we treat thecorresponding statistic as innity All non-convergences with the three examples weredue to signicant replications these are reported in Table 6 Table 6(a) implies that for agiven sample the further a model is from H0 the more often non-convergences occurThe table also suggests that with a given model the heavier the tails of a data set themore often one obtains non-convergences

7 Discussion and conclusionsExisting conclusions regarding the three commonly used statistics TML TSB and TB arebased on asymptotics and Monte Carlo studies The properties inherent in either ofthese approaches may not be enjoyed by these statistics in practice because of either anite sample or an unknown sampling distribution By applying them to a specic data

Bootstrap approach to inference and power analysis 107

Table 6 Non-convergences due to signicant samples

(a) Example 1

Sample x i Sample x (05)i Sample x (25)

i

Statistic H0 Ha1 Ha2 H0 Ha1 Ha2 H0 Ha1 Ha2

TML 0 0 2 0 0 1 0 0 1TSB 0 0 2 0 0 1 0 0 1TB 0 10 21 0 10 27 0 7 20

(b) Example 2

Sample x i Sample x (05)i Sample x (10)

iStatistic H0 H0 H0

TML 9 1 1TSB 9 1 1TB 1 0 0

(c) Example 3

Sample x i Sample x (30)i Sample x i

(81st case removed)Statistic H0 H0 H0

TML 1 0 0TSB 1 0 0TB 0 0 0

set through resampling properties of the three statistics can be visually examined bymeans of QQplots When a data set possesses heavy tails TML will inherit these tails byhaving a heavy right tail Actually all three statistics need the sampling distribution tohave nite fourth-order moments With possible violation of this assumption bypractical data we propose applying a bootstrap procedure to a transformed samplex (r)

i through downweighting The combination of bootstrapping and downweightingnot only offers a theoretical justication for applying the bootstrap to a data set withheavy tails but also provides a quite exible tool for exploring the properties of each ofthe three statistics Even if inference is based on referring a statistic to a chi-squaredistribution one will obtain a more accurate model evaluation by applying TML to atransformed sample x (r)

i Asymptotics justify the three statistics from different perspectives and TB is

asymptotically distribution-free Previous Monte Carlo studies mainly support TSBHowever with proper downweighting TML is generally the one that is best describedby a chi-square distribution However the conclusion that TML is the best statistic forbootstrap inference has to be preliminary Future studies may nd TB or TSB moreappropriate for other data sets We recommend exploring the different procedures for agiven data set as illustrated in Section 5

AcknowledgementsThe authors would like to thank Professor Peter M Bentler and two anonymous referees whoseconstructive comments lead to an improved version of this paper Correspondence concerningthis article should be addressed to Ke-Hai Yuan (kyuanndedu)

ReferencesAmemiya Y amp Anderson T W (1990) Asymptotic chi-square tests for a large class of factor

analysis models Annals of Statistics 18 1453ndash1463Barndorff-Nielsen O E amp Cox D R (1984) Bartlett adjustments to the likelihood ratio statistic

and the distribution of the maximum likelihood estimator Journal of the Royal StatisticalSociety B 46 483ndash495

Bentler P M (1995) EQS Structural equations program manual Encino CA MultivariateSoftware

Bentler P M amp Yuan K-H (1999) Structural equation modeling with small samples Teststatistics Multivariate Behavioral Research 34 181ndash197

Beran R (1986) Simulated power functions Annals of Statistics 14 151ndash173Beran R (1988) Prepivoting test statistics Abootstrap view of asymptotic renements Journal

of the American Statistical Association 83 687ndash697Beran R amp Srivastava M S (1985) Bootstrap tests and condence regions for functions of a

covariance matrix Annals of Statistics 13 95ndash115Bollen K A amp Stine R (1993) Bootstrapping goodness of t measures in structural equation

models In K A BOllen and J S Long (Eds) Testing structural equation models(pp 111ndash135) Newbury Park CA Sage

Browne M W (1984) Asymptotic distribution-free methods for the analysis of covariancestructures British Journal of Mathematical and Statistical Psychology 37 62ndash83

Browne M W amp Shapiro A (1988) Robustness of normal theory methods in the analysis oflinear latent variate models British Journal of Mathematical and Statistical Psychology 41193ndash208

108 Ke-Hai Yuan and Kentaro Hayashi

Cohen J (1988) Statistical power analysis for the behavioral sciences (2nd ed) Hillsdale NJErlbaum

Davison A C amp Hinkley D V (1997) Bootstrap methods and their application CambridgeCambridge University Press

Efron B ampTibshirani R J (1993) An introduction to the bootstrap New York Chapman ampHallFouladi R T (1998) Covariance structure analysis techniques under conditions of multi-

variate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, No. 48. Chicago: University of Chicago Press.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805–3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.
Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002.



QQ plots for the three statistics on the raw data set suggest that no statistic is approximately pivotal. All have heavier right tails than that of χ²₂₄. This is expected for T_ML because the sample x_i exhibits heavier tails than those of a normal distribution. Previous studies suggest that a large sample size is needed for T_B to behave like a chi-square random variable. A sample size of n = 145 may not be large enough, or there may be other unknown factors that prevent T_B from behaving like a chi-square variable. Even though many simulation studies recommend T_SB, this statistic certainly has a heavier right tail than that of χ²₂₄ for this data set.

When applied to the transformed sample x_i^(.05), all three statistics still have heavier right tails than that of χ²₂₄. When applied to x_i^(.25), the QQ plots indicate that T_ML is approximately described by χ²₂₄ but T_B and T_SB are not. Consequently, the suggested procedure for this data set is to apply T_ML to the transformed sample x_i^(.25). Bootstrapping with T_ML on this sample not only leads to more efficient parameter estimates (Yuan et al., 2001) but also provides a more accurate Type I error estimate.
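The transformed samples x_i^(r) come from the downweighting procedure defined in equation (15) of the paper. Purely as an illustration of the idea, and not of the paper's exact transformation, a minimal Huber-type reweighting that pulls the most outlying cases toward the sample mean might look as follows; the function name, the use of Mahalanobis distances based on the ordinary sample moments, and the reading of the superscript r as a target fraction of downweighted cases are all assumptions here.

```python
import numpy as np

def downweight(X, r=0.25):
    """Illustrative Huber-type downweighting: shrink roughly the r*100% most
    outlying cases (by Mahalanobis distance) toward the sample mean.
    This is a stand-in for transformation (15), not its exact form."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - xbar
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, S_inv, diff))  # Mahalanobis distances
    rho = np.quantile(d, 1.0 - r)          # cut-off so that about r*100% of cases are shrunk
    w = np.where(d <= rho, 1.0, rho / d)   # Huber-type weights in (0, 1]
    return xbar + w[:, None] * diff        # downweighted sample x_i^(r)
```

Under that (assumed) reading of the superscript, r = .05 and r = .25 would correspond to downweighting roughly 5% and 25% of the cases.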

Table 1 gives the significance levels of model (17) evaluated by various procedures. Several differences exist among these procedures. First, all the bootstrap p-values (p_B) are greater than those (p_χ²) obtained by referring the test statistics to χ²₂₄. The only comparable pair of p_B and p_χ² is when T_ML is applied to the sample x_i^(.25). Second, the T_B statistic for each sample is the largest among the three statistics, but the p_B for T_B is also the largest for any of the three samples. This is in contrast to conclusions regarding the performance of T_B when referring to a chi-square distribution (Hu et al., 1992; Fouladi, 2000; Yuan & Bentler, 1997). Third, all the p_B's based on any statistic with any of the samples are quite comparable, implying the robustness of bootstrap inference. The above phenomena can be further explained by the critical values ĉ_.05 in Table 2. All the ĉ_.05's are greater than 36.415, the 95th percentile of χ²₂₄. When the heavier tails are downweighted in the sample, the T*'s corresponding to each of the three statistics behave more like χ²₂₄ and the corresponding ĉ_.05 is nearer 36.415. Although T_B is the largest among the three statistics on any of the samples, the corresponding ĉ_.05 for T_B is also the largest, explaining why the associated p_B's for T_B can also be the largest. On the other hand, the traditional inference procedure is to refer T_B to χ²₂₄ for significance. With such a fixed reference distribution, which does not take the increased variability of T_B into account, a larger T_B generally corresponds to a smaller p_χ².
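The bootstrap quantities discussed above come from the procedure outlined earlier in the paper (Section 2): the sample is first transformed so that the model holds in the bootstrap population, bootstrap samples are drawn from it, and the resulting statistics T*_b give both ĉ_.05 and p_B. A compressed sketch of that logic, with the model-fitting step left as an assumed black box fit_statistic() returning T_ML, T_SB or T_B for a sample, is:

```python
import numpy as np

def bootstrap_pvalue_and_critical_value(X0, T_obs, fit_statistic, B=1000, alpha=0.05, seed=0):
    """X0: n x p data already transformed so that H0 holds in the bootstrap population.
    fit_statistic: assumed user-supplied function returning the test statistic for a sample.
    Returns the bootstrap p-value p_B and the critical value c_alpha (e.g. c_.05)."""
    rng = np.random.default_rng(seed)
    n = X0.shape[0]
    T_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        T_star[b] = fit_statistic(X0[idx])
    T_star.sort()
    c_alpha = T_star[int(np.ceil((1 - alpha) * B)) - 1]   # empirical (1 - alpha) quantile
    p_B = (np.sum(T_star > T_obs) + 1) / (B + 1)          # exceedance proportion
    return p_B, c_alpha
```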

Table 1. Statistics, bootstrap p-values (p_B) and p-values (p_χ²) referring to χ²₂₄

              Sample x_i               Sample x_i^(.05)         Sample x_i^(.25)
Statistic     T        p_B    p_χ²     T        p_B    p_χ²     T        p_B    p_χ²
T_ML          51.187   .017   .001     49.408   .010   .002     48.883   .004   .002
T_SB          48.696   .020   .002     50.318   .013   .001     53.674   .003   .000
T_B           56.726   .055   .000     62.095   .024   .000     66.910   .014   .000

The power properties of T_ML, T_B and T_SB are evaluated at two alternatives. Starting with model (17), the parameter λ_91 is the only significant path identified by the Lagrange multiplier test in EQS (Bentler, 1995). Adding this extra parameter to (17b), the first alternative Σ_a1 is the estimated covariance matrix obtained by fitting this model to the raw sample x_i by minimizing D_ML. The second alternative Σ_a2 is the sample covariance matrix of the raw sample. With these two alternatives, the NCPs in (9) for the three statistics are given in Table 3. Notice that, once a Σ_a is given, the δ in (9) corresponding to T_ML is fixed and does not depend on the sample. Because both T_SB and T_B involve fourth-order sample moments, their corresponding δ's change as the tails of the data set change. QQ plots for the ordered T*'s against quantiles of χ²₂₄(δ) indicate that the distributions of T_ML and T_SB applied to x_i are well approximated by the corresponding non-central chi-squares. Because T_ML under H0 applied to x_i^(.25) is well approximated by χ²₂₄, we would expect its distributions under the two alternatives also to be well approximated by χ²₂₄(22.60) and χ²₂₄(51.19), respectively. This expectation is not fulfilled, as judged from the corresponding QQ plots. The reason for this is that, even for perfectly normal data simulated from N_p(μ, Σ), the statistic T_ML will not behave like a non-central chi-square unless δ is quite small and n is large. Because the NCP tends to be overestimated when it is large (Satorra, Saris, & de Pijper, 1991), the corresponding QQ plots are below the x = y line.

Can we trust the results of power analysis in the traditional approach when T_ML and T_SB are well approximated by non-central chi-square distributions in this example? The answer is no. This is because the critical value 36.415, the 95th percentile of χ²₂₄, which does not take the actual variability of T_ML or T_SB into account, is not a good estimate of c_α in (3). As can be seen from Table 2, there exist quite substantial differences between 36.415 and the ĉ_.05's. Actually, power analysis based on a non-central chi-square table is not reliable even for perfectly normal data when Σ(θ) is not near Σ_a. Table 4 contrasts the power estimates based on the bootstrap (β̂_a) with those based on the traditional approach (β_a) using the NCPs in Table 3. The β̂_a's are uniformly smaller than the β_a's, especially when the NCPs are not huge. When an NCP is large enough, there is almost no overlap between the distribution of (T | H0) and that of (T | H1). Any sensible inference procedure can tell the difference between the two, and consequently the power estimates under H_a2 in Table 4 are approximately the same.


Table 2. Bootstrap critical values ĉ_.05

Statistic     Sample x_i     Sample x_i^(.05)     Sample x_i^(.25)
T_ML          42.348         39.991               37.022
T_SB          41.694         41.067               40.587
T_B           57.930         55.120               53.925

Table 3. Non-centrality parameter estimates associated with each statistic and sample

              Sample x_i           Sample x_i^(.05)      Sample x_i^(.25)
Statistic     H_a1      H_a2       H_a1      H_a2        H_a1      H_a2
T_ML          22.603    51.187     22.603    51.187      22.603    51.187
T_SB          21.763    48.695     23.107    51.901      24.875    55.948
T_B           20.289    56.726     20.130    63.187      20.683    69.516

Table 4 illustrates the power properties of each statistic when the data change. When the outlying cases are downweighted, not only do the parameter estimates become more efficient (Yuan et al., 2001), but the powers for identifying an incorrect model are also higher for all the statistics. The powers of T_ML and T_SB are quite comparable, while the power of T_B is much lower. This is in contrast to the conclusion for power analysis based on the non-central chi-square, where T_B has the highest power in detecting a wrong model (Fouladi, 2000; Yuan & Bentler, 1997).
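A hedged sketch of how bootstrap power estimates such as the β̂_a in Table 4 can be obtained, in the spirit of the paper's formula (8): draw bootstrap samples from a sample transformed so that the alternative Σ_a holds in the bootstrap population, and count how often the statistic exceeds the critical value ĉ_α estimated under H0. The helper fit_statistic() and the prior transformation of the data are assumptions here.

```python
import numpy as np

def bootstrap_power(X_alt, c_alpha, fit_statistic, B=1000, seed=1):
    """X_alt: sample transformed so that the alternative covariance matrix holds in the
    bootstrap population; c_alpha: critical value estimated under H0.
    fit_statistic: assumed helper returning the test statistic for a sample.
    Returns the bootstrap power estimate: proportion of replications with T* > c_alpha."""
    rng = np.random.default_rng(seed)
    n = X_alt.shape[0]
    exceed = 0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        if fit_statistic(X_alt[idx]) > c_alpha:
            exceed += 1
    return exceed / B
```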

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using T_ML; the use of T_SB or T_B is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (.10 < RMSEA). We will conduct a bootstrap procedure for these tests. With (10), it is easy to see that RMSEA = [min_θ D(Σ_c, Σ(θ))/(p* − q)]^(1/2) = ε corresponds to δ_c = n(p* − q)ε². The δ_c's corresponding to ε = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²₂₄(δ_c) are in the third row of the table. These are, respectively, the fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
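With degrees of freedom p* − q = 24 and n = 145, the relation δ_c = n(p* − q)ε² and the fixed critical values of MacCallum et al. (1996) in the second and third rows of Table 5 can be reproduced directly. The following sketch uses scipy's non-central chi-square and should agree with the table up to rounding.

```python
from scipy.stats import ncx2

n, df = 145, 24                        # Example 1: degrees of freedom p* - q = 24
for eps in (0.05, 0.08, 0.10):         # RMSEA values for close, fair and mediocre fit
    delta_c = n * df * eps ** 2        # non-centrality parameter implied by RMSEA = eps
    crit = ncx2.ppf(0.95, df, delta_c) # fixed 95th-percentile critical value
    print(f"RMSEA = {eps:.2f}: delta_c = {delta_c:.3f}, 95th percentile = {crit:.3f}")
```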


Table 4. Bootstrap power estimates (β̂_a) and power estimates (β_a) referring to χ²₂₄(δ)

                     Sample x_i         Sample x_i^(.05)    Sample x_i^(.25)
Statistic            H_a1     H_a2      H_a1     H_a2       H_a1     H_a2
T_ML      β̂_a        .627     .993      .661     .997       .715     .998
          β_a        .803     .998      .803     .998       .803     .998
T_SB      β̂_a        .613     .989      .661     .997       .717     .998
          β_a        .782     .997      .814     .998       .851     .999
T_B       β̂_a        .374     .983      .454     .991       .501     .995
          β_a        .743     .999      .739    1.000       .754    1.000

Table 5. Test for close fit

RMSEA                                       .05        .08        .10
δ_c                                         8.7        22.272     34.80
95th percentile of χ²₂₄(δ_c)                48.919     66.926     82.764

Sample       T_ML
x_i          51.187     ĉ_a                 53.013     69.928     84.287
                        p_χ²                .033       .315       .695
                        p_B                 .073       .359       .695
x_i^(.05)    49.408     ĉ_a                 49.548     55.595     59.620
                        p_χ²                .046       .368       .743
                        p_B                 .053       .107       .163
x_i^(.25)    48.883     ĉ_a                 46.305     51.741     55.899
                        p_χ²                .050       .384       .756
                        p_B                 .034       .080       .131

Let S_x be the sample covariance matrix of x_i and θ̂ be the corresponding maximum likelihood estimate of θ. The values of h in the solution for Σ_c of the form Σ_h = (1 − h)Σ(θ̂) + hS_x corresponding to ε = .05, .08 and .10 are, respectively, 0.42161, 0.66981 and 0.83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉ_a and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values p_χ² and the bootstrap p-values p_B, especially when RMSEA = .10 for the downweighted samples. It is obvious that T_ML when RMSEA = .10 has a much shorter right tail than that of χ²₂₄(δ_c). This fact is also reflected in the corresponding ĉ_a's. Table 5 once again illustrates the fact that, given a close fit, the behaviour of T_ML cannot be described by a non-central chi-square unless the corresponding T_ML under H0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.
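The mixing constants h reported above solve min_θ D(Σ_h, Σ(θ))/(p* − q) = ε² for Σ_h = (1 − h)Σ(θ̂) + hS_x. A sketch of how such an h can be located numerically, assuming a helper min_discrepancy() that fits the model to a given covariance matrix and returns the minimized discrepancy, is:

```python
import numpy as np
from scipy.optimize import brentq

def solve_h(Sigma_theta_hat, S_x, eps, df, min_discrepancy):
    """Find h so that the model fitted to Sigma_h = (1-h)*Sigma(theta_hat) + h*S_x
    has population RMSEA equal to eps; min_discrepancy is an assumed fitting helper."""
    def rmsea_gap(h):
        Sigma_h = (1.0 - h) * Sigma_theta_hat + h * S_x
        return np.sqrt(min_discrepancy(Sigma_h) / df) - eps
    # h = 0 gives a perfectly fitting population (RMSEA 0); h = 1 gives the raw S_x,
    # which for this example already corresponds to an RMSEA above .10, so a root exists.
    return brentq(rmsea_gap, 0.0, 1.0)
```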

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x₁, x₂)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y₁, ..., y₈)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

x = Λ_x ξ + δ,   y = Λ_y η + ε,                                          (18a)

η = Bη + Γξ + ζ,                                                         (18b)

where

Λ_x = (1.0, λ₁)′,

         | 1   λ₂   0   0    0   0    0   0  |
Λ_y′ =   | 0   0    1   λ₃   0   0    0   0  |                           (18c)
         | 0   0    0   0    1   λ₄   0   0  |
         | 0   0    0   0    0   0    1   λ₅ |

         | 0     0     0     0 |
B =      | β₂₁   0     0     0 |        Γ = (γ₁₁, 0, 0, 0)′,   Φ = Var(ξ),   (18d)
         | β₃₁   β₃₂   0     0 |
         | 0     β₄₂   β₄₃   0 |

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10). Our purpose is to


explore the pivotal property of T_ML, T_SB and T_B on each sample. After noticing that a large portion of the QQ plot for T_ML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).

The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. T_ML and T_SB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of T_ML and T_SB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of T_B is still quite heavy when compared to χ²₂₉. The downweighting transformation H(.10) not just controls the right tail of T_ML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of T_ML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation H(r) with 0 < r < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on T_ML. Since any procedure is only an approximation to the real world and T_ML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using T_ML on x_i^(.05) for this data set. This leads to T_ML = 42.43 with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.
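Mardia's (1970) multivariate kurtosis referred to above can be computed directly. The values reported in the paper (14.76 for Example 2 and 0.057 for Example 3) are presumably the normalized statistic, which is approximately standard normal under multivariate normality; that interpretation is an assumption here.

```python
import numpy as np

def mardia_kurtosis(X):
    """Return Mardia's (1970) multivariate kurtosis b_{2,p} and its normalized value,
    which is approximately N(0, 1) for multivariate normal data."""
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = diff.T @ diff / n                                          # biased sample covariance
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)    # squared Mahalanobis distances
    b2p = np.mean(d2 ** 2)
    z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)       # normalized kurtosis
    return b2p, z
```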

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

      | 1.0   λ₂₁   0     0     0   |               | φ₁₁   φ₁₂ |
Λ′ =  | 0     0     1.0   λ₄₂   λ₅₂ |,        Φ =   | φ₂₁   φ₂₂ |              (19)

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how T_ML, T_B and T_SB perform on this data set.

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²₄. The QQ plot for T_ML applied to x_i^(r) continues to exhibit a heavier right tail until r reaches .30; T_SB and T_B still possess quite heavy right tails even when r = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the x_i's without the 81st case. The QQ plots of T_ML, T_SB and T_B applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²₄. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set our recommended analysis is to use T_ML applied to x_i^(.30). With T_ML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationship of the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ‖Δθ^(j)‖ < ε, where ε is a small number and Δθ^(j) = θ^(j) − θ^(j−1), with θ^(j) being the jth step solution. Let σ(θ) = vech(Σ(θ)) be the vector formed by stacking the non-duplicated elements of Σ(θ), and let its sample counterpart be s = vech(S). Then

Δθ^(j+1) = (σ̇_j′ W_j σ̇_j)⁻¹ σ̇_j′ W_j (s − σ_j),

where σ̇_j = ∂σ(θ)/∂θ′ evaluated at θ^(j), σ_j = σ(θ^(j)), and W_j is the corresponding weight matrix evaluated at θ^(j). So Δθ^(j) is proportional to S − Σ(θ^(j−1)), and it is impossible for ‖Δθ^(j)‖ to be smaller than ε if Σ(θ) is far from E(S). Although a model is correct in a bootstrap population F₀, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a T_ML this number is p + 1; for obtaining a T_B the number is p* + 1, where p* = p(p + 1)/2. Although a T_SB can be obtained once a T_ML is available, we need to have a positive definite sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for T_SB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.
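The iteration just described can be written compactly; a sketch, with the model-specific pieces σ(θ), the Jacobian σ̇(θ) and the weight matrix W(θ) assumed to be supplied by the caller:

```python
import numpy as np

def fit_by_newton(s, sigma, sigma_dot, weight, theta0, eps=1e-5, max_iter=100):
    """Gauss-Newton-type steps dtheta = (J' W J)^{-1} J' W (s - sigma(theta)), s = vech(S).
    sigma, sigma_dot and weight are assumed model-specific callables.
    Returns (theta, converged); non-convergence signals a 'significant' replication
    whenever the sample itself has enough distinct observations."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        J = sigma_dot(theta)                    # p* x q Jacobian evaluated at theta
        W = weight(theta)                       # p* x p* weight matrix evaluated at theta
        resid = s - sigma(theta)
        dtheta = np.linalg.solve(J.T @ W @ J, J.T @ W @ resid)
        theta = theta + dtheta
        if np.linalg.norm(dtheta) < eps:        # convergence criterion ||dtheta|| < eps
            return theta, True
    return theta, False
```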

In practice, instead of estimating c_α, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions, with B₀ = B,

p_B = (M + 1)/(B₀ + 1),

where M = #{T*_b : T*_b > T}. When non-convergences occur, B₀ should be defined as the number of converged samples plus the number of significant samples due to a bad model; M should be defined as the number of non-converged samples due to the bad model plus the number of converged samples that result in T*_b > T. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉ_α = T*_(B₀(1−α)) in determining the numerator in (8), one needs at least B₀(1−α) converged samples when sampling from F₀. The proposed procedure fails to generate a power estimate when the number of convergences under H₀ is below B₀(1−α).
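One way to implement this rule, using the exceedance form of p_B given above, treating a replication that fails to converge despite having enough distinct observations as a statistic of +∞, and discarding bad samples altogether, is sketched below; how the statistics and the screening flags are produced is assumed.

```python
import numpy as np

def bootstrap_pvalue_with_nonconvergence(T_obs, T_star, bad_sample):
    """T_star: bootstrap statistics, with np.inf marking replications that had enough
    distinct observations but did not converge ('significant' replications).
    bad_sample: boolean flags for replications without enough distinct observations,
    which are ignored entirely. Returns the modified bootstrap p-value."""
    T_star = np.asarray(T_star, dtype=float)
    keep = ~np.asarray(bad_sample, dtype=bool)   # drop bad samples from the replications
    T_kept = T_star[keep]
    B0 = T_kept.size                             # converged plus significant replications
    M = np.sum(T_kept > T_obs)                   # exceedances; np.inf counts automatically
    return (M + 1) / (B0 + 1)
```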

Let λ₁(S_x) ≥ λ₂(S_x) ≥ ⋯ ≥ λ_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and γ₁(S_y) ≥ γ₂(S_y) ≥ ⋯ ≥ γ_{p*}(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for screening out bad samples are λ_p(S_x)/λ₁(S_x) > 10⁻²⁰ for obtaining T_ML and γ_{p*}(S_y)/γ₁(S_y) > 10⁻²⁰ for obtaining T_SB and T_B. If these criteria are satisfied and the model still cannot reach convergence (‖Δθ‖ < 10⁻⁵) within 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H₀, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.

Table 6. Non-convergences due to significant samples

(a) Example 1

              Sample x_i            Sample x_i^(.05)       Sample x_i^(.25)
Statistic     H₀   H_a1   H_a2      H₀   H_a1   H_a2       H₀   H_a1   H_a2
T_ML          0    0      2         0    0      1          0    0      1
T_SB          0    0      2         0    0      1          0    0      1
T_B           0    10     21        0    10     27         0    7      20

(b) Example 2

Statistic     Sample x_i (H₀)     Sample x_i^(.05) (H₀)     Sample x_i^(.10) (H₀)
T_ML          9                   1                         1
T_SB          9                   1                         1
T_B           1                   0                         0

(c) Example 3

Statistic     Sample x_i (H₀)     Sample x_i^(.30) (H₀)     Sample x_i, 81st case removed (H₀)
T_ML          1                   0                         0
T_SB          1                   0                         0
T_B           0                   0                         0

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution.

By applying them to a specific data set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(r) obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails, but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(r).

Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. With proper downweighting, however, T_ML is generally the one that is best described by a chi-square distribution. Nevertheless, the conclusion that T_ML is the best statistic for bootstrap inference has to be preliminary: future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan ([email protected]).


Page 10: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

to the raw sample x i by minimizing DML The second alternative Sa 2 is the samplecovariance matrix of the raw sample With these two alternatives the NCPs in (9) for thethree statistics are given in Table 3 Notice that once a Sa is given the d in (9)corresponding to TML is xed and does not depend on the sample Because both TSB andTB involve fourth-order sample moments their corresponding ds change as the tails ofthe data set change QQplots for the ordered T s against quantiles of the x 2

24(d) indicatethat the distributions of TML and TSB applied to x i are well approximated by thecorresponding non-central chi-squares Because TML under H0 applied to x (25)

i is wellapproximated by x 2

24 we would expect its distributions under the two alternatives alsoto be well approximated by x 2

24(2260) and x 224(5119) respectively This expectation is

not fullled as judged from the corresponding QQplots The reason for this is that evenfor perfectly normal data simulated from Np( m S) the statistic TML will not behave like anon-central chi-square unless d is quite small and n is large Because the NCP tends to beoverestimated when it is large (Satorra Saris amp de Pijper 1991) the corresponding QQplots are below the x = y line

Can we trust the results of power analysis in the traditional approach when TML andTSB are well approximated by non-central chi-square distributions in this example Theanswer is no This is because the critical value x 2

24(95)plusmn 1 = 36415 which does not takethe actual variabilityof TMLor TSB into account is not a good estimate of ca in (3) As canbe seen from Table 2 there exist quite substantial differences between x 2

24(95)plusmn 1 andthe c05 s Actually power analysis based on a non-central chi-square table is not reliablefor even perfectly normal data when S(v) is not near S a Table 4 contrasts the powerestimates based on the bootstrap (ba ) with those based on the traditional approach (ba)

using NCPs in Table 3 The ba s are uniformly smaller than the ba s especially when theNCPs are not huge When an NCP is large enough there is almost no overlap betweenthe distribution of (T | H0) and that of (T | H1) Any sensible inference procedure can tellthe difference between the two and consequently the power estimates under Ha2 inTable 4 are approximately the same

102 Ke-Hai Yuan and Kentaro Hayashi

Table 2 Bootstrap critical values c 05

Statistic Sample x i Sample x (05)i Sample x (25)

i

TML 42348 39991 37022TSB 41694 41067 40587TB 57930 55120 53925

Table 3 Non-centrality parameter estimates associated with each statistic and sample

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML 22603 51187 22603 51187 22603 51187TSB 21763 48695 23107 51901 24875 55948TB 20289 56726 20130 63187 20683 69516

Table 4 illustrates the power properties of each statistic when the data changeWhen the outlying cases are downweighted not only do the parameter estimatesbecome more efcient (Yuan et al 2001) but also the powers for identifying anincorrect model are higher for all the statistics The powers of TML and TSB arequite comparable while the power of TB is much lower This is in contrast to theconclusion for power analysis based on the non-central chi-square where TB hasthe highest power in detecting a wrong model (Fouladi 2000 Yuan amp Bentler1997)

Next we focus on testing model (17) for close t We will only present results of thebootstrap using TML The use of TSB or TB is essentially the same MacCallum et al(1996) recommended testing close t (RMSEA 05) fair t (05 lt RMSEA 08)mediocre t (08 RMSEA 10) and poor t (10 lt RMSEA) We will conduct abootstrap procedure for these tests With (10) it is easy to see thatRMSEA = [minv D(S c S(v))( p plusmn q)]12 = e corresponds to dc = n ( p plusmn q)e2 Thed c s corresponding to e = 05 08 and 10 are given in the second row of Table 5The 95th percentiles of the corresponding x 2

24(dc ) are in the third row of the tableThese are respectively xed critical values for testing close t fair t and mediocre tin the approach of MacCallum et al (1996)

Bootstrap approach to inference and power analysis 103

Table 4 Bootstrap power estimates (ba ) and power estimates (ba) referring to x 24(d)

Sample x i Sample x (05)i Sample x (25)

i

Statistic Ha1 Ha2 Ha1 Ha2 Ha1 Ha2

TML ba 627 993 661 997 715 998ba 803 998 803 998 803 998

TSB ba 613 989 661 997 717 998ba 782 997 814 998 851 999

TB ba 374 983 454 991 501 995ba 743 999 739 1000 754 1000

Table 5 Test for close t

RMSEA 05 08 10dc 87 22272 3480

Sample TML x 224(dc 95)plusmn 1 48919 66926 82764

x i 51187 ca 53013 69928 84287p x 2 033 315 695pB 073 359 695

x (05)i 49408 ca 49548 55595 59620

p x 2 046 368 743pB 053 107 163

x (25)i 48883 ca 46305 51741 55899

p x 2 050 384 756pB 034 080 131

Let S x be the sample covariance matrix of x i and v be the corresponding maximumlikelihood estimate of v The values of h in the solution for Sc in the form ofS

h= (1 plusmn h)S(v) + hS x corresponding to e = 05 08 and 10 are respectively

0421 61 0669 81 and 0830 83 Applying the bootstrap procedure outlined in Section24 to the samples x i x (05)

i and x (25)i the estimates ca and p-values for each of the

samples are reported in Table 5 The results indicate that there are substantialdifferences between the traditional p-values px 2 and the bootstrap p-values pBespecially when RMSEA = 10 for the downweighted samples It is obvious that TMLwhen RMSEA = 10 has a much shorter right tail than that of x 2

24(dc) This fact is alsoreected in the corresponding ca s Table 5 once again illustrates the fact that given aclose t the behaviour of TML cannot be described by a non-central chi-square unlessthe corresponding TML under H0 in (1) has a heavy right tail As when testing for exactt using a chi-square table to judge the signicance of a statistic in testing model (17) forclose t is misleading

Example 2 Neumann (1994) presented an alcohol and psychological symptomdata set consisting of 10 variables and 335 cases The two variables in x = (x 1 x 2) cent arerespectively family history of psychopathology and family history of alcoholism whichare indicators for a latent construct of family history The eight variables iny = (y1 y8) cent are respectively the age of rst problem with alcohol age of rstdetoxication from alcohol alcohol severity score alcohol use inventory SCL-90psychological inventory the sum of the Minnesota Multiphasic Personality Inventoryscores the lowest level of psychosocial functioning during the past year and the highestlevel of psychosocial functioning during the past year With two indicators for eachlatent construct these eight variables respectively measure age of onset alcoholsymptoms psychopathology symptoms and global functioning Neumannrsquos (1994)theoretical model for this data set is

x = Lx y + d y = L y h + e (18a)

h = B h + Gy + z (18b)

where

Lx =10

l 1

Aacute Ly =

1 l 2 0 0 0 0 0 0

0 0 1 l 3 0 0 0 0

0 0 0 0 1 l 4 0 0

0 0 0 0 0 0 1 l 5

0

BBBBB

1

CCCCCA

cent

(18c)

B =

0 0 0 0b 21 0 0 0

b 31 b 32 0 00 b42 b43 0

0

BBB

1

CCCA G =

g11

0

00

0

BBB

1

CCCA F = Var(y) (18d )

and e d and z are vectors of errors whose elements are all uncorrelated The modeldegrees of freedom are 29

With Mardiarsquos multivariate kurtosis equal to 1476 the data set may come from adistribution with heavy tails Abootstrap procedure is more appropriate after the heavytails are properly downweighted Actually Yuan et al (2001) found that the samplex (10)

i leads to the most efcient parameter estimates in (18) among various proceduresHere we apply the three statistics to the two samples x i and x (10)

i Our purpose is to

104 Ke-Hai Yuan and Kentaro Hayashi

explore the pivotal property of TML TSB and TB on each sample After noticing that alarge portion of the QQ plot for TML applying to x (10)

i is below the x = y line ouranalysis also includes the sample x (05)

i The QQplots of the three statistics applied to x i indicate that none of them is nearly

pivotal TML and TSB are for the most part above the x = y line however both their lefttails are slightlybelow the x = y line This implies that some bootstrap samples t model(18) extremely well a phenomenon which can be caused by too many data points nearthe centre of the distribution Adownweighting procedure only affects data points thatcause a test statistic to possess a heavy right tail It is not clear to us how to deal with asample when a test statistic has a light left tail The QQplots of TML and TSB on x (05)

i andx (10)

i suggest that their heavier right tails on x i are under control However the right tailof TB is still quite heavy when compared to x 2

29 The downweighting transformationH(10) not only just controls the right tail of TML but also makes most of the QQplot fallbelow the x = y line Visually inspecting the QQ plots of TML on x i x (05)

i and x (10)i

suggests that a downweighting transformation by H(r) with 0 lt r lt 05 may lead toa better procedure for analysing the alcohol and psychological symptom data setbased on TML Since any procedure is only an approximation to the real world andTML is nearly pivotal when applying to x (05)

i we recommend the analysis using TMLon x (05)

i for this data set This leads to TML = 4243 with a bootstrap p-value of 040implying that the model (18) marginally ts the alcohol and psychological symptomdata set

Both the raw data sets in Examples 1 and 2 have signicant multivariate kurtoses andthe downweighting transformation (15) achieves approximately pivotal behaviour ofthe statistics We maywonder how these test statistics behave when applied to a data setthat does not have a signicant multivariate kurtosis This is demonstrated in thefollowing example

Example 3 Table 121 of Mardia Kent and Bibby (1979) contains test scores ofn = 88 students on p = 5 topics mechanics vectors algebra analysis and statisticsThe rst two topics were tested with closed-book exams and the last three withopen-book exams Since these two examination methods may tap different abilities atwo-factor model as in (17a) with

L =10 l 21 0 0 00 0 10 l 42 l 52

sup3 acutecent F =

f11 f12

f 21 f 22

sup3 acute(19)

was proposed and conrmed by Tanaka Watadani and Moon (1991) Mardiarsquos multi-variate kurtosis for this data set equal to 0057 is not signicant It would be interestingto see how TML TB and TSB perform on this data set

QQ plots for the three statistics applied to x i indicate that all of them have heavierright tails than that of x 2

4 The QQ plot for TML applied to x (r)i continues to exhibit a

heavier right tail until r reaches 30 TSB and TB still possess quite heavy right tails evenwhen r = 30

This data set has been used as an example for inuence analysis Previous studiesindicate that case number 81 is the most inuential point (Lee amp Wang 1996 Fung ampKwan 1995) So it is enticing to apply the three statistics to the x i s without the 81stcase The QQplots of TML TSB and TB applied to the remaining 87 cases indicate that allthe three statistics still have heavier right tails than that of x 2

4 Actually compared tothose based on all 88 cases the right tails of the three statistics on the 87 cases are evenheavier For this data set our recommended analysis is to use TML applied to x (30)

i With

Bootstrap approach to inference and power analysis 105

TML = 189 and a bootstrap p-value of 71 model (19) is more than good enough inexplaining the relationship of the ve variables

6 Non-convergence with Bootstrap ReplicationsNon-convergence issues exist with resampling and caution is needed for bootstrapinference (Ichikawa amp Konishi 1995) This generally happens when the sample size isnot large enough and especially when a model structure is wrong This issue wasdiscussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariancestructure model Here we propose a reasonable way of handling non-convergencewith bootstrap replications

With an iterative algorithm like Newtonrsquos convergence is generally dened ask Dv ( j ) k lt e where e is a small number and Dv ( j ) = v ( j ) plusmn v ( j plusmn 1) with v ( j ) being thej th step solution Let s = vech(S) be the factor formed by stacking the non-duplicatedelements of S and let its sample counterpart be s = vech(S) Then Dv ( j + 1) =

( Ccedils centj Wj Ccedilsj)plusmn 1 Ccedils centj Wj (s plusmn sj) where Ccedilsj = [ds(v)dv | v ( j )] sj = s(v ( j )) and Wj is the

corresponding weight matrix evaluated at v ( j ) So Dv ( j ) is proportional toS plusmn S(v ( j plusmn 1)) and it is impossible for k Dv ( j ) k to be smaller than e if S(v) is far fromE(S) Although a model is correct in a bootstrap population F0 some bootstrapreplications may still lie far away from the model especially when sampling errors arelarge for small samples For these samples even if we can obtain a solution using somenon-iterative or direct search method the corresponding statistic will be signicantBased on this fact we should distinguish two kinds of non-convergence in bootstrap orgeneral Monte Carlo studies The rst is where a sample contains enough distinct pointsand still cannot reach convergence with a model which should be treated as asignicant replication or a lsquobad modelrsquo The second is where a sample does not containenough distinct observations to t a model For obtaining a TML this number is p + 1 thenumber is p + 1 for obtaining a TB Although a TSB can be obtained once a TML isavailable we need to have a positive denite sample covariance matrix S y ofy i = vech[(x plusmn x )(x plusmn x ) cent ] in order for TSB to make sense (Bentler amp Yuan 1999)and the minimum number of distinct data points for S y to be positive denite is p + 1We should treat the second case as a bad sample and ignore it in bootstrap replications

In practice instead of estimating ca one generally reports the p-value of a statistic TIf all B bootstrap replications result in convergent solutions

pB =B0 + 1 plusmn M

B0 + 1

where M = fTb | Tb gt Tg When non-convergences occur B0 should be dened as thenumber of converged samples plus the number of signicant samples due to a badmodel M should be dened as the number of non-converged samples due to the badmodel plus the number of converged samples that result in Tb gt T With this modi-cation there is no problem with hypothesis testing A similar modication applies toformula (8) for power evaluation With ca = T(B0 (1 plusmn a)) in determining the numeratorin (8) one needs at least B0(1 plusmn a) converged samples when sampling from F0 The proposed procedure fails to generate a power estimate when the number ofconvergences under H0 is below B0(1 plusmn a)

Let l 1(S x ) $ l 2(S x ) $ $ l p(S x ) be the eigenvalues of the sample covariancematrix S x and g1(S y ) $ g 2(S y ) $ $ gp (S y ) be the eigenvalues of the sample

106 Ke-Hai Yuan and Kentaro Hayashi

covariance matrix S y of y i = vech[(x plusmn x )(x plusmn x ) cent ] In computing the three examplesour criteria for bad samples are l p(S x )l 1(S x ) 10plusmn 20 for obtaining TML andgp (S y )g1(S y ) 10plusmn 20 for obtaining TSB and TB If these criteria are satised and themodel still cannot reach convergence (k Dv k lt 10plusmn 5) in 100 iterations we treat thecorresponding statistic as innity All non-convergences with the three examples weredue to signicant replications these are reported in Table 6 Table 6(a) implies that for agiven sample the further a model is from H0 the more often non-convergences occurThe table also suggests that with a given model the heavier the tails of a data set themore often one obtains non-convergences

7 Discussion and conclusionsExisting conclusions regarding the three commonly used statistics TML TSB and TB arebased on asymptotics and Monte Carlo studies The properties inherent in either ofthese approaches may not be enjoyed by these statistics in practice because of either anite sample or an unknown sampling distribution By applying them to a specic data

Bootstrap approach to inference and power analysis 107

Table 6 Non-convergences due to signicant samples

(a) Example 1

Sample x i Sample x (05)i Sample x (25)

i

Statistic H0 Ha1 Ha2 H0 Ha1 Ha2 H0 Ha1 Ha2

TML 0 0 2 0 0 1 0 0 1TSB 0 0 2 0 0 1 0 0 1TB 0 10 21 0 10 27 0 7 20

(b) Example 2

Sample x i Sample x (05)i Sample x (10)

iStatistic H0 H0 H0

TML 9 1 1TSB 9 1 1TB 1 0 0

(c) Example 3

Sample x i Sample x (30)i Sample x i

(81st case removed)Statistic H0 H0 H0

TML 1 0 0TSB 1 0 0TB 0 0 0

set through resampling properties of the three statistics can be visually examined bymeans of QQplots When a data set possesses heavy tails TML will inherit these tails byhaving a heavy right tail Actually all three statistics need the sampling distribution tohave nite fourth-order moments With possible violation of this assumption bypractical data we propose applying a bootstrap procedure to a transformed samplex (r)

i through downweighting The combination of bootstrapping and downweightingnot only offers a theoretical justication for applying the bootstrap to a data set withheavy tails but also provides a quite exible tool for exploring the properties of each ofthe three statistics Even if inference is based on referring a statistic to a chi-squaredistribution one will obtain a more accurate model evaluation by applying TML to atransformed sample x (r)

i Asymptotics justify the three statistics from different perspectives and TB is

asymptotically distribution-free Previous Monte Carlo studies mainly support TSBHowever with proper downweighting TML is generally the one that is best describedby a chi-square distribution However the conclusion that TML is the best statistic forbootstrap inference has to be preliminary Future studies may nd TB or TSB moreappropriate for other data sets We recommend exploring the different procedures for agiven data set as illustrated in Section 5

AcknowledgementsThe authors would like to thank Professor Peter M Bentler and two anonymous referees whoseconstructive comments lead to an improved version of this paper Correspondence concerningthis article should be addressed to Ke-Hai Yuan (kyuanndedu)

ReferencesAmemiya Y amp Anderson T W (1990) Asymptotic chi-square tests for a large class of factor

analysis models Annals of Statistics 18 1453ndash1463Barndorff-Nielsen O E amp Cox D R (1984) Bartlett adjustments to the likelihood ratio statistic

and the distribution of the maximum likelihood estimator Journal of the Royal StatisticalSociety B 46 483ndash495

Bentler P M (1995) EQS Structural equations program manual Encino CA MultivariateSoftware

Bentler P M amp Yuan K-H (1999) Structural equation modeling with small samples Teststatistics Multivariate Behavioral Research 34 181ndash197

Beran R (1986) Simulated power functions Annals of Statistics 14 151ndash173Beran R (1988) Prepivoting test statistics Abootstrap view of asymptotic renements Journal

of the American Statistical Association 83 687ndash697Beran R amp Srivastava M S (1985) Bootstrap tests and condence regions for functions of a

covariance matrix Annals of Statistics 13 95ndash115Bollen K A amp Stine R (1993) Bootstrapping goodness of t measures in structural equation

models In K A BOllen and J S Long (Eds) Testing structural equation models(pp 111ndash135) Newbury Park CA Sage

Browne M W (1984) Asymptotic distribution-free methods for the analysis of covariancestructures British Journal of Mathematical and Statistical Psychology 37 62ndash83

Browne M W amp Shapiro A (1988) Robustness of normal theory methods in the analysis oflinear latent variate models British Journal of Mathematical and Statistical Psychology 41193ndash208

108 Ke-Hai Yuan and Kentaro Hayashi

Cohen J (1988) Statistical power analysis for the behavioral sciences (2nd ed) Hillsdale NJErlbaum

Davison A C amp Hinkley D V (1997) Bootstrap methods and their application CambridgeCambridge University Press

Efron B ampTibshirani R J (1993) An introduction to the bootstrap New York Chapman ampHallFouladi R T (1998) Covariance structure analysis techniques under conditions of multi-

variate normality and nonnormality Modied and bootstrap based test statistics Paperpresented at the annual meeting of the American Educational Research Association San DiegoCA

Fouladi R T (2000) Performance of modied test statistics in covariance and correlationstructure analysis under conditions of multivariate nonnormality Structural EquationModeling 7 356ndash410

Fung W K ampKwan C W (1995) Sensitivityanalysis in factor analysis Difference between usingcovariance and correlation matrices Psychometrika 60 607ndash614

Hall P (1992) The bootstrap and Edgeworth expansion New York Springer-VerlagHolzinger K J amp Swineford F (1939) A study in factor analysis The stability of a bi-factor

solution University of Chicago Press Supplementary Educational Monographs No 48Hu L T Bentler P M amp Kano Y (1992) Can test statistics in covariance structure analysis be

trusted Psychological Bulletin 112 351ndash362Huber P J (1981) Robust statistics New York WileyIchikawa M amp Konishi S (1995) Application of the bootstrap methods in factor analysis

Psychometrika 60 77ndash93Lawley D N amp Maxwell A E (1971) Factor analysis as a statistical method (2nd ed) New

York American ElsevierLee S Y ampWang S J (1996) Sensitivityanalysis of structural equation models Psychometrika

61 93ndash108MacCallum R C Browne M W ampSugawara H M (1996) Power analysis and determination of

sample size for covariance structure modeling Psychological Methods 1 130ndash149Mardia K V (1970) Measure of multivariate skewness and kurtosis with applications

Biometrika 57 519ndash530Mardia K V Kent J T amp Bibby J M (1979) Multivariate analysis New York Academic PressMaronna R A (1976) Robust M-estimators of multivariate location and scatter Annals of

Statistics 4 51ndash67Mooijaart A ampBentler P M (1991) Robustness of normal theory statistics in structural equation

models Statistica Neerlandica 45 159ndash171Neumann C S (1994) Structural equation modeling of symptoms of alcoholism and

psychopathology Doctoral dissertation University of Kansas LawrenceSatorra A amp Bentler P M (1988) Scaling corrections for chi-square statistics in covariance

structure analysis In 1988 Proceedings of Business and Economics Sections (pp 308ndash313)Alexandria VA American Statistical Association

Satorra A amp Bentler P M (1990) Model conditions for asymptotic robustness in the analysis oflinear relations Computational Statistics amp Data Analysis 10 235ndash249

Satorra A ampSaris W E (1985) Power of the likelihood ratio test in covariance structure analysisPsychometrika 50 83ndash90

Satorra A Saris W E amp de Pijper W M (1991) Acomparison of several approximations to thepower function of the likelihood ratio test in covariance structure analysis StatisticaNeerlandica 45 173ndash185

Shapiro A (1987) Robustness properties of the MDF analysis of moment structures SouthAfrican Statistical Journal 21 39ndash62

Steiger J H amp Lind J M (1980) Statistically based tests for the number of common factorsPaper presented at the annual meeting of the Psychometric Society Iowa City

Steiger J H Shapiro A amp Browne M W (1985) On the multivariate asymptotic distribution ofsequential chi-square statistics Psychometrika 50 253ndash264

Bootstrap approach to inference and power analysis 109

Tanaka Y Watadani S amp Moon S H (1991) Inuence in covariance structure analysis with anapplication to conrmatory factor analysis Communications in Statistics Theory andMethods 20 3805ndash3821

Tyler D E (1983) Robustness and efciency properties of scatter matrices Biometrika 70411ndash420

Wakaki H Eguchi S amp Fujikoshi Y (1990) A class of tests for a general covariance structureJournal of Multivariate Analysis 32 313ndash325

Yuan K-H amp Bentler P M (1997) Mean and covariance structure analysis Theoretical andpractical improvements Journal of the American Statistical Association 92 767ndash774

Yuan K-H amp Bentler P M (1998) Structural equation modeling with robust covariancesIn A E Raftery (ed) Sociological methodology 1998 (pp 363ndash396) Boston BlackwellPublishers

Yuan K-H amp Bentler P M (1999) On normal theory and associated test statistics in covariancestructure analysis under two classes of nonnormal distributions Statistica Sinica 9 831ndash853

Yuan K-H Chan W amp Bentler P M (2000) Robust transformation with applications tostructural equation modelling British Journal of Mathematical and Statistical Psychology53 31ndash50

Yuan K-H Bentler P M amp Chan W (2001) Structural equation modeling with heavy taileddistributions Manuscript submitted for publication

Yuan K-H amp Hayashi K (2001) On using an empirical Bayes covariance matrix in bootstrapapproach to covariance structure analysis Manuscript submitted for publication

Yung Y F amp Bentler P M (1994) Bootstrap-corrected ADF test statistics in covariance structureanalysis British Journal of Mathematical and Statistical Psychology 47 63ndash84

Yung Y F amp Bentler P M (1996) Bootstrapping techniques in analysis of mean and covariancestructures In G A Marcoulides and R E Schumacker (Eds) Advanced structural equationmodeling Techniques and issues (pp 195ndash226) Hillsdale NJ Lawrence Erlbaum

Zhang J amp Boos D D (1992) Bootstrap critical values for testing homogeneity of covariancematrices Journal of the American Statistical Association 87 425ndash429

Received 9 November 2000 revised version received 22 March 2002

110 Ke-Hai Yuan and Kentaro Hayashi

Page 11: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

Table 4 illustrates the power properties of each statistic when the data changeWhen the outlying cases are downweighted not only do the parameter estimatesbecome more efcient (Yuan et al 2001) but also the powers for identifying anincorrect model are higher for all the statistics The powers of TML and TSB arequite comparable while the power of TB is much lower This is in contrast to theconclusion for power analysis based on the non-central chi-square where TB hasthe highest power in detecting a wrong model (Fouladi 2000 Yuan amp Bentler1997)

Next we focus on testing model (17) for close fit. We will only present results of the bootstrap using T_ML; the use of T_SB or T_B is essentially the same. MacCallum et al. (1996) recommended testing close fit (RMSEA ≤ .05), fair fit (.05 < RMSEA ≤ .08), mediocre fit (.08 < RMSEA ≤ .10) and poor fit (.10 < RMSEA). We will conduct a bootstrap procedure for these tests. With (10) it is easy to see that RMSEA = [min_θ D(Σ_c, Σ(θ))/(p* − q)]^{1/2} = ε corresponds to δ_c = n(p* − q)ε². The δ_c's corresponding to ε = .05, .08 and .10 are given in the second row of Table 5. The 95th percentiles of the corresponding χ²_24(δ_c) are in the third row of the table. These are, respectively, the fixed critical values for testing close fit, fair fit and mediocre fit in the approach of MacCallum et al. (1996).
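The δ_c's and fixed critical values in Table 5 can be reproduced directly from this relation. A minimal sketch, assuming 24 degrees of freedom and a sample size of n = 145 (the value implied by δ_c = 8.7 at ε = .05), using SciPy's noncentral chi-square:

```python
# Sketch: fixed critical values for the close, fair and mediocre fit tests
# (MacCallum, Browne & Sugawara, 1996), as in Table 5.
# Assumptions: df = 24; n = 145 is inferred from delta_c = 8.7 at epsilon = .05.
from scipy.stats import ncx2

n, df = 145, 24
for eps in (0.05, 0.08, 0.10):
    delta_c = n * df * eps ** 2            # noncentrality implied by RMSEA = eps
    crit = ncx2.ppf(0.95, df, delta_c)     # 95th percentile of chi2_df(delta_c)
    print(f"RMSEA = {eps:.2f}: delta_c = {delta_c:.3f}, critical value = {crit:.3f}")
```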


Table 4. Bootstrap power estimates and power estimates referring to χ²_24(δ)

                              Sample x_i        Sample x_i^(.05)   Sample x_i^(.25)
Statistic   Estimate          Ha1     Ha2       Ha1     Ha2        Ha1     Ha2
T_ML        bootstrap         .627    .993      .661    .997       .715    .998
            χ²_24(δ)-based    .803    .998      .803    .998       .803    .998
T_SB        bootstrap         .613    .989      .661    .997       .717    .998
            χ²_24(δ)-based    .782    .997      .814    .998       .851    .999
T_B         bootstrap         .374    .983      .454    .991       .501    .995
            χ²_24(δ)-based    .743    .999      .739   1.000       .754   1.000

Table 5. Test for close fit

RMSEA                                 .05       .08       .10
δ_c                                   8.7       22.272    34.80
95th percentile of χ²_24(δ_c)         48.919    66.926    82.764

Sample       T_ML
x_i          51.187    ĉ_α            53.013    69.928    84.287
                       p_χ²           .033      .315      .695
                       p_B            .073      .359      .695
x_i^(.05)    49.408    ĉ_α            49.548    55.595    59.620
                       p_χ²           .046      .368      .743
                       p_B            .053      .107      .163
x_i^(.25)    48.883    ĉ_α            46.305    51.741    55.899
                       p_χ²           .050      .384      .756
                       p_B            .034      .080      .131

Let S_x be the sample covariance matrix of the x_i and θ̂ be the corresponding maximum likelihood estimate of θ. The values of η in the solution for Σ_c of the form Σ_η = (1 − η)Σ(θ̂) + ηS_x corresponding to ε = .05, .08 and .10 are, respectively, 0.42161, 0.66981 and 0.83083. Applying the bootstrap procedure outlined in Section 2.4 to the samples x_i, x_i^(.05) and x_i^(.25), the estimates ĉ_α and p-values for each of the samples are reported in Table 5. The results indicate that there are substantial differences between the traditional p-values p_χ² and the bootstrap p-values p_B, especially when RMSEA = .10 for the downweighted samples. It is obvious that T_ML, when RMSEA = .10, has a much shorter right tail than that of χ²_24(δ_c). This fact is also reflected in the corresponding ĉ_α's. Table 5 once again illustrates the fact that, given a close fit, the behaviour of T_ML cannot be described by a non-central chi-square unless the corresponding T_ML under H_0 in (1) has a heavy right tail. As when testing for exact fit, using a chi-square table to judge the significance of a statistic in testing model (17) for close fit is misleading.
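The construction of Σ_η, and the search for the η that attains a given RMSEA, can be sketched as follows. The model-fitting step is supplied by the caller; for illustration a simple independence (diagonal) model is used, whose ML fit to any covariance matrix is just its diagonal, so the matrices and degrees of freedom below are illustrative rather than those of Example 1.

```python
# Sketch: choosing eta so that Sigma_eta = (1 - eta) * Sigma_hat + eta * S_x
# attains a target RMSEA.  Illustrative inputs only.
import numpy as np
from scipy.optimize import brentq

def f_ml(sigma0, sigma_model):
    """Normal-theory ML discrepancy F_ML(Sigma0, Sigma(theta))."""
    p = sigma0.shape[0]
    _, logdet_m = np.linalg.slogdet(sigma_model)
    _, logdet_0 = np.linalg.slogdet(sigma0)
    return logdet_m - logdet_0 + np.trace(sigma0 @ np.linalg.inv(sigma_model)) - p

def rmsea_of_eta(eta, sigma_hat, s_x, fit_model, df):
    sigma_eta = (1.0 - eta) * sigma_hat + eta * s_x
    fmin = f_ml(sigma_eta, fit_model(sigma_eta))   # min over theta of the discrepancy
    return np.sqrt(max(fmin, 0.0) / df)

# Toy ingredients: a 3-variable "sample" covariance and a diagonal (independence) model,
# whose ML fit to any covariance matrix Sigma0 is diag(Sigma0).
s_x = np.array([[1.0, 0.4, 0.3],
                [0.4, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
fit_diag = lambda sigma0: np.diag(np.diag(sigma0))
sigma_hat = fit_diag(s_x)
df = 3                                             # df of the toy model, not Example 1's 24

for target in (0.05, 0.08, 0.10):
    eta = brentq(lambda e: rmsea_of_eta(e, sigma_hat, s_x, fit_diag, df) - target, 0.0, 1.0)
    print(f"target RMSEA = {target:.2f}: eta = {eta:.4f}")
```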

Example 2. Neumann (1994) presented an alcohol and psychological symptom data set consisting of 10 variables and 335 cases. The two variables in x = (x_1, x_2)′ are, respectively, family history of psychopathology and family history of alcoholism, which are indicators for a latent construct of family history. The eight variables in y = (y_1, ..., y_8)′ are, respectively, the age of first problem with alcohol, age of first detoxification from alcohol, alcohol severity score, alcohol use inventory, SCL-90 psychological inventory, the sum of the Minnesota Multiphasic Personality Inventory scores, the lowest level of psychosocial functioning during the past year, and the highest level of psychosocial functioning during the past year. With two indicators for each latent construct, these eight variables respectively measure age of onset, alcohol symptoms, psychopathology symptoms and global functioning. Neumann's (1994) theoretical model for this data set is

$$x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon, \tag{18a}$$

$$\eta = B\eta + \Gamma\xi + \zeta, \tag{18b}$$

where

$$\Lambda_x = \begin{pmatrix} 1.0 \\ \lambda_1 \end{pmatrix}, \qquad
\Lambda_y = \begin{pmatrix}
1 & \lambda_2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & \lambda_3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \lambda_4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \lambda_5
\end{pmatrix}', \tag{18c}$$

$$B = \begin{pmatrix}
0 & 0 & 0 & 0 \\
\beta_{21} & 0 & 0 & 0 \\
\beta_{31} & \beta_{32} & 0 & 0 \\
0 & \beta_{42} & \beta_{43} & 0
\end{pmatrix}, \qquad
\Gamma = \begin{pmatrix} \gamma_{11} \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
\Phi = \mathrm{Var}(\xi), \tag{18d}$$

and ε, δ and ζ are vectors of errors whose elements are all uncorrelated. The model degrees of freedom are 29.

With Mardia's multivariate kurtosis equal to 14.76, the data set may come from a distribution with heavy tails. A bootstrap procedure is more appropriate after the heavy tails are properly downweighted. Actually, Yuan et al. (2001) found that the sample x_i^(.10) leads to the most efficient parameter estimates in (18) among various procedures. Here we apply the three statistics to the two samples x_i and x_i^(.10).


Our purpose is to explore the pivotal property of T_ML, T_SB and T_B on each sample. After noticing that a large portion of the QQ plot for T_ML applied to x_i^(.10) is below the x = y line, our analysis also includes the sample x_i^(.05).

The QQ plots of the three statistics applied to x_i indicate that none of them is nearly pivotal. T_ML and T_SB are for the most part above the x = y line; however, both their left tails are slightly below the x = y line. This implies that some bootstrap samples fit model (18) extremely well, a phenomenon which can be caused by too many data points near the centre of the distribution. A downweighting procedure only affects data points that cause a test statistic to possess a heavy right tail; it is not clear to us how to deal with a sample when a test statistic has a light left tail. The QQ plots of T_ML and T_SB on x_i^(.05) and x_i^(.10) suggest that their heavier right tails on x_i are under control. However, the right tail of T_B is still quite heavy when compared to χ²_29. The downweighting transformation H(.10) not only controls the right tail of T_ML but also makes most of the QQ plot fall below the x = y line. Visually inspecting the QQ plots of T_ML on x_i, x_i^(.05) and x_i^(.10) suggests that a downweighting transformation by H(r) with 0 < r < .05 may lead to a better procedure for analysing the alcohol and psychological symptom data set based on T_ML. Since any procedure is only an approximation to the real world, and T_ML is nearly pivotal when applied to x_i^(.05), we recommend the analysis using T_ML on x_i^(.05) for this data set. This leads to T_ML = 42.43 with a bootstrap p-value of .040, implying that model (18) marginally fits the alcohol and psychological symptom data set.
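The QQ plots referred to in these examples compare the ordered bootstrap statistics with quantiles of the reference chi-square distribution. A minimal sketch, with the bootstrap statistics simulated here only as a placeholder for the values actually obtained from the B replications (29 degrees of freedom as in model (18)):

```python
# Sketch: QQ plot of bootstrap test statistics against chi-square quantiles.
# t_boot is assumed to hold the B bootstrap values of a statistic (e.g. T_ML);
# here it is simulated for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

rng = np.random.default_rng(0)
df = 29
t_boot = rng.chisquare(df, size=500)           # placeholder for real bootstrap statistics

b = len(t_boot)
probs = (np.arange(1, b + 1) - 0.5) / b        # plotting positions
chi2_q = chi2.ppf(probs, df)                   # theoretical chi-square quantiles

plt.plot(chi2_q, np.sort(t_boot), ".", ms=3)   # points above the line in the right tail
plt.plot(chi2_q, chi2_q, "k--", lw=1)          # indicate a heavier right tail than chi-square
plt.xlabel(f"chi-square({df}) quantiles")
plt.ylabel("ordered bootstrap statistics")
plt.title("QQ plot for a bootstrap statistic")
plt.show()
```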

Both the raw data sets in Examples 1 and 2 have significant multivariate kurtoses, and the downweighting transformation (15) achieves approximately pivotal behaviour of the statistics. We may wonder how these test statistics behave when applied to a data set that does not have a significant multivariate kurtosis. This is demonstrated in the following example.

Example 3. Table 1.2.1 of Mardia, Kent and Bibby (1979) contains test scores of n = 88 students on p = 5 topics: mechanics, vectors, algebra, analysis and statistics. The first two topics were tested with closed-book exams and the last three with open-book exams. Since these two examination methods may tap different abilities, a two-factor model as in (17a) with

$$\Lambda = \begin{pmatrix} 1.0 & \lambda_{21} & 0 & 0 & 0 \\ 0 & 0 & 1.0 & \lambda_{42} & \lambda_{52} \end{pmatrix}', \qquad
\Phi = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \tag{19}$$

was proposed and confirmed by Tanaka, Watadani and Moon (1991). Mardia's multivariate kurtosis for this data set, equal to 0.057, is not significant. It would be interesting to see how T_ML, T_B and T_SB perform on this data set.
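Mardia's (1970) multivariate kurtosis is the screening device used here for heavy tails. A minimal sketch of the coefficient and one common standardization follows; whether this standardized form is exactly the quantity reported above is an assumption, and the data matrix below is random and purely illustrative.

```python
# Sketch: Mardia's (1970) multivariate kurtosis and a normalized version.
import numpy as np

def mardia_kurtosis(x):
    """x: (n, p) data matrix. Returns (b2p, asymptotically standardized statistic)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / n                                         # ML (biased) covariance matrix
    d2 = np.einsum("ij,jk,ik->i", xc, np.linalg.inv(s), xc)   # squared Mahalanobis distances
    b2p = np.mean(d2 ** 2)                                    # Mardia's kurtosis coefficient
    z = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)    # standardization under normality
    return b2p, z

# Illustration with random normal data (b2p should be near p*(p+2)).
rng = np.random.default_rng(1)
x = rng.standard_normal((88, 5))
print(mardia_kurtosis(x))
```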

QQ plots for the three statistics applied to x_i indicate that all of them have heavier right tails than that of χ²_4. The QQ plot for T_ML applied to x_i^(r) continues to exhibit a heavier right tail until r reaches .30. T_SB and T_B still possess quite heavy right tails even when r = .30.

This data set has been used as an example for influence analysis. Previous studies indicate that case number 81 is the most influential point (Lee & Wang, 1996; Fung & Kwan, 1995). So it is enticing to apply the three statistics to the x_i's without the 81st case. The QQ plots of T_ML, T_SB and T_B applied to the remaining 87 cases indicate that all three statistics still have heavier right tails than that of χ²_4. Actually, compared to those based on all 88 cases, the right tails of the three statistics on the 87 cases are even heavier. For this data set our recommended analysis is to use T_ML applied to x_i^(.30). With T_ML = 1.89 and a bootstrap p-value of .71, model (19) is more than good enough in explaining the relationship of the five variables.

6. Non-convergence with bootstrap replications

Non-convergence issues exist with resampling, and caution is needed for bootstrap inference (Ichikawa & Konishi, 1995). This generally happens when the sample size is not large enough, and especially when a model structure is wrong. This issue was discussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariance structure model. Here we propose a reasonable way of handling non-convergence with bootstrap replications.

With an iterative algorithm like Newton's, convergence is generally defined as ‖Δθ^(j)‖ < ε, where ε is a small number and Δθ^(j) = θ^(j) − θ^(j−1), with θ^(j) being the jth-step solution. Let σ = σ(θ) = vech(Σ(θ)) be the vector formed by stacking the non-duplicated elements of Σ(θ), and let its sample counterpart be s = vech(S). Then

$$\Delta\theta^{(j+1)} = (\dot{\sigma}_j' W_j \dot{\sigma}_j)^{-1} \dot{\sigma}_j' W_j (s - \sigma_j),$$

where σ̇_j = ∂σ(θ)/∂θ evaluated at θ^(j), σ_j = σ(θ^(j)), and W_j is the corresponding weight matrix evaluated at θ^(j). So Δθ^(j) is proportional to s − σ(θ^(j−1)), and it is impossible for ‖Δθ^(j)‖ to be smaller than ε if Σ(θ) is far from E(S). Although a model is correct in a bootstrap population F_0, some bootstrap replications may still lie far away from the model, especially when sampling errors are large for small samples. For these samples, even if we can obtain a solution using some non-iterative or direct search method, the corresponding statistic will be significant. Based on this fact, we should distinguish two kinds of non-convergence in bootstrap or general Monte Carlo studies. The first is where a sample contains enough distinct points and still cannot reach convergence with a model; this should be treated as a significant replication or a 'bad model'. The second is where a sample does not contain enough distinct observations to fit a model. For obtaining a T_ML this number is p + 1; the number is p* + 1 for obtaining a T_B. Although a T_SB can be obtained once a T_ML is available, we need a positive definite sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′] in order for T_SB to make sense (Bentler & Yuan, 1999), and the minimum number of distinct data points for S_y to be positive definite is p* + 1. We should treat the second case as a bad sample and ignore it in bootstrap replications.
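The Newton-type update and the convergence check described above can be written compactly. A minimal sketch, with the Jacobian σ̇, the weight matrix W, the data vector s and the current model vector σ_j all supplied by the caller (the weight matrix depends on which statistic is being computed, so no particular choice is assumed here):

```python
# Sketch: Delta_theta = (sigma_dot' W sigma_dot)^{-1} sigma_dot' W (s - sigma_j),
# iterated until ||Delta_theta|| < tol, with non-convergence reported otherwise.
import numpy as np

def newton_step(sigma_dot, w, s, sigma_j):
    a = sigma_dot.T @ w @ sigma_dot
    b = sigma_dot.T @ w @ (s - sigma_j)
    return np.linalg.solve(a, b)

def iterate(theta, sigma_fun, jac_fun, w_fun, s, tol=1e-5, max_iter=100):
    """sigma_fun, jac_fun and w_fun return sigma(theta), its Jacobian and the weight matrix."""
    for _ in range(max_iter):
        delta = newton_step(jac_fun(theta), w_fun(theta), s, sigma_fun(theta))
        theta = theta + delta
        if np.linalg.norm(delta) < tol:
            return theta, True
    return theta, False          # treat as a non-convergent replication
```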

In practice, instead of estimating ĉ_α, one generally reports the p-value of a statistic T. If all B bootstrap replications result in convergent solutions,

$$p_B = \frac{M + 1}{B_0 + 1},$$

where M = #{T_b : T_b > T} is the number of bootstrap replications whose statistic exceeds T. When non-convergences occur, B_0 should be defined as the number of converged samples plus the number of significant samples due to a bad model; M should be defined as the number of non-converged samples due to the bad model plus the number of converged samples that result in T_b > T. With this modification there is no problem with hypothesis testing. A similar modification applies to formula (8) for power evaluation. With ĉ_α given by the B_0(1 − α)th ordered bootstrap statistic in determining the numerator in (8), one needs at least B_0(1 − α) converged samples when sampling from F_0. The proposed procedure fails to generate a power estimate when the number of convergences under H_0 is below B_0(1 − α).
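Under this rule, the bookkeeping for the bootstrap p-value might look like the following sketch; replications with too few distinct observations are assumed to have been dropped before the counts are formed, and the bad-model non-convergences enter both B_0 and M as described above.

```python
# Sketch: bootstrap p-value with the proposed handling of non-convergence.
# t_obs: observed statistic; t_boot: statistics from converged replications;
# n_bad_model: replications with enough distinct points that still failed to
# converge (counted as significant).
import numpy as np

def bootstrap_p(t_obs, t_boot, n_bad_model=0):
    t_boot = np.asarray(t_boot)
    b0 = t_boot.size + n_bad_model             # converged + significant bad-model replications
    m = int(np.sum(t_boot > t_obs)) + n_bad_model
    return (m + 1) / (b0 + 1)

# Illustration with simulated chi-square "bootstrap" values.
rng = np.random.default_rng(2)
print(bootstrap_p(42.43, rng.chisquare(29, size=500), n_bad_model=1))
```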

Let λ_1(S_x) ≥ λ_2(S_x) ≥ ... ≥ λ_p(S_x) be the eigenvalues of the sample covariance matrix S_x, and γ_1(S_y) ≥ γ_2(S_y) ≥ ... ≥ γ_p*(S_y) be the eigenvalues of the sample covariance matrix S_y of y_i = vech[(x_i − x̄)(x_i − x̄)′]. In computing the three examples, our criteria for screening out bad samples are λ_p(S_x)/λ_1(S_x) ≥ 10⁻²⁰ for obtaining T_ML, and γ_p*(S_y)/γ_1(S_y) ≥ 10⁻²⁰ for obtaining T_SB and T_B. If these criteria are satisfied and the model still cannot reach convergence (‖Δθ‖ < 10⁻⁵) in 100 iterations, we treat the corresponding statistic as infinity. All non-convergences with the three examples were due to significant replications; these are reported in Table 6. Table 6(a) implies that, for a given sample, the further a model is from H_0, the more often non-convergences occur. The table also suggests that, with a given model, the heavier the tails of a data set, the more often one obtains non-convergences.
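The eigenvalue-ratio screening can be checked before any fitting is attempted. A minimal sketch, assuming a resampled data matrix x_star with n rows and p columns:

```python
# Sketch: screening a bootstrap sample before fitting, using the ratio criteria
# lambda_p(S_x)/lambda_1(S_x) >= 1e-20 (for T_ML) and
# gamma_p*(S_y)/gamma_1(S_y) >= 1e-20 (for T_SB and T_B).
import numpy as np

def vech(a):
    """Stack the non-duplicated (lower-triangular) elements of a symmetric matrix."""
    return a[np.tril_indices(a.shape[0])]

def is_good_sample(x_star, tol=1e-20):
    n = x_star.shape[0]
    xc = x_star - x_star.mean(axis=0)
    s_x = xc.T @ xc / (n - 1)
    y = np.array([vech(np.outer(row, row)) for row in xc])   # y_i = vech[(x_i - xbar)(x_i - xbar)']
    s_y = np.cov(y, rowvar=False)
    lam = np.linalg.eigvalsh(s_x)          # eigenvalues in ascending order
    gam = np.linalg.eigvalsh(s_y)
    return lam[0] / lam[-1] >= tol, gam[0] / gam[-1] >= tol

# Illustration with a random 88 x 5 data matrix.
rng = np.random.default_rng(3)
print(is_good_sample(rng.standard_normal((88, 5))))
```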

7. Discussion and conclusions

Existing conclusions regarding the three commonly used statistics T_ML, T_SB and T_B are based on asymptotics and Monte Carlo studies. The properties inherent in either of these approaches may not be enjoyed by these statistics in practice, because of either a finite sample or an unknown sampling distribution.


Table 6. Non-convergences due to significant samples

(a) Example 1

                Sample x_i          Sample x_i^(.05)     Sample x_i^(.25)
Statistic   H0   Ha1   Ha2      H0   Ha1   Ha2       H0   Ha1   Ha2
T_ML         0     0     2       0     0     1        0     0     1
T_SB         0     0     2       0     0     1        0     0     1
T_B          0    10    21       0    10    27        0     7    20

(b) Example 2

            Sample x_i   Sample x_i^(.05)   Sample x_i^(.10)
Statistic       H0             H0                 H0
T_ML             9              1                  1
T_SB             9              1                  1
T_B              1              0                  0

(c) Example 3

            Sample x_i   Sample x_i^(.30)   Sample x_i (81st case removed)
Statistic       H0             H0                 H0
T_ML             1              0                  0
T_SB             1              0                  0
T_B              0              0                  0

By applying them to a specific data set through resampling, properties of the three statistics can be visually examined by means of QQ plots. When a data set possesses heavy tails, T_ML will inherit these tails by having a heavy right tail. Actually, all three statistics need the sampling distribution to have finite fourth-order moments. With possible violation of this assumption by practical data, we propose applying a bootstrap procedure to a transformed sample x_i^(r) obtained through downweighting. The combination of bootstrapping and downweighting not only offers a theoretical justification for applying the bootstrap to a data set with heavy tails but also provides a quite flexible tool for exploring the properties of each of the three statistics. Even if inference is based on referring a statistic to a chi-square distribution, one will obtain a more accurate model evaluation by applying T_ML to a transformed sample x_i^(r).

Asymptotics justify the three statistics from different perspectives, and T_B is asymptotically distribution-free. Previous Monte Carlo studies mainly support T_SB. However, with proper downweighting, T_ML is generally the one that is best described by a chi-square distribution. The conclusion that T_ML is the best statistic for bootstrap inference nevertheless has to be preliminary; future studies may find T_B or T_SB more appropriate for other data sets. We recommend exploring the different procedures for a given data set, as illustrated in Section 5.

Acknowledgements

The authors would like to thank Professor Peter M. Bentler and two anonymous referees, whose constructive comments led to an improved version of this paper. Correspondence concerning this article should be addressed to Ke-Hai Yuan (kyuan@nd.edu).

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.

Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society B, 46, 483–495.

Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.

Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.

Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.

Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.

Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.

Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.

Browne, M. W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.

Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.

Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.

Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.

Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago Press, Supplementary Educational Monographs, No. 48.

Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

Huber, P. J. (1981). Robust statistics. New York: Wiley.

Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.

Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.

Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.

Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.

Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.

Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.

Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.

Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.

Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.

Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.

Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.

Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.

Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805–3821.

Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.

Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.

Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.

Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.

Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.

Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.

Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.

Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.

Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.

Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.

Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002

110 Ke-Hai Yuan and Kentaro Hayashi

Page 12: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

Let S x be the sample covariance matrix of x i and v be the corresponding maximumlikelihood estimate of v The values of h in the solution for Sc in the form ofS

h= (1 plusmn h)S(v) + hS x corresponding to e = 05 08 and 10 are respectively

0421 61 0669 81 and 0830 83 Applying the bootstrap procedure outlined in Section24 to the samples x i x (05)

i and x (25)i the estimates ca and p-values for each of the

samples are reported in Table 5 The results indicate that there are substantialdifferences between the traditional p-values px 2 and the bootstrap p-values pBespecially when RMSEA = 10 for the downweighted samples It is obvious that TMLwhen RMSEA = 10 has a much shorter right tail than that of x 2

24(dc) This fact is alsoreected in the corresponding ca s Table 5 once again illustrates the fact that given aclose t the behaviour of TML cannot be described by a non-central chi-square unlessthe corresponding TML under H0 in (1) has a heavy right tail As when testing for exactt using a chi-square table to judge the signicance of a statistic in testing model (17) forclose t is misleading

Example 2 Neumann (1994) presented an alcohol and psychological symptomdata set consisting of 10 variables and 335 cases The two variables in x = (x 1 x 2) cent arerespectively family history of psychopathology and family history of alcoholism whichare indicators for a latent construct of family history The eight variables iny = (y1 y8) cent are respectively the age of rst problem with alcohol age of rstdetoxication from alcohol alcohol severity score alcohol use inventory SCL-90psychological inventory the sum of the Minnesota Multiphasic Personality Inventoryscores the lowest level of psychosocial functioning during the past year and the highestlevel of psychosocial functioning during the past year With two indicators for eachlatent construct these eight variables respectively measure age of onset alcoholsymptoms psychopathology symptoms and global functioning Neumannrsquos (1994)theoretical model for this data set is

x = Lx y + d y = L y h + e (18a)

h = B h + Gy + z (18b)

where

Lx =10

l 1

Aacute Ly =

1 l 2 0 0 0 0 0 0

0 0 1 l 3 0 0 0 0

0 0 0 0 1 l 4 0 0

0 0 0 0 0 0 1 l 5

0

BBBBB

1

CCCCCA

cent

(18c)

B =

0 0 0 0b 21 0 0 0

b 31 b 32 0 00 b42 b43 0

0

BBB

1

CCCA G =

g11

0

00

0

BBB

1

CCCA F = Var(y) (18d )

and e d and z are vectors of errors whose elements are all uncorrelated The modeldegrees of freedom are 29

With Mardiarsquos multivariate kurtosis equal to 1476 the data set may come from adistribution with heavy tails Abootstrap procedure is more appropriate after the heavytails are properly downweighted Actually Yuan et al (2001) found that the samplex (10)

i leads to the most efcient parameter estimates in (18) among various proceduresHere we apply the three statistics to the two samples x i and x (10)

i Our purpose is to

104 Ke-Hai Yuan and Kentaro Hayashi

explore the pivotal property of TML TSB and TB on each sample After noticing that alarge portion of the QQ plot for TML applying to x (10)

i is below the x = y line ouranalysis also includes the sample x (05)

i The QQplots of the three statistics applied to x i indicate that none of them is nearly

pivotal TML and TSB are for the most part above the x = y line however both their lefttails are slightlybelow the x = y line This implies that some bootstrap samples t model(18) extremely well a phenomenon which can be caused by too many data points nearthe centre of the distribution Adownweighting procedure only affects data points thatcause a test statistic to possess a heavy right tail It is not clear to us how to deal with asample when a test statistic has a light left tail The QQplots of TML and TSB on x (05)

i andx (10)

i suggest that their heavier right tails on x i are under control However the right tailof TB is still quite heavy when compared to x 2

29 The downweighting transformationH(10) not only just controls the right tail of TML but also makes most of the QQplot fallbelow the x = y line Visually inspecting the QQ plots of TML on x i x (05)

i and x (10)i

suggests that a downweighting transformation by H(r) with 0 lt r lt 05 may lead toa better procedure for analysing the alcohol and psychological symptom data setbased on TML Since any procedure is only an approximation to the real world andTML is nearly pivotal when applying to x (05)

i we recommend the analysis using TMLon x (05)

i for this data set This leads to TML = 4243 with a bootstrap p-value of 040implying that the model (18) marginally ts the alcohol and psychological symptomdata set

Both the raw data sets in Examples 1 and 2 have signicant multivariate kurtoses andthe downweighting transformation (15) achieves approximately pivotal behaviour ofthe statistics We maywonder how these test statistics behave when applied to a data setthat does not have a signicant multivariate kurtosis This is demonstrated in thefollowing example

Example 3 Table 121 of Mardia Kent and Bibby (1979) contains test scores ofn = 88 students on p = 5 topics mechanics vectors algebra analysis and statisticsThe rst two topics were tested with closed-book exams and the last three withopen-book exams Since these two examination methods may tap different abilities atwo-factor model as in (17a) with

L =10 l 21 0 0 00 0 10 l 42 l 52

sup3 acutecent F =

f11 f12

f 21 f 22

sup3 acute(19)

was proposed and conrmed by Tanaka Watadani and Moon (1991) Mardiarsquos multi-variate kurtosis for this data set equal to 0057 is not signicant It would be interestingto see how TML TB and TSB perform on this data set

QQ plots for the three statistics applied to x i indicate that all of them have heavierright tails than that of x 2

4 The QQ plot for TML applied to x (r)i continues to exhibit a

heavier right tail until r reaches 30 TSB and TB still possess quite heavy right tails evenwhen r = 30

This data set has been used as an example for inuence analysis Previous studiesindicate that case number 81 is the most inuential point (Lee amp Wang 1996 Fung ampKwan 1995) So it is enticing to apply the three statistics to the x i s without the 81stcase The QQplots of TML TSB and TB applied to the remaining 87 cases indicate that allthe three statistics still have heavier right tails than that of x 2

4 Actually compared tothose based on all 88 cases the right tails of the three statistics on the 87 cases are evenheavier For this data set our recommended analysis is to use TML applied to x (30)

i With

Bootstrap approach to inference and power analysis 105

TML = 189 and a bootstrap p-value of 71 model (19) is more than good enough inexplaining the relationship of the ve variables

6 Non-convergence with Bootstrap ReplicationsNon-convergence issues exist with resampling and caution is needed for bootstrapinference (Ichikawa amp Konishi 1995) This generally happens when the sample size isnot large enough and especially when a model structure is wrong This issue wasdiscussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariancestructure model Here we propose a reasonable way of handling non-convergencewith bootstrap replications

With an iterative algorithm like Newtonrsquos convergence is generally dened ask Dv ( j ) k lt e where e is a small number and Dv ( j ) = v ( j ) plusmn v ( j plusmn 1) with v ( j ) being thej th step solution Let s = vech(S) be the factor formed by stacking the non-duplicatedelements of S and let its sample counterpart be s = vech(S) Then Dv ( j + 1) =

( Ccedils centj Wj Ccedilsj)plusmn 1 Ccedils centj Wj (s plusmn sj) where Ccedilsj = [ds(v)dv | v ( j )] sj = s(v ( j )) and Wj is the

corresponding weight matrix evaluated at v ( j ) So Dv ( j ) is proportional toS plusmn S(v ( j plusmn 1)) and it is impossible for k Dv ( j ) k to be smaller than e if S(v) is far fromE(S) Although a model is correct in a bootstrap population F0 some bootstrapreplications may still lie far away from the model especially when sampling errors arelarge for small samples For these samples even if we can obtain a solution using somenon-iterative or direct search method the corresponding statistic will be signicantBased on this fact we should distinguish two kinds of non-convergence in bootstrap orgeneral Monte Carlo studies The rst is where a sample contains enough distinct pointsand still cannot reach convergence with a model which should be treated as asignicant replication or a lsquobad modelrsquo The second is where a sample does not containenough distinct observations to t a model For obtaining a TML this number is p + 1 thenumber is p + 1 for obtaining a TB Although a TSB can be obtained once a TML isavailable we need to have a positive denite sample covariance matrix S y ofy i = vech[(x plusmn x )(x plusmn x ) cent ] in order for TSB to make sense (Bentler amp Yuan 1999)and the minimum number of distinct data points for S y to be positive denite is p + 1We should treat the second case as a bad sample and ignore it in bootstrap replications

In practice instead of estimating ca one generally reports the p-value of a statistic TIf all B bootstrap replications result in convergent solutions

pB =B0 + 1 plusmn M

B0 + 1

where M = fTb | Tb gt Tg When non-convergences occur B0 should be dened as thenumber of converged samples plus the number of signicant samples due to a badmodel M should be dened as the number of non-converged samples due to the badmodel plus the number of converged samples that result in Tb gt T With this modi-cation there is no problem with hypothesis testing A similar modication applies toformula (8) for power evaluation With ca = T(B0 (1 plusmn a)) in determining the numeratorin (8) one needs at least B0(1 plusmn a) converged samples when sampling from F0 The proposed procedure fails to generate a power estimate when the number ofconvergences under H0 is below B0(1 plusmn a)

Let l 1(S x ) $ l 2(S x ) $ $ l p(S x ) be the eigenvalues of the sample covariancematrix S x and g1(S y ) $ g 2(S y ) $ $ gp (S y ) be the eigenvalues of the sample

106 Ke-Hai Yuan and Kentaro Hayashi

covariance matrix S y of y i = vech[(x plusmn x )(x plusmn x ) cent ] In computing the three examplesour criteria for bad samples are l p(S x )l 1(S x ) 10plusmn 20 for obtaining TML andgp (S y )g1(S y ) 10plusmn 20 for obtaining TSB and TB If these criteria are satised and themodel still cannot reach convergence (k Dv k lt 10plusmn 5) in 100 iterations we treat thecorresponding statistic as innity All non-convergences with the three examples weredue to signicant replications these are reported in Table 6 Table 6(a) implies that for agiven sample the further a model is from H0 the more often non-convergences occurThe table also suggests that with a given model the heavier the tails of a data set themore often one obtains non-convergences

7 Discussion and conclusionsExisting conclusions regarding the three commonly used statistics TML TSB and TB arebased on asymptotics and Monte Carlo studies The properties inherent in either ofthese approaches may not be enjoyed by these statistics in practice because of either anite sample or an unknown sampling distribution By applying them to a specic data

Bootstrap approach to inference and power analysis 107

Table 6 Non-convergences due to signicant samples

(a) Example 1

Sample x i Sample x (05)i Sample x (25)

i

Statistic H0 Ha1 Ha2 H0 Ha1 Ha2 H0 Ha1 Ha2

TML 0 0 2 0 0 1 0 0 1TSB 0 0 2 0 0 1 0 0 1TB 0 10 21 0 10 27 0 7 20

(b) Example 2

Sample x i Sample x (05)i Sample x (10)

iStatistic H0 H0 H0

TML 9 1 1TSB 9 1 1TB 1 0 0

(c) Example 3

Sample x i Sample x (30)i Sample x i

(81st case removed)Statistic H0 H0 H0

TML 1 0 0TSB 1 0 0TB 0 0 0

set through resampling properties of the three statistics can be visually examined bymeans of QQplots When a data set possesses heavy tails TML will inherit these tails byhaving a heavy right tail Actually all three statistics need the sampling distribution tohave nite fourth-order moments With possible violation of this assumption bypractical data we propose applying a bootstrap procedure to a transformed samplex (r)

i through downweighting The combination of bootstrapping and downweightingnot only offers a theoretical justication for applying the bootstrap to a data set withheavy tails but also provides a quite exible tool for exploring the properties of each ofthe three statistics Even if inference is based on referring a statistic to a chi-squaredistribution one will obtain a more accurate model evaluation by applying TML to atransformed sample x (r)

i Asymptotics justify the three statistics from different perspectives and TB is

asymptotically distribution-free Previous Monte Carlo studies mainly support TSBHowever with proper downweighting TML is generally the one that is best describedby a chi-square distribution However the conclusion that TML is the best statistic forbootstrap inference has to be preliminary Future studies may nd TB or TSB moreappropriate for other data sets We recommend exploring the different procedures for agiven data set as illustrated in Section 5

AcknowledgementsThe authors would like to thank Professor Peter M Bentler and two anonymous referees whoseconstructive comments lead to an improved version of this paper Correspondence concerningthis article should be addressed to Ke-Hai Yuan (kyuanndedu)

ReferencesAmemiya Y amp Anderson T W (1990) Asymptotic chi-square tests for a large class of factor

analysis models Annals of Statistics 18 1453ndash1463Barndorff-Nielsen O E amp Cox D R (1984) Bartlett adjustments to the likelihood ratio statistic

and the distribution of the maximum likelihood estimator Journal of the Royal StatisticalSociety B 46 483ndash495

Bentler P M (1995) EQS Structural equations program manual Encino CA MultivariateSoftware

Bentler P M amp Yuan K-H (1999) Structural equation modeling with small samples Teststatistics Multivariate Behavioral Research 34 181ndash197

Beran R (1986) Simulated power functions Annals of Statistics 14 151ndash173Beran R (1988) Prepivoting test statistics Abootstrap view of asymptotic renements Journal

of the American Statistical Association 83 687ndash697Beran R amp Srivastava M S (1985) Bootstrap tests and condence regions for functions of a

covariance matrix Annals of Statistics 13 95ndash115Bollen K A amp Stine R (1993) Bootstrapping goodness of t measures in structural equation

models In K A BOllen and J S Long (Eds) Testing structural equation models(pp 111ndash135) Newbury Park CA Sage

Browne M W (1984) Asymptotic distribution-free methods for the analysis of covariancestructures British Journal of Mathematical and Statistical Psychology 37 62ndash83

Browne M W amp Shapiro A (1988) Robustness of normal theory methods in the analysis oflinear latent variate models British Journal of Mathematical and Statistical Psychology 41193ndash208

108 Ke-Hai Yuan and Kentaro Hayashi

Cohen J (1988) Statistical power analysis for the behavioral sciences (2nd ed) Hillsdale NJErlbaum

Davison A C amp Hinkley D V (1997) Bootstrap methods and their application CambridgeCambridge University Press

Efron B ampTibshirani R J (1993) An introduction to the bootstrap New York Chapman ampHallFouladi R T (1998) Covariance structure analysis techniques under conditions of multi-

variate normality and nonnormality Modied and bootstrap based test statistics Paperpresented at the annual meeting of the American Educational Research Association San DiegoCA

Fouladi R T (2000) Performance of modied test statistics in covariance and correlationstructure analysis under conditions of multivariate nonnormality Structural EquationModeling 7 356ndash410

Fung W K ampKwan C W (1995) Sensitivityanalysis in factor analysis Difference between usingcovariance and correlation matrices Psychometrika 60 607ndash614

Hall P (1992) The bootstrap and Edgeworth expansion New York Springer-VerlagHolzinger K J amp Swineford F (1939) A study in factor analysis The stability of a bi-factor

solution University of Chicago Press Supplementary Educational Monographs No 48Hu L T Bentler P M amp Kano Y (1992) Can test statistics in covariance structure analysis be

trusted Psychological Bulletin 112 351ndash362Huber P J (1981) Robust statistics New York WileyIchikawa M amp Konishi S (1995) Application of the bootstrap methods in factor analysis

Psychometrika 60 77ndash93Lawley D N amp Maxwell A E (1971) Factor analysis as a statistical method (2nd ed) New

York American ElsevierLee S Y ampWang S J (1996) Sensitivityanalysis of structural equation models Psychometrika

61 93ndash108MacCallum R C Browne M W ampSugawara H M (1996) Power analysis and determination of

sample size for covariance structure modeling Psychological Methods 1 130ndash149Mardia K V (1970) Measure of multivariate skewness and kurtosis with applications

Biometrika 57 519ndash530Mardia K V Kent J T amp Bibby J M (1979) Multivariate analysis New York Academic PressMaronna R A (1976) Robust M-estimators of multivariate location and scatter Annals of

Statistics 4 51ndash67Mooijaart A ampBentler P M (1991) Robustness of normal theory statistics in structural equation

models Statistica Neerlandica 45 159ndash171Neumann C S (1994) Structural equation modeling of symptoms of alcoholism and

psychopathology Doctoral dissertation University of Kansas LawrenceSatorra A amp Bentler P M (1988) Scaling corrections for chi-square statistics in covariance

structure analysis In 1988 Proceedings of Business and Economics Sections (pp 308ndash313)Alexandria VA American Statistical Association

Satorra A amp Bentler P M (1990) Model conditions for asymptotic robustness in the analysis oflinear relations Computational Statistics amp Data Analysis 10 235ndash249

Satorra A ampSaris W E (1985) Power of the likelihood ratio test in covariance structure analysisPsychometrika 50 83ndash90

Satorra A Saris W E amp de Pijper W M (1991) Acomparison of several approximations to thepower function of the likelihood ratio test in covariance structure analysis StatisticaNeerlandica 45 173ndash185

Shapiro A (1987) Robustness properties of the MDF analysis of moment structures SouthAfrican Statistical Journal 21 39ndash62

Steiger J H amp Lind J M (1980) Statistically based tests for the number of common factorsPaper presented at the annual meeting of the Psychometric Society Iowa City

Steiger J H Shapiro A amp Browne M W (1985) On the multivariate asymptotic distribution ofsequential chi-square statistics Psychometrika 50 253ndash264

Bootstrap approach to inference and power analysis 109

Tanaka Y Watadani S amp Moon S H (1991) Inuence in covariance structure analysis with anapplication to conrmatory factor analysis Communications in Statistics Theory andMethods 20 3805ndash3821

Tyler D E (1983) Robustness and efciency properties of scatter matrices Biometrika 70411ndash420

Wakaki H Eguchi S amp Fujikoshi Y (1990) A class of tests for a general covariance structureJournal of Multivariate Analysis 32 313ndash325

Yuan K-H amp Bentler P M (1997) Mean and covariance structure analysis Theoretical andpractical improvements Journal of the American Statistical Association 92 767ndash774

Yuan K-H amp Bentler P M (1998) Structural equation modeling with robust covariancesIn A E Raftery (ed) Sociological methodology 1998 (pp 363ndash396) Boston BlackwellPublishers

Yuan K-H amp Bentler P M (1999) On normal theory and associated test statistics in covariancestructure analysis under two classes of nonnormal distributions Statistica Sinica 9 831ndash853

Yuan K-H Chan W amp Bentler P M (2000) Robust transformation with applications tostructural equation modelling British Journal of Mathematical and Statistical Psychology53 31ndash50

Yuan K-H Bentler P M amp Chan W (2001) Structural equation modeling with heavy taileddistributions Manuscript submitted for publication

Yuan K-H amp Hayashi K (2001) On using an empirical Bayes covariance matrix in bootstrapapproach to covariance structure analysis Manuscript submitted for publication

Yung Y F amp Bentler P M (1994) Bootstrap-corrected ADF test statistics in covariance structureanalysis British Journal of Mathematical and Statistical Psychology 47 63ndash84

Yung Y F amp Bentler P M (1996) Bootstrapping techniques in analysis of mean and covariancestructures In G A Marcoulides and R E Schumacker (Eds) Advanced structural equationmodeling Techniques and issues (pp 195ndash226) Hillsdale NJ Lawrence Erlbaum

Zhang J amp Boos D D (1992) Bootstrap critical values for testing homogeneity of covariancematrices Journal of the American Statistical Association 87 425ndash429

Received 9 November 2000 revised version received 22 March 2002

110 Ke-Hai Yuan and Kentaro Hayashi

Page 13: Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models

explore the pivotal property of TML TSB and TB on each sample After noticing that alarge portion of the QQ plot for TML applying to x (10)

i is below the x = y line ouranalysis also includes the sample x (05)

i The QQplots of the three statistics applied to x i indicate that none of them is nearly

pivotal TML and TSB are for the most part above the x = y line however both their lefttails are slightlybelow the x = y line This implies that some bootstrap samples t model(18) extremely well a phenomenon which can be caused by too many data points nearthe centre of the distribution Adownweighting procedure only affects data points thatcause a test statistic to possess a heavy right tail It is not clear to us how to deal with asample when a test statistic has a light left tail The QQplots of TML and TSB on x (05)

i andx (10)

i suggest that their heavier right tails on x i are under control However the right tailof TB is still quite heavy when compared to x 2

29 The downweighting transformationH(10) not only just controls the right tail of TML but also makes most of the QQplot fallbelow the x = y line Visually inspecting the QQ plots of TML on x i x (05)

i and x (10)i

suggests that a downweighting transformation by H(r) with 0 lt r lt 05 may lead toa better procedure for analysing the alcohol and psychological symptom data setbased on TML Since any procedure is only an approximation to the real world andTML is nearly pivotal when applying to x (05)

i we recommend the analysis using TMLon x (05)

i for this data set This leads to TML = 4243 with a bootstrap p-value of 040implying that the model (18) marginally ts the alcohol and psychological symptomdata set

Both the raw data sets in Examples 1 and 2 have signicant multivariate kurtoses andthe downweighting transformation (15) achieves approximately pivotal behaviour ofthe statistics We maywonder how these test statistics behave when applied to a data setthat does not have a signicant multivariate kurtosis This is demonstrated in thefollowing example

Example 3 Table 121 of Mardia Kent and Bibby (1979) contains test scores ofn = 88 students on p = 5 topics mechanics vectors algebra analysis and statisticsThe rst two topics were tested with closed-book exams and the last three withopen-book exams Since these two examination methods may tap different abilities atwo-factor model as in (17a) with

L =10 l 21 0 0 00 0 10 l 42 l 52

sup3 acutecent F =

f11 f12

f 21 f 22

sup3 acute(19)

was proposed and conrmed by Tanaka Watadani and Moon (1991) Mardiarsquos multi-variate kurtosis for this data set equal to 0057 is not signicant It would be interestingto see how TML TB and TSB perform on this data set

QQ plots for the three statistics applied to x i indicate that all of them have heavierright tails than that of x 2

4 The QQ plot for TML applied to x (r)i continues to exhibit a

heavier right tail until r reaches 30 TSB and TB still possess quite heavy right tails evenwhen r = 30

This data set has been used as an example for inuence analysis Previous studiesindicate that case number 81 is the most inuential point (Lee amp Wang 1996 Fung ampKwan 1995) So it is enticing to apply the three statistics to the x i s without the 81stcase The QQplots of TML TSB and TB applied to the remaining 87 cases indicate that allthe three statistics still have heavier right tails than that of x 2

4 Actually compared tothose based on all 88 cases the right tails of the three statistics on the 87 cases are evenheavier For this data set our recommended analysis is to use TML applied to x (30)

i With

Bootstrap approach to inference and power analysis 105

TML = 189 and a bootstrap p-value of 71 model (19) is more than good enough inexplaining the relationship of the ve variables

6 Non-convergence with Bootstrap ReplicationsNon-convergence issues exist with resampling and caution is needed for bootstrapinference (Ichikawa amp Konishi 1995) This generally happens when the sample size isnot large enough and especially when a model structure is wrong This issue wasdiscussed in Yuan and Bentler (1997) with Monte Carlo studies on a covariancestructure model Here we propose a reasonable way of handling non-convergencewith bootstrap replications

With an iterative algorithm like Newtonrsquos convergence is generally dened ask Dv ( j ) k lt e where e is a small number and Dv ( j ) = v ( j ) plusmn v ( j plusmn 1) with v ( j ) being thej th step solution Let s = vech(S) be the factor formed by stacking the non-duplicatedelements of S and let its sample counterpart be s = vech(S) Then Dv ( j + 1) =

( Ccedils centj Wj Ccedilsj)plusmn 1 Ccedils centj Wj (s plusmn sj) where Ccedilsj = [ds(v)dv | v ( j )] sj = s(v ( j )) and Wj is the

corresponding weight matrix evaluated at v ( j ) So Dv ( j ) is proportional toS plusmn S(v ( j plusmn 1)) and it is impossible for k Dv ( j ) k to be smaller than e if S(v) is far fromE(S) Although a model is correct in a bootstrap population F0 some bootstrapreplications may still lie far away from the model especially when sampling errors arelarge for small samples For these samples even if we can obtain a solution using somenon-iterative or direct search method the corresponding statistic will be signicantBased on this fact we should distinguish two kinds of non-convergence in bootstrap orgeneral Monte Carlo studies The rst is where a sample contains enough distinct pointsand still cannot reach convergence with a model which should be treated as asignicant replication or a lsquobad modelrsquo The second is where a sample does not containenough distinct observations to t a model For obtaining a TML this number is p + 1 thenumber is p + 1 for obtaining a TB Although a TSB can be obtained once a TML isavailable we need to have a positive denite sample covariance matrix S y ofy i = vech[(x plusmn x )(x plusmn x ) cent ] in order for TSB to make sense (Bentler amp Yuan 1999)and the minimum number of distinct data points for S y to be positive denite is p + 1We should treat the second case as a bad sample and ignore it in bootstrap replications

In practice instead of estimating ca one generally reports the p-value of a statistic TIf all B bootstrap replications result in convergent solutions

pB =B0 + 1 plusmn M

B0 + 1

where M = fTb | Tb gt Tg When non-convergences occur B0 should be dened as thenumber of converged samples plus the number of signicant samples due to a badmodel M should be dened as the number of non-converged samples due to the badmodel plus the number of converged samples that result in Tb gt T With this modi-cation there is no problem with hypothesis testing A similar modication applies toformula (8) for power evaluation With ca = T(B0 (1 plusmn a)) in determining the numeratorin (8) one needs at least B0(1 plusmn a) converged samples when sampling from F0 The proposed procedure fails to generate a power estimate when the number ofconvergences under H0 is below B0(1 plusmn a)

Let l 1(S x ) $ l 2(S x ) $ $ l p(S x ) be the eigenvalues of the sample covariancematrix S x and g1(S y ) $ g 2(S y ) $ $ gp (S y ) be the eigenvalues of the sample

106 Ke-Hai Yuan and Kentaro Hayashi

covariance matrix S y of y i = vech[(x plusmn x )(x plusmn x ) cent ] In computing the three examplesour criteria for bad samples are l p(S x )l 1(S x ) 10plusmn 20 for obtaining TML andgp (S y )g1(S y ) 10plusmn 20 for obtaining TSB and TB If these criteria are satised and themodel still cannot reach convergence (k Dv k lt 10plusmn 5) in 100 iterations we treat thecorresponding statistic as innity All non-convergences with the three examples weredue to signicant replications these are reported in Table 6 Table 6(a) implies that for agiven sample the further a model is from H0 the more often non-convergences occurThe table also suggests that with a given model the heavier the tails of a data set themore often one obtains non-convergences

7 Discussion and conclusionsExisting conclusions regarding the three commonly used statistics TML TSB and TB arebased on asymptotics and Monte Carlo studies The properties inherent in either ofthese approaches may not be enjoyed by these statistics in practice because of either anite sample or an unknown sampling distribution By applying them to a specic data

Bootstrap approach to inference and power analysis 107

Table 6 Non-convergences due to signicant samples

(a) Example 1

Sample x i Sample x (05)i Sample x (25)

i

Statistic H0 Ha1 Ha2 H0 Ha1 Ha2 H0 Ha1 Ha2

TML 0 0 2 0 0 1 0 0 1TSB 0 0 2 0 0 1 0 0 1TB 0 10 21 0 10 27 0 7 20

(b) Example 2

Sample x i Sample x (05)i Sample x (10)

iStatistic H0 H0 H0

TML 9 1 1TSB 9 1 1TB 1 0 0

(c) Example 3

Sample x i Sample x (30)i Sample x i

(81st case removed)Statistic H0 H0 H0

TML 1 0 0TSB 1 0 0TB 0 0 0

set through resampling properties of the three statistics can be visually examined bymeans of QQplots When a data set possesses heavy tails TML will inherit these tails byhaving a heavy right tail Actually all three statistics need the sampling distribution tohave nite fourth-order moments With possible violation of this assumption bypractical data we propose applying a bootstrap procedure to a transformed samplex (r)

i through downweighting The combination of bootstrapping and downweightingnot only offers a theoretical justication for applying the bootstrap to a data set withheavy tails but also provides a quite exible tool for exploring the properties of each ofthe three statistics Even if inference is based on referring a statistic to a chi-squaredistribution one will obtain a more accurate model evaluation by applying TML to atransformed sample x (r)

i Asymptotics justify the three statistics from different perspectives and TB is

asymptotically distribution-free Previous Monte Carlo studies mainly support TSBHowever with proper downweighting TML is generally the one that is best describedby a chi-square distribution However the conclusion that TML is the best statistic forbootstrap inference has to be preliminary Future studies may nd TB or TSB moreappropriate for other data sets We recommend exploring the different procedures for agiven data set as illustrated in Section 5

AcknowledgementsThe authors would like to thank Professor Peter M Bentler and two anonymous referees whoseconstructive comments lead to an improved version of this paper Correspondence concerningthis article should be addressed to Ke-Hai Yuan (kyuanndedu)

References
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Barndorff-Nielsen, O. E., & Cox, D. R. (1984). Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator. Journal of the Royal Statistical Society, Series B, 46, 483–495.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.
Beran, R. (1986). Simulated power functions. Annals of Statistics, 14, 151–173.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.
Beran, R., & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. Annals of Statistics, 13, 95–115.
Bollen, K. A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap based test statistics. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fung, W. K., & Kwan, C. W. (1995). Sensitivity analysis in factor analysis: Difference between using covariance and correlation matrices. Psychometrika, 60, 607–614.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer-Verlag.
Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Chicago: University of Chicago Press, Supplementary Educational Monographs, No. 48.
Hu, L. T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method (2nd ed.). New York: American Elsevier.
Lee, S. Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models. Psychometrika, 61, 93–108.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. Annals of Statistics, 4, 51–67.
Mooijaart, A., & Bentler, P. M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Neumann, C. S. (1994). Structural equation modeling of symptoms of alcoholism and psychopathology. Doctoral dissertation, University of Kansas, Lawrence.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In 1988 Proceedings of the Business and Economics Sections (pp. 308–313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W. E., & de Pijper, W. M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39–62.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Tanaka, Y., Watadani, S., & Moon, S. H. (1991). Influence in covariance structure analysis with an application to confirmatory factor analysis. Communications in Statistics: Theory and Methods, 20, 3805–3821.
Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika, 70, 411–420.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
Yuan, K.-H., & Bentler, P. M. (1997). Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association, 92, 767–774.
Yuan, K.-H., & Bentler, P. M. (1998). Structural equation modeling with robust covariances. In A. E. Raftery (Ed.), Sociological methodology 1998 (pp. 363–396). Boston: Blackwell Publishers.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., Chan, W., & Bentler, P. M. (2000). Robust transformation with applications to structural equation modelling. British Journal of Mathematical and Statistical Psychology, 53, 31–50.
Yuan, K.-H., Bentler, P. M., & Chan, W. (2001). Structural equation modeling with heavy tailed distributions. Manuscript submitted for publication.
Yuan, K.-H., & Hayashi, K. (2001). On using an empirical Bayes covariance matrix in bootstrap approach to covariance structure analysis. Manuscript submitted for publication.
Yung, Y. F., & Bentler, P. M. (1994). Bootstrap-corrected ADF test statistics in covariance structure analysis. British Journal of Mathematical and Statistical Psychology, 47, 63–84.
Yung, Y. F., & Bentler, P. M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Techniques and issues (pp. 195–226). Hillsdale, NJ: Lawrence Erlbaum.
Zhang, J., & Boos, D. D. (1992). Bootstrap critical values for testing homogeneity of covariance matrices. Journal of the American Statistical Association, 87, 425–429.

Received 9 November 2000; revised version received 22 March 2002
