
Computational Statistics & Data Analysis 26 (1997) 177-198

Improving parameter tests in covariance structure analysis

Ke-Hai Yuan, Peter M. Bentler*

Department of Psychology, University of California, Los Angeles, CA 90024-1563, USA

Received 1 August 1996; received in revised form 1 March 1997

Abstract

In many areas, covariance structure analysis plays an important role in understanding how the relationship among observed variables might be generated by hypothesized latent variables. Once a model is established as relevant to a given data set, it is important to evaluate the significance of specific parameters, such as coefficients of regressions among latent variables, within the model. The popular z-test of a parameter is the estimator of the parameter divided by its standard error estimator. A valid z-statistic must be based on a high-quality standard error estimator. We focus on the quality of the standard error estimator from both MLE and ADF methods, which are the two most frequently used methods in covariance structure practice. For these two estimation methods, empirical evidence shows that classical formulae give "too optimistic" standard error estimators, with the result that the z-tests regularly give false conclusions. We review one and introduce another simple corrected standard error estimator. These substantially improve on the classical ones, depending on distribution and sample size. Two implications of this study are that significant parameters as printed in most statistical software may not be really significant, and that corrected standard errors should be direct output for the two most widely used methods. A comparison of the accuracy of the estimators based on these two methods is also made. © 1997 Elsevier Science B.V.

Keywords: Z-scores; Corrected standard error; Bias; Mean square error; Maximum likelihood; Asymptotically distribution free

1. Introduction

In the social and behavioral sciences, variables such as attitudes, personality traits, mental health status, political liberalism-conservatism, etc., are often of interest to

* Corresponding author.

0167-9473/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-9473(97)00025-X



researchers. Attributes such as intellectual abilities of students or teaching styles of instructors are important concepts in education. The relationship between demand and supply is very important to a market economy. Abstract concepts such as these typically cannot be observed or measured directly, or can only be measured with errors; hence, an appropriate data analysis must take into account not only the statistical relations involved among such hypothesized latent constructs but also the errors of measurement in the variables. Structural equation models such as confirmatory factor analysis and general linear relations have been developed to provide practical and accurate ways of dealing with multivariate analysis with latent variables (e.g., Bentler and Wu, 1995; Jöreskog and Sörbom, 1993). Recent reviews of some of the vast literature in this field have been given by Austin and Calderón (1996), Bentler and Dudgeon (1996), Browne and Arminger (1995), Hoyle (1995), Marcoulides and Schumacker (1996), and Tremblay and Gardner (1996).

Data analysis with structural models requires parameter estimation, ideally with an efficient estimator such as the maximum likelihood or minimum χ² estimator. Associated with an estimator are two key statistical tests. The first is a goodness of fit test to evaluate whether a proposed model is relevant to a particular set of data. One of several asymptotic χ² tests is usually used for this purpose; we review two such tests below. If the χ² is small compared to its degrees of freedom, the model provides a plausible representation of the data; on the other hand, a large χ² implies an inadequate model. While serious questions have been raised about the adequacy of some of these χ² tests (e.g., Bentler and Dudgeon, 1996; Chou et al., 1991; Curran et al., 1996; Hu et al., 1992), for purposes of this paper we assume that an adequate model test procedure is available. The second key statistical test is invoked primarily when a model hypothesis has been accepted, i.e., the model can be considered appropriate for the data at hand. In that case, it is important to evaluate particular parameters in the model to see whether they are statistically necessary. A more parsimonious model will result when pruning nonsignificant estimates. Typically, an asymptotic z-test, based on the ratio of an estimate to its asymptotic standard error estimate, is used to evaluate the necessity of a given parameter. In this paper we study the quality of the standard error estimates in covariance structure analysis and use the typical factor analysis model as an example. We find that the routinely used standard error estimates are much more biased than has been suspected, and in particular, tend to be substantial underestimates of actual sampling variability, so that parameters in these models that should truly be considered as nonexistent or zero are taken as statistically significant instead.
As a result, models may contain many superfluous parameters that not only make the models much more complex than necessary, but also the models may not replicate when evaluated on new samples. We study this problem both theoretically and empirically, and make some specific recommendations for use in data analysis.
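The z-test just described is simple to state in code. The following is a minimal numerical sketch; the estimate, standard error, and the factor-of-2 inflation are purely illustrative numbers, not values from the paper.

```python
import math

def z_test(estimate, std_error):
    """Asymptotic z-test: the estimate divided by its standard error,
    with a two-sided p-value from the standard normal distribution."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) for Z ~ N(0, 1)
    return z, p

# The same estimate judged with an optimistic standard error versus one
# inflated by a factor of 2, as an underestimated S.E. might require.
z1, p1 = z_test(0.296, 0.115)      # appears significant at the 5% level
z2, p2 = z_test(0.296, 2 * 0.115)  # no longer significant
```

The point of the two calls is the paper's central concern: an underestimated standard error inflates z and can turn a truly nonsignificant parameter into an apparently significant one.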

The confirmatory factor analytic model is the most widely used example of this class of models. This model expresses observed variables as functions of hypothesized latent factors and errors as

X = μ + Λf + e,   (1.1)



where X represents the vector of observed indicator variables, f is a vector of latent variables, e is a vector of unique factors or random errors, and Λ is a matrix of regression coefficients called factor loadings. Assuming the means of f and e are zero, then μ = E(X). The usual estimator for μ is X̄, the sample mean of the observed data. However, the parameter matrix Λ cannot be estimated from the first moment of X. Let Σ = var(X), Φ = var(f), Ψ = var(e). Assuming that the factors f and errors e are uncorrelated, the covariance structure of (1.1) is

Σ(θ) = ΛΦΛ′ + Ψ,   (1.2)

where θ is the vector of unknown elements in the matrices Λ, Φ, and Ψ. Eq. (1.2) describes the covariance structure of X through the factor analysis model (1.1). It can also be written as X = (Λ, I)(f′, e′)′. More generally, we have X = Λξ, where the matrix Λ = Λ(γ) is a function of a basic vector of parameters, and the underlying generating vector ξ contains measured, latent, or residual random or fixed variables (e.g., Anderson, 1989; Bentler, 1983; Satorra and Neudecker, 1994), with a corresponding covariance structure Σ(θ). These types of models can be estimated and tested based on the sample covariance matrix. Let X₁, …, X_N be a sample of size N = n + 1 from the population X, and let S be the usual sample covariance. If X is of dimension p, then the number of nonduplicated elements in S is p* = p(p + 1)/2. Let q be the number of unknown elements in θ; then the model will not be of interest unless q < p*. For example, in model (1.2), we can restrict Ψ to be a diagonal matrix, Φ to be a correlation matrix, and each row of Λ to have only one or a few nonzero elements. Estimation of the parameters is performed under the assumption of multivariate normality by maximum likelihood (ML), or without any distributional assumptions, using minimum chi-square, which is known in this field as asymptotically distribution free (ADF).
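The covariance structure (1.2) and the counting condition q < p* can be made concrete with a small numerical sketch. The model below is hypothetical (6 indicators, 2 correlated factors, illustrative loadings); it only serves to show how Σ(θ) is assembled and how p* is counted.

```python
import numpy as np

# Hypothetical 6-variable, 2-factor confirmatory model; each indicator
# loads on a single factor.  All numbers are illustrative.
Lam = np.array([[0.7, 0.0],
                [0.8, 0.0],
                [0.6, 0.0],
                [0.0, 0.7],
                [0.0, 0.8],
                [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                     # factor correlation matrix
Psi = np.diag(1.0 - np.diag(Lam @ Phi @ Lam.T))  # uniquenesses for unit variances

Sigma = Lam @ Phi @ Lam.T + Psi                  # Eq. (1.2)
p = Sigma.shape[0]
p_star = p * (p + 1) // 2       # nonduplicated elements of S: p(p+1)/2
q = 6 + 1 + 6                   # free loadings + factor correlation + uniquenesses
```

Here p* = 21 and q = 13, so the model is overidentified (q < p*), which is the situation of interest in the text.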

We can illustrate the parameter testing problem using data from Harlow et al. (1995), who studied a variety of scales and measures regarding psychosocial functioning, sexual experience, and substance use in an attempt to understand the antecedents of HIV-risky sexual behaviors. Thirteen of their variables, a small subset of their study variables, were made available to us. The authors had strong reasons to believe that four correlated factors based on 32 parameters could explain these 91 covariances, i.e., a confirmatory factor model with 59 degrees of freedom was hypothesized. The authors posited that their 13 variables X could be generated by (1.1) with Λ (13 × 4), f (4 × 1), and e (13 × 1). For our purposes, we added an extra free factor loading λᵢⱼ to be estimated in Λ, permitting one variable to be influenced by more than one factor. Based on a sample of n = 213 women, this model was estimated using both ML and ADF, since the data do not have a multivariate normal distribution. The resulting model fit the data acceptably. We then tested our extra factor loading and obtained λ̂ᵢⱼ = 0.410 with S.E. = 0.140 for a z-test of 2.92 (for ML) and λ̂ᵢⱼ = 0.296 with S.E. = 0.115 for a z-test of 2.58 (for ADF). Evidently, this extra factor loading is necessary to the model. However, as we shall show below, at this sample size ML and ADF produce standard error estimates that are markedly below the actual standard errors, being too small in size by a factor of 1.4-2.6.



Using a more accurate standard error estimate would show that our conclusion using existing methods would be clearly in error.

2. Two major methods

When the sample X₁, …, X_N is obtained from a normal population X ~ N(μ, Σ), then nS follows a Wishart distribution. The ML estimator (MLE) θ̂n can be obtained by minimizing

F₁(θ) = tr[SΣ⁻¹(θ)] − log|SΣ⁻¹(θ)| − p,   (2.1)

and T₁ = nF₁(θ̂n) is used to judge the adequacy of the structure Σ(θ). Under some standard regularity conditions,

√n(θ̂n − θ₀) → N(0, Ω₁),

where Ω₁ = 2{σ̇′(Σ⁻¹ ⊗ Σ⁻¹)σ̇}⁻¹ is the inverse of the information matrix, with σ̇ = ∂vec(Σ(θ))/∂θ′. Thus, an estimate of the standard error of θ̂n can be calculated based on Ω̂₁/n with estimator

Ω̂₁ = 2[σ̇′(θ̂n){Σ⁻¹(θ̂n) ⊗ Σ⁻¹(θ̂n)}σ̇(θ̂n)]⁻¹.   (2.2)

Since T₁ is just the likelihood ratio statistic associated with the Wishart distribution, T₁ asymptotically follows a χ² distribution with p* − q degrees of freedom, and a critical value from that χ² distribution is often used to judge the significance of T₁. Parameter tests can be based on z-statistics calculated as z = θ̂ᵢ/S.E., where the S.E. is obtained as the square root of the appropriate diagonal element of Ω̂₁/n. When data are normal and sample sizes are large enough, these z-statistics are approximately normally distributed (e.g., Gerbing and Anderson, 1985), and so the univariate normal distribution can be used to test the significance of the given parameters. Unfortunately, this optimality property can break down in small samples, and generally will break down in samples which violate the assumed multivariate normality condition, as reviewed by Bentler and Dudgeon (1996). For example, Curran (1994) performed an extensive Monte Carlo study of three 9-variable models evaluated under a normal and two nonnormal conditions, at sample sizes 100, 200, and 1000. He concluded that "the ML standard errors were significantly negatively biased as a function of increasing nonnormality. Under severely non-normal distributions, the ML standard errors for the factor loadings were underestimated by more than 50% across all three sample sizes. This underestimation was even more pronounced for the standard errors of the uniquenesses. These results provide further evidence that great care must be taken when interpreting the significance of z-ratios for parameter estimates in models based on non-normally distributed data regardless of sample size" (p. 197). We study these problems in further detail below.
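The ML discrepancy function (2.1) is straightforward to evaluate numerically. The sketch below assumes NumPy and uses a toy 2 × 2 covariance; it is meant only to show that F₁ is zero when the model reproduces S exactly and positive otherwise, which is what makes T₁ = nF₁ usable as a fit statistic.

```python
import numpy as np

def f1(S, Sigma):
    """Normal-theory ML discrepancy of Eq. (2.1):
    F1(theta) = tr(S Sigma^-1) - log|S Sigma^-1| - p."""
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(A)  # log-determinant, stable for p.d. A
    return np.trace(A) - logdet - p

Sigma0 = np.array([[1.0, 0.5],
                   [0.5, 1.0]])       # "model-implied" covariance
S_mis = np.array([[1.2, 0.5],
                  [0.5, 1.0]])        # a perturbed "sample" covariance
```

When S = Σ(θ) the eigenvalues of SΣ⁻¹ are all 1 and F₁ = 0; any mismatch makes F₁ > 0, since λ − log λ − 1 ≥ 0 for every eigenvalue λ.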

Since real data are seldom normal and are usually characterized by skewness and positive kurtosis, it seems appropriate to pay more attention to methods which are valid under nonnormal data. Anderson and Amemiya (1988) and Amemiya and Anderson (1990) gave conditions under which some normal theory standard errors



and the test T₁ are asymptotically valid even when X is not normally distributed; see also Satorra and Neudecker (1994). For the factor analysis model (1.1), one of their main conditions is that f and e are independent and the elements of e are also independent. Their results make the normal theory method valid for a wide class of distributions. However, there is no effective way to verify their assumptions in practice, and the structure (1.2), for example, only requires that f and e are uncorrelated. Empirical results of Hu et al. (1992) indicate that inference based on T₁ can be misleading if Anderson and Amemiya's independence condition is not met. We shall evaluate whether this may also occur for standard error estimates.

A theoretical breakthrough in the development of covariance structure analysis for data which may be arbitrarily distributed was made by Browne (1984) and Chamberlain (1982). Let vech(·) be an operator which transforms a symmetric matrix into a vector by picking the nonduplicated elements of the matrix, let Yᵢ = vech{(Xᵢ − X̄)(Xᵢ − X̄)′}, and let Ȳ and S_Y be the usual sample mean and sample covariance of the Yᵢ; then Ȳ = n vech(S)/N. Modeling s = vech(S) by σ(θ) = vech(Σ(θ)) is asymptotically equivalent to modeling Yᵢ by σ(θ). Let Vₛ = var(√n s); then Vₛ → V = var[vech{(X − μ₀)(X − μ₀)′}]. Since S_Y is a consistent estimator of V, Browne (1984) proposed to estimate θ₀ by minimizing

F₂(θ) = (s − σ(θ))′ S_Y⁻¹ (s − σ(θ)).   (2.3)

Let θ̂n be the corresponding estimator; then

√n(θ̂n − θ₀) → N(0, Ω₂),

where Ω₂ = (σ̇′V⁻¹σ̇)⁻¹, for which

Ω̂₂ = {σ̇′(θ̂n) S_Y⁻¹ σ̇(θ̂n)}⁻¹   (2.4)

is a consistent estimator. The corresponding statistic T₂ = nF₂(θ̂n) asymptotically follows a chi-square distribution with p* − q degrees of freedom. This method is asymptotically valid for any distribution with finite fourth-order moments (Browne assumed eighth moments). When the model size is large, the ADF method often does not converge to a solution, and T₂ rejects a correct model too often for the converged solutions (see Hu et al., 1992, for p = 15, and Curran et al., 1996, for p = 9). The minor modification T* = T₂/(1 + T₂/n) substantially improves the performance of T₂ (Yuan and Bentler, 1997). With regard to standard errors, the standard error estimator based on (2.4) can match the empirical variability across samples in a Monte Carlo study for p = 6 (Chou et al., 1991), but it can also break down except at the largest sample sizes (Henly, 1993). Curran (1994), in the study referred to previously, found that ADF standard errors were significantly negatively biased in normal samples with Ns of 100 and 200, and also "cannot be trusted even under conditions of multivariate non-normality at samples of N = 1000 or less" (p. 198). It is our expectation that for p = 15 there will be a larger discrepancy between the empirical standard error of θ̂n and that based on (2.4), leading to incorrect parameter tests at all but the largest sample sizes. The dimension of a data set largely determines the quality of ADF methods at practical sample sizes, as will be demonstrated below.
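The ADF ingredients — the vectors Yᵢ, their covariance S_Y, the discrepancy (2.3), and the rescaled statistic T* — can be sketched as below. This is a simplified illustration, not the estimation algorithm itself: it evaluates F₂ at a given candidate σ(θ) rather than minimizing over θ.

```python
import numpy as np

def vech(M):
    """Nonduplicated (lower-triangular) elements of a symmetric matrix."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def adf_fit(X, sigma_theta):
    """Evaluate the ADF discrepancy of Eq. (2.3) at a candidate sigma(theta),
    returning T2 = n*F2 and the rescaled T* = T2/(1 + T2/n)."""
    N = X.shape[0]
    n = N - 1
    Xc = X - X.mean(axis=0)
    Y = np.array([vech(np.outer(x, x)) for x in Xc])  # Y_i vectors
    SY = np.cov(Y.T)                                  # estimates V
    d = vech(np.cov(X.T)) - sigma_theta               # s - sigma(theta)
    T2 = n * float(d @ np.linalg.solve(SY, d))
    return T2, T2 / (1.0 + T2 / n)

# Toy data (p = 2, so p* = 3); any candidate sigma(theta) can be plugged in.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
```

By construction T* < T₂ whenever T₂ > 0, which is the direction of the correction: T₂ tends to be too large in small samples.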



Clearly, there is a need to correct the standard error estimates for both MLE and ADF. We consider two corrections. One is well known but almost never used. The other has not previously been considered in this context.

When using a method based on normal theory such as ML, the standard error given by Ω̂₁ is asymptotically correct when data actually are normal. We will call this the standard error "predicted" under normality. However, when data are not normal, the resulting standard error based on Ω̂₁ will not be correct in general. For example, when data are sampled from an elliptical distribution with a coefficient of kurtosis larger than 1, the true asymptotic standard error of θ̂n will be underestimated when based on Ω̂₁. Let D_p be the duplication matrix as defined by Magnus and Neudecker (1988, p. 49). Then it has been known for over a decade (e.g., Bentler, 1983; Browne, 1984; Bentler and Dijkstra, 1985), but almost completely ignored (Bentler and Dudgeon, 1996), that the MLE has the general covariance matrix

√n(θ̂n − θ₀) → N(0, Ω₁c),

where Ω₁c is consistently estimated by

Ω̂₁c = {σ̇′(θ̂n)Ŵσ̇(θ̂n)}⁻¹ σ̇′(θ̂n)Ŵ S_Y Ŵ σ̇(θ̂n) {σ̇′(θ̂n)Ŵσ̇(θ̂n)}⁻¹   (2.5)

and Ŵ = 2⁻¹ D′_p {Σ⁻¹(θ̂n) ⊗ Σ⁻¹(θ̂n)} D_p. This covariance matrix is sometimes known as the sandwich-type matrix, and is computed as the "robust" covariance matrix in EQS (Bentler, 1995). Another version of (2.5) is given by Arminger and Schoenberg (1989) in the framework of pseudo-likelihood estimation. If the pseudo-likelihood is based on a normal density function, it can be shown that the objective function (18) in Arminger and Schoenberg (1989) is equivalent to F₁(θ) in (2.1), and their covariance matrix (12) is equivalent to Ω̂₁c after taking expectations. Arminger and Schoenberg state that their sandwich-type covariance is computationally simple. Since their equation (25) includes several terms (involving second derivatives) whose expectations are zero, implementation of their Eq. (12) actually needs more computation than (2.5). Our empirical experience is that including terms whose expectations are zero when calculating standard errors makes a negligible difference relative to (2.5), even in small samples. Since, besides S_Y, all the quantities in (2.5) are computed automatically when using the ML method, Ω̂₁c can be easily incorporated into existing software. Hence, we will concentrate on studying (2.5) in the rest of this paper.
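The sandwich form of (2.5) can be sketched generically. The helper below takes the Jacobian σ̇, the weight matrix W, and the fourth-moment estimate S_Y as given (constructing Ŵ from the duplication matrix is omitted); the toy Jacobian is purely illustrative.

```python
import numpy as np

def sandwich_cov(sdot, W, SY):
    """Sandwich-type covariance of Eq. (2.5):
    (sdot' W sdot)^-1 (sdot' W SY W sdot) (sdot' W sdot)^-1,
    where sdot is the p* x q Jacobian of sigma(theta), W the normal-theory
    weight matrix, and SY the estimated fourth-moment matrix V."""
    bread = np.linalg.inv(sdot.T @ W @ sdot)
    meat = sdot.T @ W @ SY @ W @ sdot
    return bread @ meat @ bread

# Toy dimensions: p* = 3, q = 2.
sdot = np.array([[1.0, 0.0],
                 [0.5, 1.0],
                 [0.0, 2.0]])
W = np.eye(3)
```

A useful sanity check on the formula: when S_Y happens to equal W⁻¹, as normal theory predicts, the sandwich collapses to the ordinary inverse-information form (σ̇′Wσ̇)⁻¹, so the corrected and predicted standard errors agree.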

Regardless of the distribution conditions, the standard error of θ̂n based on Ω₁c is always asymptotically correct, so we will call it the "corrected" standard error. However, not much is known about its finite-sample performance in this context. For p = 6, Chou et al. (1991) gave an extensive study of different estimators of standard errors, and the sandwich-type matched the empirical variability very well. Similar results were obtained by Chou and Bentler (1995). For p = 9, Curran (1994) concluded that the robust standard errors were negatively biased at his smallest sample size but were unbiased at his moderate and large sample sizes. It is not known whether this estimator will perform well with larger models, e.g., for p = 15. In line with Curran, we suspect that the standard errors based on the sandwich-type



covariance matrix will still underestimate the empirical variability in small samples and under various conditions of nonnormality.

Next, we propose a new correction to the estimator of the asymptotic covariance of the ADF estimator θ̂n. As noted above, when based on the ADF estimation process, the asymptotic covariance of √n(θ̂n − θ₀) is given by Ω₂. If we knew the population fourth-order moment matrix V, we could use V⁻¹ instead of S_Y⁻¹ in Ω₂. It is easy to understand that a more accurate estimator of V⁻¹ would lead to a more accurate estimator of Ω₂, and consequently, to more accurate standard errors for θ̂n. Let Z₁, …, Z_N be a sample from N_m(μ, V) and S_Z be the sample covariance matrix. Then it is known that the unbiased estimator of V⁻¹ is given by [(n − m − 1)/n] S_Z⁻¹, with n = N − 1. Obviously, V⁻¹ is overestimated by S_Z⁻¹ itself. Even though the Yᵢ are not normal vectors in the ADF method, it is hard to imagine that there is an advantage to S_Y⁻¹ over [(n − p* − 1)/n] S_Y⁻¹ in estimating V⁻¹ in Ω₂. The estimation of V⁻¹ by S_Y⁻¹ as suggested by Browne (1984) and Chamberlain (1982) was motivated by the consistency property. However, this property is maintained when using [(n − p* − 1)/n] S_Y⁻¹ to estimate V⁻¹. Motivated by this approach, and by the discrepancy between the empirical standard errors and those based on Ω̂₂ (e.g., Henly, 1993; Chou and Bentler, 1995), we propose to estimate the asymptotic covariance of √n(θ̂n − θ₀) by

Ω̂₂c = [n/(n − p* − 1)] Ω̂₂.   (2.6)

Since the standard error based on Ω̂₂ is too optimistic for an ADF estimator θ̂n, standard errors based on Ω̂₂c should match the empirical ones more closely. In parallel to the ML case, we call the standard errors based on Ω̂₂ and Ω̂₂c the ADF "predicted" standard error and the ADF "corrected" standard error, respectively. It should be noted that, unlike the relation between Ω̂₁ and Ω̂₁c, both Ω̂₂ and Ω̂₂c are asymptotically correct for estimating the covariance of √n(θ̂n − θ₀). Any difference will be seen only at finite sample sizes.
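The correction (2.6) is just a scalar inflation, but its size at practical sample sizes is worth computing. A short sketch:

```python
def adf_correction(n, p_star):
    """Covariance multiplier of Eq. (2.6): Omega2c = [n/(n - p_star - 1)] * Omega2."""
    return n / (n - p_star - 1)

# For a p = 15 model, p* = 15 * 16 / 2 = 120.  At N = 200 (n = 199) the
# covariance is inflated by about 2.55, so each standard error grows by
# the square root of that, about 1.60.
factor = adf_correction(199, 120)
se_factor = factor ** 0.5
```

Because p* grows quadratically in p, the multiplier is far from 1 whenever the model is large relative to n, which is exactly the regime where the uncorrected ADF standard errors are reported to fail.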

Besides the two major methods discussed above, several other methods exist that can be used for both normal and nonnormal data. When enough instrumental variables are available for a proper structural model, Bollen's (1996) two-stage least-squares method can be used for estimating the parameters and their standard errors in the structural model. When facing a data set that is markedly nonnormal, a bootstrap method is always an alternative to ML and ADF (e.g., Bollen and Stine, 1990; Boomsma, 1986; Chatterjee, 1984; Ichikawa and Konishi, 1995). Especially when the sample size is small, the distributions of the ML or ADF estimators may not be well approximated by normal distributions, as is assumed in asymptotic theory. In such a case, a well-chosen bootstrap method might still be able to give reliable confidence intervals for the parameter estimators, but this is not guaranteed, as reviewed by Yung and Bentler (1996). Further, since each bootstrap sample may contain many repeated observations, convergence can be a practical problem facing the bootstrap method. As commented by Ichikawa and Konishi (1995), "This implies that bootstrap methods should be applied with care when the improper solutions are frequent for the bootstrap samples". The problems of nonnormality of the MLE and ADF estimators



and reliable implementations of bootstrap methods in this situation are important and challenging problems for which further research is needed.

3. Empirical standard errors versus asymptotic standard errors

In this section, we compare the empirical standard errors obtained from a Monte Carlo sampling study to the corresponding asymptotic standard errors given above for MLE and ADF. We also study the sandwich estimator for MLE and our corrected standard errors for the ADF estimators.

The first model we used is a factor analysis model as in (1.1) with p = 15. The population covariance is generated by

Λ =
  λ 0 0
  0 λ 0
  0 0 λ ,
Φ =
  1.00 0.30 0.40
  0.30 1.00 0.50
  0.40 0.50 1.00 ,   (3.1)

where λ′ = (0.70, 0.70, 0.75, 0.80, 0.80) and 0 is a vector of 5 zeros. The matrix Ψ is diagonal and makes all the diagonal elements of Σ equal to 1. In the estimation process, we restrict the last factor loading corresponding to each factor at its true value in order to fix the scale of the factors. This restriction is only for model identification, and has no effect on the quality of the model fitting process nor on the resulting estimated models. All the other nonzero parameters are set as unknown free parameters. So q = 33, of which 12 are factor loading parameters.

Three conditions on the distributions of the variables are used. In the first condition, both f and e are normal, so that the observed X is a normal vector. In the second condition, we use condition 5 of Hu et al. (1992), in which f and e are independent normal vectors multiplied by a common random variable R = √3/√χ²₅, which is independent of both f and e. This makes f and e uncorrelated but not independent, and X is symmetrically distributed. Since ER² = 1 and ER⁴ = 3, the observed X has the same covariance structure as that in condition 1, with normalized Mardia's multivariate kurtosis η = E{(X − μ)′Σ⁻¹(X − μ)}²/{p(p + 2)} = ER⁴ = 3. In the third condition, f is normal and e is lognormal, multiplied by R = √3/√χ²₅. The resulting X has the same covariance structure as in conditions 1 and 2, but the variables are not symmetrically distributed anymore. For each condition, repeated samples with sizes N = 150, 200, 300, 500, and 1000 are drawn. Both the normal theory ML and ADF methods were used in each sample to estimate the parameters and compute standard error estimates. For each distribution condition and sample size, 500 replications were performed. The empirical standard deviation of θ̂n across the 500 replications is calculated. The actual empirical variability of the estimates across the replications is compared to the average values given by the relevant formulae (2.2), (2.4)-(2.6). The results based on ML estimation for normal, elliptical, and lognormal error data are presented in Tables 1-3, while the corresponding results based on ADF estimation are given in Tables 4-6. In each table, for each sample size, we give the Empirical (Emp), Predicted (Pred), and Corrected (Corr) standard error estimates. Each entry in these tables needs to be multiplied by 0.1 to yield the relevant estimate. The quality of



Table 1 Normal data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)


N      Standard errors (× 10)

150    Emp   0.823 0.789 0.802 0.838 0.838 0.838 0.787 0.806 0.789 0.818 0.777 0.770
       Pred  0.803 0.806 0.793 0.791 0.805 0.805 0.794 0.793 0.802 0.807 0.799 0.790
       Corr  0.799 0.805 0.789 0.793 0.800 0.799 0.791 0.785 0.796 0.796 0.793 0.783

200    Emp   0.689 0.706 0.711 0.721 0.697 0.699 0.676 0.662 0.670 0.675 0.661 0.629
       Pred  0.697 0.700 0.690 0.689 0.695 0.695 0.686 0.682 0.695 0.698 0.690 0.682
       Corr  0.696 0.701 0.685 0.689 0.689 0.691 0.681 0.677 0.688 0.691 0.686 0.679

300    Emp   0.554 0.536 0.558 0.579 0.583 0.565 0.565 0.560 0.561 0.571 0.571 0.519
       Pred  0.567 0.570 0.563 0.560 0.568 0.569 0.561 0.558 0.567 0.568 0.560 0.556
       Corr  0.567 0.571 0.562 0.560 0.566 0.568 0.559 0.555 0.563 0.563 0.559 0.552

500    Emp   0.436 0.436 0.428 0.439 0.432 0.431 0.417 0.415 0.456 0.457 0.453 0.427
       Pred  0.439 0.441 0.435 0.433 0.440 0.441 0.435 0.433 0.440 0.440 0.434 0.430
       Corr  0.439 0.440 0.434 0.434 0.440 0.441 0.435 0.431 0.438 0.437 0.434 0.430

1000   Emp   0.308 0.317 0.296 0.303 0.308 0.308 0.308 0.301 0.318 0.327 0.321 0.306
       Pred  0.311 0.311 0.308 0.306 0.310 0.310 0.306 0.304 0.310 0.310 0.306 0.304
       Corr  0.311 0.311 0.307 0.306 0.310 0.310 0.306 0.304 0.310 0.309 0.306 0.303

Table 2 Elliptical data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N      Standard errors (× 10)

150    Emp   1.19  1.40  1.35  1.21  1.37  1.35  1.27  1.31  1.39  1.26  1.39  1.14
       Pred  0.825 0.825 0.823 0.805 0.826 0.830 0.825 0.816 0.828 0.826 0.820 0.808
       Corr  1.09  1.12  1.11  1.08  1.15  1.14  1.14  1.12  1.13  1.10  1.11  1.06

200    Emp   1.11  1.09  1.07  1.08  1.22  1.15  1.13  1.12  1.04  1.11  1.00  1.03
       Pred  0.706 0.706 0.702 0.691 0.713 0.713 0.710 0.704 0.712 0.711 0.705 0.697
       Corr  0.959 0.990 0.942 0.960 0.977 0.983 0.967 0.967 0.965 0.957 0.951 0.949

300    Emp   0.856 0.865 0.906 0.900 0.947 0.894 0.894 0.918 0.886 0.895 0.896 0.868
       Pred  0.574 0.573 0.569 0.561 0.578 0.579 0.574 0.569 0.575 0.574 0.566 0.562
       Corr  0.800 0.812 0.793 0.798 0.811 0.816 0.797 0.794 0.819 0.810 0.803 0.800

500    Emp   0.641 0.679 0.697 0.670 0.712 0.705 0.688 0.674 0.727 0.700 0.717 0.694
       Pred  0.443 0.443 0.439 0.433 0.444 0.443 0.440 0.435 0.442 0.442 0.435 0.432
       Corr  0.636 0.642 0.630 0.632 0.644 0.652 0.635 0.636 0.648 0.638 0.641 0.628

1000   Emp   0.493 0.514 0.468 0.486 0.503 0.491 0.527 0.461 0.535 0.529 0.530 0.539
       Pred  0.312 0.312 0.309 0.306 0.313 0.312 0.309 0.307 0.311 0.312 0.308 0.305
       Corr  0.467 0.470 0.462 0.462 0.470 0.469 0.464 0.459 0.478 0.477 0.475 0.470

the formulas is judged against the empirical standard error. To minimize use of journal space, we only report the standard errors corresponding to factor loadings. The standard errors corresponding to factor variances and error variances follow a similar pattern.
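The nonnormal conditions described earlier can be generated with a simple scale mixture. The sketch below illustrates the second (elliptical) condition with Σ taken as the identity purely for simplicity; the key facts are that ER² = 1 (so the covariance structure is preserved) while ER⁴ = 3 (so the kurtosis is inflated).

```python
import numpy as np

rng = np.random.default_rng(12345)
N, p = 200_000, 3

# Elliptical condition: a normal vector scaled by a common random variable
# R = sqrt(3)/sqrt(chi2_5), drawn independently of the vector.  Since
# E(1/chi2_5) = 1/3, ER^2 = 1; since E(1/chi2_5^2) = 1/3, ER^4 = 3.
Z = rng.standard_normal((N, p))
R = np.sqrt(3.0 / rng.chisquare(5, size=(N, 1)))
X = R * Z   # same covariance as Z, but heavier tails

r2_mean = float(np.mean(R ** 2))   # should be close to ER^2 = 1
C = np.cov(X.T)                    # should be close to the identity
```

The same device with a lognormal e instead of a normal one gives the third (skewed) condition, since multiplying by R does not restore symmetry.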



Table 3 Lognormal error data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N      Standard errors (× 10)

150    Emp   1.26  1.21  1.23  1.19  1.17  1.32  1.22  1.28  1.40  1.16  1.30  1.17
       Pred  0.779 0.774 0.780 0.766 0.778 0.796 0.788 0.779 0.793 0.792 0.795 0.775
       Corr  0.994 0.988 1.01  0.955 0.988 1.04  1.02  1.00  1.03  0.989 1.01  0.961

200    Emp   1.04  0.983 0.996 0.957 1.04  1.20  1.09  1.14  1.12  1.04  1.14  1.05
       Pred  0.666 0.661 0.661 0.652 0.675 0.687 0.683 0.672 0.685 0.688 0.683 0.672
       Corr  0.855 0.854 0.851 0.818 0.887 0.932 0.903 0.892 0.898 0.881 0.893 0.857

300    Emp   0.891 0.864 0.967 0.908 0.819 0.894 0.849 0.839 0.834 0.778 0.895 0.771
       Pred  0.553 0.547 0.551 0.543 0.548 0.557 0.550 0.543 0.558 0.559 0.547 0.542
       Corr  0.741 0.745 0.758 0.716 0.726 0.755 0.735 0.726 0.734 0.729 0.732 0.701

500    Emp   0.666 0.669 0.726 0.663 0.670 0.688 0.668 0.648 0.628 0.636 0.656 0.565
       Pred  0.432 0.429 0.430 0.423 0.433 0.436 0.432 0.425 0.431 0.429 0.420 0.416
       Corr  0.594 0.592 0.603 0.578 0.596 0.604 0.589 0.589 0.572 0.575 0.570 0.549

1000   Emp   0.542 0.495 0.434 0.497 0.467 0.464 0.457 0.502 0.471 0.489 0.484 0.470
       Pred  0.308 0.305 0.304 0.301 0.305 0.306 0.303 0.299 0.307 0.304 0.301 0.296
       Corr  0.455 0.441 0.422 0.438 0.432 0.431 0.422 0.425 0.429 0.445 0.435 0.428

Table 4 Normal data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N      Standard errors (×10)
150^a  Emp   1.70  1.80  1.66  1.67  1.73  1.80  1.66  1.63  2.02  2.16  2.20  2.20
       Pred  0.522 0.525 0.516 0.516 0.516 0.517 0.504 0.494 0.540 0.541 0.538 0.540
       Corr  1.19  1.19  1.17  1.17  1.17  1.17  1.15  1.12  1.23  1.23  1.22  1.23
200^b  Emp   1.10  1.10  1.10  1.08  1.17  1.09  1.16  1.15  1.10  1.09  1.09  1.11
       Pred  0.512 0.516 0.508 0.504 0.511 0.515 0.504 0.503 0.504 0.507 0.504 0.499
       Corr  0.814 0.820 0.809 0.802 0.813 0.819 0.802 0.801 0.802 0.806 0.802 0.793
300    Emp   0.762 0.700 0.736 0.790 0.776 0.760 0.760 0.766 0.737 0.745 0.722 0.668
       Pred  0.470 0.474 0.465 0.465 0.470 0.472 0.464 0.459 0.464 0.463 0.461 0.457
       Corr  0.609 0.613 0.602 0.602 0.608 0.611 0.601 0.594 0.601 0.599 0.597 0.592
500    Emp   0.515 0.507 0.506 0.508 0.511 0.521 0.487 0.488 0.512 0.525 0.506 0.493
       Pred  0.395 0.396 0.390 0.390 0.396 0.397 0.391 0.387 0.393 0.391 0.389 0.386
       Corr  0.454 0.455 0.448 0.448 0.455 0.456 0.450 0.444 0.451 0.449 0.447 0.443
1000   Emp   0.336 0.338 0.328 0.318 0.328 0.334 0.330 0.326 0.339 0.352 0.341 0.333
       Pred  0.296 0.296 0.292 0.291 0.295 0.295 0.291 0.289 0.295 0.293 0.291 0.288
       Corr  0.315 0.315 0.311 0.310 0.314 0.315 0.311 0.308 0.314 0.313 0.310 0.307

^a Based on 449 converged samples. ^b Based on 493 converged samples.


Table 5 Elliptical data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N      Standard errors (×10)
150^a  Emp   2.02  2.12  2.22  2.21  1.99  2.54  2.13  2.15  2.28  2.14  2.15  2.29
       Pred  0.590 0.594 0.590 0.584 0.582 0.613 0.593 0.580 0.613 0.598 0.595 0.603
       Corr  1.34  1.35  1.34  1.33  1.32  1.39  1.35  1.32  1.39  1.36  1.35  1.37
200^b  Emp   1.42  1.51  1.45  1.42  1.62  1.48  1.53  1.48  1.75  1.51  1.41  1.46
       Pred  0.607 0.614 0.603 0.599 0.616 0.606 0.618 0.602 0.620 0.613 0.609 0.603
       Corr  0.966 0.977 0.960 0.954 0.980 0.964 0.983 0.957 0.986 0.976 0.969 0.959
300    Emp   0.896 0.893 0.894 0.822 0.922 0.933 0.961 0.962 0.907 0.895 0.912 0.826
       Pred  0.566 0.568 0.555 0.547 0.567 0.567 0.566 0.557 0.563 0.561 0.551 0.544
       Corr  0.732 0.736 0.719 0.708 0.734 0.734 0.733 0.721 0.729 0.726 0.714 0.704
500    Emp   0.645 0.654 0.615 0.607 0.624 0.665 0.630 0.610 0.622 0.630 0.607 0.599
       Pred  0.490 0.487 0.480 0.472 0.486 0.485 0.482 0.475 0.485 0.482 0.476 0.470
       Corr  0.563 0.559 0.552 0.543 0.558 0.557 0.553 0.546 0.557 0.554 0.546 0.540
1000   Emp   0.436 0.421 0.430 0.416 0.417 0.422 0.426 0.415 0.409 0.415 0.430 0.420
       Pred  0.376 0.373 0.367 0.365 0.375 0.374 0.371 0.368 0.372 0.373 0.368 0.363
       Corr  0.402 0.397 0.392 0.389 0.400 0.399 0.396 0.393 0.397 0.398 0.393 0.387

^a Based on 421 converged samples. ^b Based on 481 converged samples.

Table 6 Lognormal error data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N      Standard errors (×10)
150^a  Emp   1.19  1.15  1.32  1.25  1.28  1.40  1.22  1.22  1.66  1.53  1.58  1.46
       Pred  0.370 0.369 0.369 0.356 0.377 0.384 0.369 0.367 0.396 0.396 0.395 0.390
       Corr  0.841 0.839 0.840 0.809 0.856 0.873 0.839 0.834 0.900 0.900 0.898 0.888
200^b  Emp   0.999 0.908 0.989 0.959 0.859 0.864 0.884 0.820 0.966 0.941 0.943 0.891
       Pred  0.413 0.403 0.398 0.394 0.400 0.401 0.395 0.393 0.399 0.399 0.396 0.389
       Corr  0.657 0.642 0.633 0.627 0.636 0.638 0.628 0.626 0.634 0.634 0.630 0.619
300    Emp   0.612 0.628 0.627 0.575 0.628 0.645 0.642 0.652 0.632 0.673 0.652 0.585
       Pred  0.397 0.396 0.388 0.380 0.396 0.395 0.392 0.386 0.396 0.400 0.393 0.390
       Corr  0.514 0.513 0.503 0.492 0.512 0.511 0.507 0.500 0.513 0.518 0.509 0.505
500    Emp   0.461 0.485 0.467 0.442 0.478 0.485 0.473 0.455 0.466 0.471 0.459 0.454
       Pred  0.363 0.361 0.358 0.349 0.363 0.360 0.358 0.355 0.359 0.359 0.354 0.352
       Corr  0.417 0.414 0.411 0.401 0.416 0.414 0.411 0.407 0.412 0.413 0.407 0.404
1000   Emp   0.322 0.340 0.347 0.330 0.321 0.335 0.360 0.333 0.340 0.338 0.347 0.333
       Pred  0.302 0.299 0.296 0.293 0.297 0.297 0.293 0.293 0.298 0.298 0.295 0.292
       Corr  0.322 0.319 0.315 0.313 0.317 0.317 0.313 0.313 0.318 0.318 0.315 0.312

^a Based on 449 converged samples. ^b Based on 490 converged samples.


When data are distributed normally, both the normal theory predicted standard error (based on the inverse of the information matrix) and the normal theory corrected standard error match the empirical standard error very well for all the sample sizes studied (Table 1). Even though we estimate the large matrix V by S_t in Ω̂_1c, this does not decrease the accuracy of the estimated standard errors. In Tables 2 and 3, when data are not normal and Anderson and Amemiya's condition for asymptotic robustness is not met, there is an obvious pattern among the three kinds of standard errors. For all the sample sizes studied, the empirical standard errors are always bigger than the other two types of standard errors based on formulas, and the normal theory corrected standard errors outperform the normal theory predicted standard errors in describing the empirical variability of the normal theory estimators. While there is a difference between normal and nonnormal data when using the corrected standard errors to describe the empirical standard errors, when data are not normal, the corrected standard errors are insensitive to the symmetry of the data (compare Tables 2 and 3) and are a little off in estimating the empirical standard errors. Theoretically, both the empirical and the corrected standard errors approach the population standard error as the sample size N goes to infinity. For the nonnormal data investigated here, N = 1000 is not large enough. We did not include a larger N (e.g., N = 5000 as in Hu et al., 1992), because we have no doubt about the consistency of Ω̂_1c for Ω_1c, and for practical data sets with p = 15, people seldom have sample sizes larger than 1000.

These results based on the MLE yield the following conclusion. The corrected standard error works as well as the predicted standard error when data are normal. When data are not normally distributed, both the corrected and the predicted standard errors underestimate the empirical one, but the corrected standard error is substantially better. This is perhaps not surprising, since the inverse of the normal theory information matrix yields the correct standard errors asymptotically only when the normality assumption is true. These results also suggest that, until a better way to estimate the standard errors of normal theory estimators can be found, we should use the standard errors based on Ω̂_1c.
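The contrast between the predicted standard errors, based on Ω̂_1, and the corrected ones, based on Ω̂_1c, is the familiar contrast between a model-based and a sandwich covariance estimator. The sketch below is generic rather than the paper's exact formula: the hypothetical "bread" matrix A and "meat" matrix B stand in for the normal theory information matrix and the empirical covariance of the scores.

```python
import numpy as np

def naive_and_sandwich_se(A, B, n):
    """Generic model-based vs. sandwich standard errors.

    A : (q, q) "bread" matrix (e.g., normal theory information),
    B : (q, q) "meat" matrix (e.g., empirical score covariance),
    n : sample size.  Both matrices are assumed symmetric positive definite.
    """
    A_inv = np.linalg.inv(A)
    naive = A_inv / n                     # analogue of Omega_1 / n
    sandwich = A_inv @ B @ A_inv / n      # analogue of Omega_1c / n
    return np.sqrt(np.diag(naive)), np.sqrt(np.diag(sandwich))

# Toy numbers: when B differs from A (e.g., nonnormal data), the two sets
# of standard errors differ; when B equals A they coincide.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[3.0, 0.2], [0.2, 2.5]])
se_naive, se_sandwich = naive_and_sandwich_se(A, B, n=200)
```

With B "larger" than A, the sandwich standard errors exceed the naive ones, mirroring the pattern in Tables 2 and 3.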

Results for the ADF simulation compare the empirical variability with the variability given by Ω̂_2 and Ω̂_2c. Unlike the relation between Ω̂_1 and Ω̂_1c, both Ω̂_2 and Ω̂_2c are asymptotically correct for estimating the covariance of √N(θ̃_n − θ_0). However, since p* = 120 for the factor analysis model used in our simulation, the constants n/(n − p* − 1) that differentiate the two estimators are approximately 5, 2.5, 1.67, 1.32 and 1.14 for sample sizes 150, 200, 300, 500 and 1000. As in the normal theory method, for each condition and sample size, 500 replications were performed, and θ̃_n was obtained for each converged replication. Also obtained were the standard errors based on Ω̂_2 and Ω̂_2c. The empirical standard errors based on all converged samples in each case were calculated, as were the averaged ADF predicted and corrected standard errors. These results are given in Tables 4-6, where only the standard errors corresponding to factor loadings are presented due to space limitations.
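The size of the correction factor n/(n − p* − 1) is easy to verify. The snippet below assumes n = N − 1, the usual degrees of freedom of the sample covariance matrix; the factors in the text are quoted only approximately.

```python
def adf_correction(N, p_star=120):
    """Correction factor n/(n - p* - 1) that scales the ADF covariance
    estimate; here p* = p(p + 1)/2 = 120 for p = 15, and n = N - 1."""
    n = N - 1
    return n / (n - p_star - 1)

factors = {N: round(adf_correction(N), 2) for N in (150, 200, 300, 500, 1000)}
# -> {150: 5.32, 200: 2.55, 300: 1.68, 500: 1.32, 1000: 1.14}
```

The factor is huge at N = 150 and nearly negligible at N = 1000, which matches how the gap between the predicted and corrected columns shrinks down Tables 4-6.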

An immediate conclusion from Tables 4-6 is that the ADF predicted standard errors, based on asymptotic theory, are much too small compared with the empirical standard errors when sample sizes are small to medium. Even though our new corrected standard error still underestimates the empirical variability of θ̃_n, it is more accurate than the classical formula, which can be quite misleading. As sample sizes get larger, the ADF predicted standard errors match the empirical ones better, and the effect of our correction becomes smaller. This indicates the consistency of the three types of standard errors. Since for all sample sizes and all distribution conditions studied here the corrected standard errors are always better than the predicted ones in estimating the empirical variability of θ̃_n, we recommend using the corrected standard errors in practice.

To add some generality to the simulation, another study was done. This study also used a factor model as in (1.1). We took p = 8, with the factor loading matrix given by

Λ′ = ( 0.2  0.4  0.6  0.8  0    0    0    0
       0.1  0    0    0    0.3  0.5  0.7  0.8 )                      (3.2)

The two factors are correlated with cov(f_1, f_2) = 0.5 and var(f_1) = var(f_2) = 1. The error variance matrix Ψ is diagonal, so that the population covariance matrix Σ is actually a correlation matrix. The three conditions for generating observed data as used for model (3.1) are also used for model (3.2). For model identification, we fix λ_41 and λ_82 in the estimation process. With sample sizes 100, 150, 200, 300, 500, 1000 and 500 simulation replications, the standard errors are given in Tables 7-9 for the MLE and Tables 10-12 for the ADF estimators. Even though the structure in (3.2) is different from that in (3.1) and the loadings are more dispersed, the standard errors in Tables 7-12 follow a pattern similar to those in Tables 1-6. The only noticeable difference is that the corrected standard errors in Tables 7-12 are not as dramatically improved over the uncorrected ones as in the previous study, though there is still always an improvement. These results match our expectations. Since our proposal for the corrected standard errors is based on the statistical theories behind the ML and ADF methods for all structural models, as discussed in Section 2, rather than on a set of limited simulations, a similar pattern of standard errors should carry over to other model structures as well.
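As an illustration of this setup (a sketch under the stated population values, not the authors' simulation code), the population matrix Σ = ΛΦΛ′ + Ψ implied by model (3.2) can be assembled and checked to be a correlation matrix:

```python
import numpy as np

# Transposed loading matrix from (3.2): two factors, p = 8 variables.
Lambda = np.array([
    [0.2, 0.4, 0.6, 0.8, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.3, 0.5, 0.7, 0.8],
]).T                                          # shape (8, 2)

Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])                  # var(f1) = var(f2) = 1, cov = 0.5

common = Lambda @ Phi @ Lambda.T              # common-factor part of Sigma
Psi = np.diag(1.0 - np.diag(common))          # diagonal errors -> unit diagonal
Sigma = common + Psi                          # population correlation matrix

assert np.allclose(np.diag(Sigma), 1.0)
```

Normal samples for such a simulation could then be drawn with `np.random.default_rng().multivariate_normal(np.zeros(8), Sigma, size=N)`.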

When data are elliptically symmetric, Browne (1984) proved that the normal theory estimator and the ADF estimator have the same asymptotic efficiency. We can observe this fact by comparing the numbers in Tables 1 and 2 and 7 and 8 to those in Tables 4 and 5 and 10 and 11, respectively. Even though the efficiencies of the two estimators become more similar as N gets larger for normal and elliptical data, for small to medium N the normal theory estimator is much more efficient than the ADF estimator. This observation will be further discussed in Section 4. When data are symmetric, Browne (1984) also gave a formula for the asymptotic covariance of the normal theory estimator based on an estimator of Mardia's multivariate kurtosis. We did not compute standard errors from that formula here, since real data sets are seldom symmetric.


Table 7 Normal data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   1.59  1.41  1.78  1.53  1.10  1.18  1.32
      Pred  1.55  1.33  1.62  1.38  1.11  1.13  1.23
      Corr  1.53  1.30  1.58  1.35  1.08  1.12  1.21
150   Emp   1.22  1.15  1.45  1.16  0.919 0.948 1.07
      Pred  1.25  1.08  1.30  1.12  0.901 0.919 1.00
      Corr  1.24  1.07  1.28  1.10  0.899 0.907 0.989
200   Emp   1.09  0.930 1.17  1.03  0.767 0.804 0.825
      Pred  1.07  0.934 1.12  0.959 0.781 0.797 0.871
      Corr  1.07  0.922 1.10  0.952 0.774 0.788 0.866
300   Emp   0.844 0.773 0.919 0.830 0.631 0.657 0.709
      Pred  0.869 0.757 0.902 0.784 0.637 0.652 0.710
      Corr  0.867 0.750 0.893 0.783 0.634 0.648 0.705
500   Emp   0.678 0.572 0.697 0.671 0.490 0.506 0.518
      Pred  0.671 0.585 0.697 0.604 0.491 0.504 0.547
      Corr  0.670 0.582 0.692 0.603 0.489 0.502 0.544
1000  Emp   0.470 0.422 0.501 0.463 0.345 0.365 0.364
      Pred  0.471 0.413 0.492 0.423 0.345 0.355 0.384
      Corr  0.471 0.412 0.490 0.423 0.345 0.354 0.383

Table 8 Elliptical data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   2.62  2.49  3.01  2.09  1.71  1.82  2.25
      Pred  1.68  1.39  1.94  1.43  1.11  1.13  1.26
      Corr  2.10  1.82  2.58  1.72  1.43  1.45  1.63
150   Emp   1.91  1.88  2.28  1.66  1.42  1.47  1.71
      Pred  1.31  1.12  1.37  1.14  0.901 0.918 1.01
      Corr  1.70  1.51  1.82  1.44  1.21  1.21  1.33
200   Emp   1.70  1.53  1.98  1.46  1.22  1.23  1.38
      Pred  1.12  0.958 1.17  0.979 0.781 0.798 0.884
      Corr  1.47  1.31  1.60  1.28  1.07  1.08  1.20
300   Emp   1.30  1.22  1.49  1.22  1.00  1.03  1.14
      Pred  0.899 0.771 0.920 0.804 0.641 0.654 0.718
      Corr  1.23  1.09  1.28  1.09  0.906 0.906 1.01
500   Emp   1.08  0.938 1.16  0.979 0.728 0.771 0.841
      Pred  0.681 0.593 0.709 0.608 0.492 0.506 0.551
      Corr  0.971 0.852 1.03  0.864 0.707 0.712 0.789
1000  Emp   0.782 0.725 0.888 0.699 0.549 0.606 0.615
      Pred  0.474 0.417 0.497 0.424 0.346 0.356 0.386
      Corr  0.718 0.632 0.758 0.642 0.522 0.538 0.593


Table 9 Lognormal error data, normal theory method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   2.05  1.72  2.15  1.77  1.55  1.86  1.75
      Pred  1.49  1.19  1.49  1.30  1.03  1.04  1.09
      Corr  1.68  1.39  1.74  1.47  1.23  1.25  1.31
150   Emp   1.96  1.39  2.39  1.52  1.33  1.64  1.64
      Pred  1.26  0.990 1.32  1.10  0.864 0.864 0.942
      Corr  1.61  1.19  1.66  1.35  1.09  1.12  1.24
200   Emp   1.84  1.28  1.67  1.53  1.31  1.73  1.34
      Pred  1.01  0.866 1.02  0.929 0.757 0.751 0.820
      Corr  1.27  1.05  1.29  1.15  0.971 0.981 1.07
300   Emp   1.53  1.07  1.47  1.28  0.942 1.22  1.05
      Pred  0.854 0.720 0.864 0.764 0.622 0.619 0.675
      Corr  1.11  0.908 1.11  0.991 0.816 0.827 0.901
500   Emp   1.12  0.920 0.984 0.996 0.652 0.769 0.762
      Pred  0.648 0.563 0.661 0.584 0.480 0.485 0.524
      Corr  0.865 0.748 0.864 0.786 0.629 0.642 0.708
1000  Emp   0.724 0.632 0.743 0.665 0.528 0.593 0.633
      Pred  0.457 0.405 0.475 0.415 0.340 0.347 0.376
      Corr  0.636 0.564 0.652 0.583 0.473 0.494 0.545

Table 10 Normal data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   2.06  1.84  2.30  1.81  1.36  1.82  1.85
      Pred  1.41  1.15  1.38  1.29  0.961 1.02  1.09
      Corr  1.78  1.45  1.75  1.63  1.21  1.29  1.39
150   Emp   1.45  1.23  1.62  1.28  1.09  1.07  1.21
      Pred  1.15  0.971 1.17  1.02  0.804 0.822 0.896
      Corr  1.33  1.12  1.35  1.18  0.928 0.948 1.03
200   Emp   1.25  1.09  1.25  1.07  0.823 0.889 0.995
      Pred  1.01  0.859 1.02  0.907 0.715 0.729 0.795
      Corr  1.12  0.952 1.13  1.00  0.793 0.808 0.882
300   Emp   0.924 0.789 0.970 0.833 0.667 0.674 0.727
      Pred  0.826 0.713 0.853 0.742 0.601 0.613 0.667
      Corr  0.882 0.762 0.911 0.793 0.642 0.654 0.712
500   Emp   0.699 0.582 0.712 0.625 0.525 0.505 0.574
      Pred  0.651 0.564 0.673 0.584 0.473 0.484 0.527
      Corr  0.677 0.586 0.699 0.607 0.492 0.504 0.548
1000  Emp   0.472 0.385 0.470 0.438 0.350 0.357 0.378
      Pred  0.463 0.405 0.484 0.417 0.341 0.349 0.381
      Corr  0.472 0.413 0.493 0.425 0.348 0.355 0.388


Table 11 Elliptical data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   3.10  3.15  3.82  2.43  1.89  2.22  2.78
      Pred  1.91  1.46  1.85  1.67  1.12  1.16  1.29
      Corr  2.42  1.84  2.34  2.11  1.41  1.47  1.63
150   Emp   1.90  1.71  2.11  1.71  1.35  1.37  1.63
      Pred  1.50  1.19  1.44  1.30  0.960 0.982 1.07
      Corr  1.73  1.37  1.66  1.50  1.11  1.13  1.23
200   Emp   1.68  1.50  1.72  1.41  1.17  1.14  1.34
      Pred  1.28  1.06  1.26  1.15  0.869 0.898 0.976
      Corr  1.42  1.17  1.39  1.28  0.963 0.996 1.08
300   Emp   1.16  1.07  1.27  1.03  0.874 0.852 1.00
      Pred  1.06  0.886 1.07  0.951 0.753 0.769 0.826
      Corr  1.13  0.947 1.14  1.02  0.804 0.821 0.882
500   Emp   0.952 0.815 0.991 0.831 0.660 0.660 0.763
      Pred  0.856 0.725 0.867 0.764 0.613 0.628 0.679
      Corr  0.889 0.753 0.902 0.794 0.637 0.653 0.706
1000  Emp   0.672 0.584 0.681 0.631 0.471 0.497 0.535
      Pred  0.622 0.542 0.651 0.560 0.458 0.472 0.512
      Corr  0.634 0.552 0.664 0.571 0.467 0.481 0.521

Table 12 Lognormal error data, ADF method: Empirical (Emp), Predicted (Pred), and Corrected (Corr)

N     Standard errors (×10)
100   Emp   1.70  1.56  1.84  1.59  1.18  1.35  1.33
      Pred  1.12  0.903 1.07  1.04  0.752 0.773 0.806
      Corr  1.42  1.14  1.35  1.32  0.951 0.976 1.02
150   Emp   1.20  1.14  1.46  1.14  0.971 0.915 1.02
      Pred  0.954 0.824 0.989 0.882 0.682 0.686 0.721
      Corr  1.10  0.950 1.14  1.02  0.786 0.791 0.832
200   Emp   1.12  1.04  1.20  1.01  0.762 0.783 0.932
      Pred  0.893 0.760 0.916 0.822 0.624 0.648 0.694
      Corr  0.990 0.843 1.01  0.911 0.691 0.718 0.769
300   Emp   0.854 0.829 0.941 0.818 0.630 0.659 0.701
      Pred  0.760 0.668 0.794 0.700 0.566 0.576 0.600
      Corr  0.812 0.713 0.848 0.748 0.605 0.615 0.641
500   Emp   0.704 0.626 0.728 0.656 0.553 0.554 0.586
      Pred  0.654 0.564 0.669 0.596 0.481 0.487 0.513
      Corr  0.680 0.587 0.695 0.619 0.499 0.506 0.533
1000  Emp   0.532 0.456 0.546 0.506 0.392 0.400 0.442
      Pred  0.508 0.447 0.532 0.464 0.379 0.382 0.406
      Corr  0.517 0.456 0.543 0.473 0.386 0.390 0.413


4. Finite sample bias and accuracy

Since both the ML and ADF methods can be used for a given data set, an interesting question is which method should be recommended in practice. Judging by asymptotic efficiency, the ADF method is as good as the ML method when data are symmetric, and is generally better than ML for arbitrarily distributed data sets. The finite sample efficiency implied in Section 3, however, tells a somewhat different story. Thus, we will compare the advantages of each method based on our empirical results. Even though the ADF estimator θ̃_n is consistent for θ_0, Browne (1984) reported a finite sample bias in θ̃_n. There may exist finite sample bias in the normal theory estimator θ̂_n as well. In a small model, Henly (1993) found that a minimum sample size of 600 was needed before the ML and ADF methods yielded reliable parameter estimates, though this minimum could go to 1200 for nonnormal data and even higher for the ADF method.

We shall report the bias as well as the MSE associated with the empirical studies in the last section. They are defined by

Bias = (θ̄ − θ_0)′(θ̄ − θ_0),                                        (4.1)

and

MSE = (1/n_c) Σ_{i=1}^{n_c} (θ̂^{(i)} − θ_0)′(θ̂^{(i)} − θ_0),        (4.2)

respectively, where n_c is the number of converged samples among the 500 replications, θ̂^{(i)} is the ith of the n_c converged solutions, and θ̄ is the sample mean of the estimates θ̂^{(i)}. Each n_c for the normal theory estimator is 500; the n_c's for the ADF estimators in the factor model (3.1) are indicated in Tables 4-6. The relative bias and MSE can be obtained by dividing (4.1) and (4.2) by θ_0′θ_0. We did not compute these values because our interest here is restricted to comparing the relative advantage of the ML and ADF methods.
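Definitions (4.1) and (4.2) are straightforward to compute from the converged replications; a small sketch (here `theta_hat` holds one estimated parameter vector per row):

```python
import numpy as np

def empirical_bias_and_mse(theta_hat, theta0):
    """Bias (4.1) and MSE (4.2) over converged replications.

    theta_hat : (n_c, q) array, one converged estimate per row.
    theta0    : (q,) true parameter vector.
    """
    theta_bar = theta_hat.mean(axis=0)       # mean of converged solutions
    d = theta_bar - theta0
    bias = d @ d                             # (4.1): squared distance of the mean
    resid = theta_hat - theta0
    mse = np.mean(np.sum(resid**2, axis=1))  # (4.2): mean squared distance
    return bias, mse

# Small check: with two replications symmetric about theta0,
# the bias is zero while the MSE is not.
theta0 = np.zeros(3)
reps = np.array([[0.1, -0.2, 0.3], [-0.1, 0.2, -0.3]])
bias, mse = empirical_bias_and_mse(reps, theta0)
```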

As with the standard errors, the bias and MSE corresponding to models (3.1) and (3.2) convey basically the same information, so we only report those corresponding to model (3.1) here. With respect to bias, there are differences between the estimators of all 33 parameters and those of only the 12 factor loadings. Hence, we present these results separately in Tables 13 and 14. For estimators of all 33 parameters, the MLE is much less biased than the ADF estimator for all distributions and sample sizes. This confirms the findings of Chou and Bentler (1995). With respect to the estimators of the 12 factor loadings only, the normal theory estimator has an advantage only when the sample size is small. This is especially true for asymmetric data. Even when data are normal, the ADF estimator can compete with the MLE when the sample size is very large. The comparison of Tables 13 and 14 indicates that the ADF estimators of factor and error variances may have large bias.

Even if an estimator has a large bias, it may still be preferred if it has a smaller MSE, as has been argued for ridge regression and some Bayes estimators. As for the bias, we computed the MSE for the estimators of all the parameters and of the 12 factor loadings separately. The results are presented in Tables 15 and 16,


Table 13 Empirical bias of all parameters (×10²)

Distribution condition   Method   Sample size
                                  150     200     300     500     1000
Normal data              MLE      0.100   0.056   0.032   0.018   0.005
                         ADF      13.3    6.23    2.66    0.993   0.258
Elliptical data          MLE      0.351   0.225   0.157   0.067   0.020
                         ADF      143     112     73.7    42.8    20.2
Asymmetric data          MLE      0.696   0.495   0.417   0.233   0.128
                         ADF      220     182     134     88.6    48.7

Table 14 Empirical bias of factor loadings (×10⁴)

Distribution condition   Method   Sample size
                                  150     200     300     500     1000
Normal data              MLE      1.23    0.816   0.492   0.415   0.174
                         ADF      49.7    3.02    0.858   0.431   0.171
Elliptical data          MLE      14.2    3.90    3.61    1.99    0.832
                         ADF      41.7    27.1    4.84    1.24    0.575
Asymmetric data          MLE      8.99    5.51    3.52    2.19    0.492
                         ADF      26.5    5.63    2.49    0.747   0.236

Table 15 Empirical MSE of all parameters (×10)

Distribution condition   Method   Sample size
                                  150     200     300     500     1000
Normal data              MLE      2.64    1.93    1.28    0.802   0.404
                         ADF      10.7    4.90    2.31    1.12    0.478
Elliptical data          MLE      6.71    5.44    3.88    2.19    1.31
                         ADF      22.8    15.7    9.44    5.43    2.62
Asymmetric data          MLE      40.4    28.6    19.5    11.3    6.12
                         ADF      27.2    21.3    15.5    10.5    6.00

respectively. Considering estimation of all the parameters, even though the ADF estimators have larger bias, when data are asymmetric they have uniformly smaller MSE than the normal theory MLE, as indicated in the last two rows of Table 15. For symmetric data, the normal theory estimators of all the parameters are uniformly better. With respect to factor loadings, the MLE is uniformly better for normal data


Table 16 Empirical MSE of factor loadings (×10²)

Distribution condition   Method   Sample size
                                  150     200     300     500     1000
Normal data              MLE      7.82    5.61    3.77    2.28    1.16
                         ADF      42.3    14.8    6.65    3.09    1.34
Elliptical data          MLE      20.6    14.4    9.63    5.77    3.09
                         ADF      58.0    27.5    9.83    4.71    2.14
Asymmetric data          MLE      18.6    13.7    8.93    5.22    2.79
                         ADF      22.6    10.2    4.79    2.62    1.37

for all sample sizes, better for elliptical data at small to medium sample sizes, and better for asymmetric data only at the smallest sample size.

In a factor analysis model like (1.1), the parameters of real interest are the factor loadings. When data are not too nonnormal and not too skewed, ML should generally be preferred to ADF, especially for small to medium sample sizes. For skewed data with large sample sizes, the ADF method should be preferred. In practice, in covariance structure analysis, data are often skewed and of small to medium sample size. In such a case, we recommend trying the ADF method first if it will yield a converged solution, since its estimator is generally more accurate. If the ADF method cannot yield a converged solution, as often happens when a sample size is too small, ML can be used instead, since it also gives reliable solutions. Of course, when evidence indicates that the data are from a normal population, a normal theory method is always preferred. The ML method would also be preferred if a transformation of the observed nonnormal variables to normality were available and a reasonable model could be built on the transformed variables. Not much research has been directed towards such an approach, though there is a pioneering paper by Mooijaart (1993). In practice, the range of available transformation techniques, especially in the multivariate case, is very limited, and more statistical development is needed before structural modeling can be routinely used for transformed data.

5. Conclusion and discussion

This paper recommends using corrected standard errors in computing z-statistics associated with ML and ADF estimators. Empirical evidence indicates that the corrected standard errors work as well as, and generally better than, those used in most statistical software that provides these two estimation methods. In data analysis, few data sets are really normal (e.g., Micceri, 1989). As noted by Breckler (1990), practitioners generally do not bother to evaluate the distribution of the data set and simply accept a standard ML program default. A way to minimize errors of inference resulting from this improper practice is to use some type of appropriate χ² test

statistic to evaluate the model (see Bentler and Dudgeon, 1996, for a review of alternatives), and also to print out the standard error of θ̂_n based on Ω̂_1c instead of that based on Ω̂_1, as supported by our findings in Section 3. LISREL (Jöreskog and Sörbom, 1993) does not provide the corrected sandwich estimator Ω̂_1c, while EQS (Bentler, 1995) does. In the case of ADF estimation, neither program computes Ω̂_2c. However, it can be computed by hand as a simple correction to standard output until it is implemented in these or related software packages.

Even though these corrected standard errors improve on the classical ones, they still underestimate the empirical variability of the corresponding estimators, especially for small sample sizes. As a result, as emphasized by Curran (1994), practitioners need to be careful when using a z-statistic to decide on the significance of a parameter estimator. Applied to the data analysis from Harlow et al. (1995) described in the introduction, it is clear from Tables 2, 3, 8 and 9 for the MLE and Tables 5, 6, 11 and 12 for the ADF estimator that the standard z-tests were based on overly optimistic (small) estimates of sampling variability for a sample size equal to 213. A simple rule of thumb that we recommend to minimize this problem is to move the usual significance level from 0.05 to 0.01. This proposal addresses only the standard errors that remain overly optimistic even after our corrections; we do not intend to connect it with the interesting topic of multiple tests. Of course, any criterion to eliminate parameters to obtain a more parsimonious model must also be motivated by a meaningful interpretation of the data.
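The practical effect of too-small standard errors can be shown with invented numbers (illustrative only, not taken from the tables): a parameter that looks significant at the 0.05 level under the predicted standard error fails both the 0.05 and the stricter 0.01 criterion once a corrected standard error is used.

```python
# Hypothetical estimate and standard errors (illustrative values only).
theta_hat = 0.24
se_predicted = 0.11    # classical (too optimistic) standard error
se_corrected = 0.15    # corrected standard error

z_pred = theta_hat / se_predicted     # about 2.18
z_corr = theta_hat / se_corrected     # 1.60

Z_05, Z_01 = 1.96, 2.576              # two-sided normal critical values

significant_pred_05 = abs(z_pred) > Z_05    # True: looks significant
significant_corr_05 = abs(z_corr) > Z_05    # False
significant_corr_01 = abs(z_corr) > Z_01    # False
```

The rule of thumb of testing at 0.01 guards against exactly this kind of reversal when only the optimistic standard error is available.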

Further improvement over the corrected standard errors is needed that will work asymptotically as well as for small samples. Since the estimators based on the sandwich-type covariance matrix work very well for normal data, it is hard to get any further improvement over those based on Ω̂_1c without considering the specific distributional properties of a data set. It is likely that Ω̂_2c can be improved by Ω̃_2c = α_n Ω̂_2c, where α_n > 1. For example, by considering the number q of estimated parameters in σ(θ), we can choose α_n = Np/(Np − q), but this α_n cannot improve Ω̃_2c much over Ω̂_2c. Another possible estimator for V⁻¹ could be based on empirical Bayes estimators. For example, Efron and Morris (1976) and Haff (1979) suggested a form

V̂⁻¹ = α̂S⁻¹ + β̂I,

where α̂ and β̂ can be estimated from the observed S. In estimating V⁻¹, the MSE of V̂⁻¹ is smaller than that of (n − p* − 1)/n S⁻¹ when the Y_i are a sample from a normal population. However, since using V̂⁻¹ in the estimation process would change the well-known ADF estimator θ̃_n, we hesitate to recommend using V̂⁻¹ in the estimation of standard errors before getting enough experience with it. Any further improvement over the corrected standard errors needs to be justified by statistical theory as well as by support from empirical simulations.
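The Efron-Morris/Haff form above can be sketched as follows. The weights `alpha` and `beta` here are arbitrary placeholders; the cited papers derive data-based choices, which this sketch does not attempt to reproduce.

```python
import numpy as np

def mixed_inverse(S, alpha, beta):
    """Haff-type estimator of an inverse covariance: alpha * S^{-1} + beta * I.

    alpha, beta are placeholder weights; Efron and Morris (1976) and
    Haff (1979) derive data-based choices not implemented here.
    """
    p = S.shape[0]
    return alpha * np.linalg.inv(S) + beta * np.eye(p)

# Toy usage on a simulated sample covariance matrix.
rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 5))
S = np.cov(Y, rowvar=False)
V_inv = mixed_inverse(S, alpha=0.8, beta=0.2)
```

Mixing S⁻¹ toward the identity shrinks the extreme eigenvalues of the inverse, which is the intuition behind the smaller MSE of such estimators under normality.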

Acknowledgements

This work was supported by National Institute on Drug Abuse Grants DA01070 and DA00017. The detailed advice of an associate editor and two referees is gratefully acknowledged.


References

Amemiya, Y., Anderson, T.W., 1990. Asymptotic chi-square tests for a large class of factor analysis models. Ann. Statist. 18, 1453-1463.

Anderson, T.W., 1989. Linear latent variable models and covariance structures. J. Econometrics 41, 91-119.

Anderson, T.W., Amemiya, Y., 1988. The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann. Statist. 16, 759-771.

Arminger, G., Schoenberg, R.J., 1989. Pseudo maximum likelihood estimation and a test for misspecification in mean and covariance structure models. Psychometrika 54, 409-425.

Austin, J.T., Calderrn, R.F., 1996. Theoretical and technical contributions to structural equation modeling: an updated annotated bibliography. Structural Equation Modeling 3, 105-175.

Bentler, P.M., 1983. Some contributions to efficient statistics for structural models: specification and estimation of moment structures. Psychometrika 48, 493-517.

Bentler, P.M., 1995. EQS Structural Equations Program Manual. Multivariate Software, Encino, CA.

Bentler, P.M., Dijkstra, T., 1985. Efficient estimation via linearization in structural models. In: Krishnaiah, P.R. (Ed.), Multivariate Analysis VI. North-Holland, Amsterdam, pp. 9-42.

Bentler, P.M., Dudgeon, P., 1996. Covariance structure analysis: statistical practice, theory, and directions. Ann. Rev. Psychol. 47, 563-592.

Bentler, P.M., Wu, E.J.C., 1995. EQS for Windows User's Guide. Multivariate Software, Encino, CA.

Bollen, K.A., 1996. An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika 61, 109-121.

Bollen, K.A., Stine, R., 1990. Direct and indirect effects: classical and bootstrap estimates of variability. In: Clogg, C.C. (Ed.), Sociological Methodology 1990. Basil Blackwell, Oxford, pp. 115-140.

Boomsma, A., 1986. On the use of bootstrap and jackknife in covariance structure analysis. Compstat 1986, 205-210.

Breckler, S.J., 1990. Applications of covariance structure modeling in psychology: cause for concern? Psychological Bull. 107, 260-273.

Browne, M.W., 1984. Asymptotically distribution-free methods for the analysis of covariance structures. British J. Math. Statist. Psychol. 37, 62-83.

Browne, M.W., Arminger, G., 1995. Specification and estimation of mean and covariance models. In: Arminger, G., Clogg, C.C., Sobel, M.E. (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences. Plenum, New York, pp. 185-249.

Chamberlain, G., 1982. Multivariate regression models for panel data. J. Econometrics 18, 5-46.

Chatterjee, S., 1984. Variance estimation in factor analysis: an application of the bootstrap. British J. Math. Statist. Psychol. 37, 252-262.

Chou, C.-P., Bentler, P.M., 1995. Estimates and tests in structural equation modeling. In: Hoyle, R. (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications. Sage, Thousand Oaks, CA, pp. 37-75.

Chou, C.-P., Bentler, P.M., Satorra, A., 1991. Scaled test statistics and robust standard errors for nonnormal data in covariance structure analysis: a Monte Carlo study. British J. Math. Statist. Psychol. 44, 347-357.

Curran, P.J., 1994. The robustness of confirmatory factor analysis to model misspecification and violations of normality. Ph.D. Thesis, Arizona State University.

Curran, P.J., West, S.G., Finch, J.F., 1996. The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychol. Meth. 1, 16-29.

Efron, B., Morris, C., 1976. Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist. 4, 22-32.

Gerbing, D.W., Anderson, J.C., 1985. The effects of sampling error and model characteristics on parameter estimation for maximum likelihood confirmatory factor analysis. Multivar. Behavioral Res. 20, 255-271.

Haff, L.R., 1979. Estimation of the inverse covariance matrix: random mixtures of the inverse Wishart matrix and the identity. Ann. Statist. 7, 1264-1276.


Harlow, L.L., Stein, J.A., Rose, J.S., 1995. Substance abuse and risky sexual behavior in women: a longitudinal stage model. Manuscript under review (based on NIMH Grant MH47233).

Henly, S.J., 1993. Robustness of some estimators for the analysis of covariance structures. British J. Math. Statist. Psychol. 46, 313-338.

Hoyle, R. (Ed), 1995. Structural Equation Modeling: Concepts, Issues, and Applications. Sage, Thousand Oaks, CA.

Hu, L., Bentler, P.M., Kano, Y., 1992. Can test statistics in covariance structure analysis be trusted? Psychol. Bull. 112, 351-362.

Ichikawa, M., Konishi, S., 1995. Application of the bootstrap methods in factor analysis. Psychometrika 60, 77-93.

Jöreskog, K.G., Sörbom, D., 1993. LISREL 8 User's Reference Guide. Scientific Software International, Chicago.

Magnus, J.R., Neudecker, H., 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.

Marcoulides, G.A., Schumacker, R.E. (Eds.), 1996. Advanced Structural Equation Modeling: Issues and Techniques. Lawrence Erlbaum Associates, Mahwah, NJ.

Micceri, T., 1989. The unicorn, the normal curve, and other improbable creatures. Psychol. Bull. 105, 156-166.

Mooijaart, A., 1993. Structural equation models with transformed variables. In: Haagen, K., Bartholomew, D., Deistler, M. (Eds.), Statistical Modelling and Latent Variables. North-Holland, Amsterdam, pp. 249-258.

Satorra, A., Neudecker, H., 1994. On the asymptotic optimality of alternative minimum-distance estimators in linear latent-variable models. Econometric Theory 10, 867-883.

Tremblay, P.F., Gardner, R.C., 1996. On the growth of structural equation modeling in psychological journals. Struct. Equation Model. 3, 93-104.

Yuan, K.-H., Bentler, P.M., 1997. Mean and covariance structure analysis: theoretical and practical improvements. J. Amer. Statist. Assoc. 92, 766-773.

Yung, Y.-F., Bentler, P.M., 1996. Bootstrap techniques in analysis of mean and covariance structures. In: Marcoulides, G.A., Schumacker, R.E. (Eds.), Advanced Structural Equation Modeling: Issues and Techniques. Erlbaum, Mahwah, NJ, pp. 195-226.