Xitao Fan, Ph.D.
Chair Professor & Dean
Faculty of Education
University of Macau
Designing Monte Carlo Simulation Studies
Getting Involved in Monte Carlo Simulation
Fan, X., Felsovalyi, A., Sivo, S. A., & Keenan, S. (2002). SAS for Monte Carlo studies: A guide for quantitative researchers. Cary, NC: SAS Institute, Inc.
Fan, X. (2012). Designing simulation studies. In H. Cooper (Ed.), Handbook of Research Methods in Psychology, Vol. 2 (pp. 427-444). Washington, DC: American Psychological Association.
Peugh, J., & Fan, X. (In press). Enumeration index performance in generalized growth mixture models: a Monte Carlo test of Muthén’s (2003) hypothesis. Structural Equation Modeling.
Peugh, J., & Fan, X. (In press). Modeling unobserved heterogeneity using latent profile analysis: A Monte Carlo simulation. Structural Equation Modeling.
Peugh, J., & Fan, X. (2012). How well does growth mixture modeling identify heterogeneous growth trajectories? A simulation study examining GMM’s performance characteristics. Structural Equation Modeling, 19, 204-226.
Fan, X., & Sivo, S. A. (2009). Using goodness-of-fit indices in assessing mean structure invariance. Structural Equation Modeling, 16, 1-16.
Fan, X., & Sivo, S. (2007). Sensitivity of fit indices to model misspecification and model types. Multivariate Behavioral Research, 42, 509-529.
Sivo, S. A., Fan, X., Witta, E. L., & Willse, J. T. (2006). The search for "optimal" cutoff properties: Fit index criteria in structural equation modeling. Journal of Experimental Education, 74, 267-288.
Fan, Xitao, & Fan, Xiaotao. (2005). Power of latent growth modeling for detecting linear growth: Number of measurements and comparison with other analytic approaches. Journal of Experimental Education, 73, 121-139.
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indices to misspecified structural or measurement model components: Rationale of two-index strategy revisited. Structural Equation Modeling, 12, 343-367.
Fan, Xitao, & Fan, Xiaotao. (2005). Using SAS for Monte Carlo simulation research in structural equation modeling. Structural Equation Modeling, 12, 299-333.
Sivo, S., Fan, X., & Witta, L. (2005). The biasing effects of unmodeled ARMA time series processes on latent growth curve model estimates. Structural Equation Modeling, 12, 215-231.
Fan, X. (2003). Two approaches for correcting correlation attenuation caused by measurement error: Implications for research practice. Educational and Psychological Measurement, 63(6), 915-930.
Fan, X. (2003). Power of latent growth modeling for detecting group differences in linear growth trajectory parameters. Structural Equation Modeling, 10, 380-400.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of different analytical methods. Journal of Experimental Education, 69, 203-224.
Fan, X., & Wang, L. (1999). Comparing logistic regression with linear discriminant analysis in their classification accuracy. Journal of Experimental Education, 67, 265-286.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estimation methods, and model specification on SEM fit indices. Structural Equation Modeling: A Multidisciplinary Journal, 6, 56-83.
Fan, X., & Wang, L. (1998). Effects of potential confounding factors on fit indices and parameter estimates for true and misspecified SEM models. Educational and Psychological Measurement, 58, 699-733.
Fan, X., & Wang, L. (1996). Comparability of jackknife and bootstrap results: An investigation for a case of canonical analysis. Journal of Experimental Education, 64, 173-189.
What Is a Monte Carlo Simulation Study?
“the use of random sampling techniques and often the use of computer
simulation to obtain approximate solutions to mathematical or physical
problems especially in terms of a range of values each of which has a
calculated probability of being the solution” (Merriam-Webster On-Line).
An empirical alternative to a theoretical approach (i.e., a solution based on
statistical/mathematical theory)
Increasingly possible because of the advances in computing technology
Situations Where Simulation Is Useful
Consequences of Assumption Violations
Statistical Theory: stipulates what the conditions should be, but does not say what happens when those conditions are not satisfied in the data
Understanding a Sample Statistic That May Not Have a Theoretical Distribution
Many Other Situations
Retaining the optimal number of factors in EFA
Evaluating the performance of mixture modeling in identifying the latent groups
Assessing the consequences of failure to model correlated error structure in latent
growth modeling
Basic Steps in a Simulation Study
Asking Questions Suitable for a Simulation Study
Questions for which no (or no trustworthy) analytical/theoretical solutions exist
Simulation Study Design (Example)
Include / manipulate the major factors that potentially affect the outcome
Data Generation
Sample data generation & transformation
Analysis (Model Fitting) for Sample Data
Accumulation and Analysis of the Statistic(s) of Interest
Presentation and Drawing Conclusions
Conclusions limited to the design conditions
An Example: Independent t-test (group variance homogeneity)
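The design table for this example did not survive in the text, so the following is only a minimal sketch of what such a study could look like, not the design used in the slides. It assumes two hypothetical design cells (equal n with equal variances, and a small group paired with the larger variance), a nominal alpha of .05, and a hardcoded two-sided critical value for 58 degrees of freedom. The function names are illustrative.

```python
import math
import random

# Two-sided 5% critical value of t with 58 df (n1 + n2 - 2). Hardcoded so the
# sketch needs only the standard library; both cells below have n1 + n2 = 60.
T_CRIT_DF58 = 2.0017


def pooled_t(x, y):
    """Independent-samples t statistic using the pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def type1_rate(n1, n2, sd1, sd2, reps=4000, seed=2024):
    """Empirical rejection rate when the two population means are equal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF58:
            rejections += 1
    return rejections / reps


# Hypothetical design cells: (n1, n2, sd1, sd2)
conditions = {
    "equal n, equal variance": (30, 30, 1.0, 1.0),
    "small group has larger variance": (10, 50, 2.0, 1.0),
}
results = {label: type1_rate(*cell) for label, cell in conditions.items()}
for label, rate in results.items():
    print(f"{label}: empirical Type I error = {rate:.3f}")
```

The first cell should stay close to the nominal .05, while the second should show a clearly inflated rejection rate, which is the kind of consequence of assumption violation that statistical theory alone does not quantify.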
Data Generation in a Simulation Study
Common Random Number Generators
* binomial, Cauchy, exponential, gamma, Poisson, normal, uniform, etc.
* Generators for all of these distributions are built on uniform random numbers
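One common way in which a non-uniform generator can be built on uniform random numbers is inverse-CDF transformation. The sketch below illustrates it for the exponential distribution; the function name and the rate value are illustrative assumptions, and production generators often use other, faster algorithms.

```python
import math
import random


def exponential_from_uniform(rate, n, seed=1):
    """Draw n exponential variates by inverting the CDF at uniform draws."""
    rng = random.Random(seed)
    # If U ~ Uniform[0, 1), then -ln(1 - U) / rate ~ Exponential(rate).
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


sample = exponential_from_uniform(rate=2.0, n=50_000)
sample_mean = sum(sample) / len(sample)
print(f"sample mean = {sample_mean:.3f} (population mean = 1 / rate = 0.5)")
```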
Simulating Univariate Sample Data
* Normally-Distributed Sample Data, $X \sim N(\mu, \sigma^2)$
* Non-Normal Distribution: Fleishman (1978): $Y = a + bZ + cZ^2 + dZ^3$
Z: unit normal variate; a, b, c, d: coefficients needed for transforming the unit normal variate to a non-normal variable with specified degrees of population skewness and kurtosis.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521-531.
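As a minimal sketch of Fleishman's power transformation, the code below uses one set of coefficients (for skewness 1.75 and excess kurtosis 3.75, with a = -c). It checks them against Fleishman's three moment equations rather than solving those equations, then rescales the result to a target mean and standard deviation. The function names, sample size, and seed are illustrative assumptions.

```python
import random

# Fleishman coefficients for skewness 1.75 and excess kurtosis 3.75;
# a = -c keeps the mean of the transformed variable at 0.
B, C, D = 0.92966052, 0.39949667, -0.03646699
A = -C


def fleishman_residuals(b, c, d, skew, kurt):
    """Residuals of Fleishman's moment equations (all near 0 if solved)."""
    variance_eq = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1
    skew_eq = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    kurt_eq = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                  + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt
    return variance_eq, skew_eq, kurt_eq


def fleishman_sample(n, mean=0.0, sd=1.0, seed=7):
    """Transform unit normal draws, then rescale to the target mean and SD."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = A + B*z + C*z**2 + D*z**3
        out.append(mean + sd*y)
    return out


def sample_moments(x):
    """Return mean, variance, skewness, and excess kurtosis of a sample."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    m2 = sum(e * e for e in dev) / n
    m3 = sum(e ** 3 for e in dev) / n
    m4 = sum(e ** 4 for e in dev) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


sample = fleishman_sample(100_000)
mean, var, skew, kurt = sample_moments(sample)
print(f"mean={mean:.3f} var={var:.3f} skew={skew:.3f} kurtosis={kurt:.3f}")
```

Sample kurtosis is quite variable even at this sample size, so some discrepancy from 3.75 in a single run is expected.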
Sample Data from a Multivariate Normal Distribution
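* matrix decomposition procedure (Kaiser & Dickman, 1962): $Z_{(k \times N)} = F_{(k \times k)} X_{(k \times N)}$
F: k × k matrix containing principal component factor pattern coefficients obtained by applying principal component factorization to the given population inter-correlation matrix R; X: k × N matrix of uncorrelated standard normal variates; Z: the resulting k × N sample data matrix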
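The sketch below illustrates the matrix decomposition idea with one deliberate substitution: it uses a Cholesky factor instead of the principal component factor pattern matrix. Any F with F F' = R reproduces the population correlations, and Cholesky needs only the standard library. The matrix R, the sample size, and the function names are illustrative assumptions.

```python
import math
import random


def cholesky(r):
    """Lower-triangular F with F F' = r, for a positive-definite matrix r."""
    k = len(r)
    f = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(f[i][m] * f[j][m] for m in range(j))
            if i == j:
                f[i][j] = math.sqrt(r[i][i] - s)
            else:
                f[i][j] = (r[i][j] - s) / f[j][j]
    return f


def correlated_normal(r, n, seed=11):
    """Generate n rows of standard normal data with population correlations r."""
    rng = random.Random(seed)
    f = cholesky(r)
    k = len(r)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]  # uncorrelated variates
        rows.append([sum(f[i][m] * x[m] for m in range(k)) for i in range(k)])
    return rows


def sample_corr(rows, i, j):
    """Pearson correlation between columns i and j."""
    n = len(rows)
    mi = sum(row[i] for row in rows) / n
    mj = sum(row[j] for row in rows) / n
    sij = sum((row[i] - mi) * (row[j] - mj) for row in rows)
    sii = sum((row[i] - mi) ** 2 for row in rows)
    sjj = sum((row[j] - mj) ** 2 for row in rows)
    return sij / math.sqrt(sii * sjj)


R = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]  # hypothetical population
data = correlated_normal(R, 20_000)
for i, j in [(1, 0), (2, 0), (2, 1)]:
    print(f"r[{i}][{j}] = {sample_corr(data, i, j):.3f} (population {R[i][j]})")
```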
Sample Data from a Multivariate Non-Normal Distribution
* Interaction between non-normality and inter-variable correlations
* Intermediate correlations using Fleishman coefficients (Vale & Maurelli, 1983)
* Matrix decomposition procedure applied to intermediate correlation matrix
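Because the Fleishman transformation changes the correlation between two variables, the normal variates have to be generated with an intermediate correlation chosen so that the target correlation holds after transformation. The sketch below solves Vale and Maurelli's cubic equation for that intermediate correlation by bisection. The solver choice, function names, and example coefficients are assumptions for illustration, and the bisection assumes the cubic is increasing in the correlation over [-1, 1].

```python
def vm_correlation(rho, coef1, coef2):
    """Correlation after transformation, given normal-variate correlation rho.

    coef1 and coef2 are the Fleishman (b, c, d) triples of the two variables.
    """
    b1, c1, d1 = coef1
    b2, c2, d2 = coef2
    return (rho * (b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2)
            + rho**2 * (2*c1*c2)
            + rho**3 * (6*d1*d2))


def intermediate_correlation(target, coef1, coef2, tol=1e-12):
    """Bisection search for the rho whose transformed correlation is target."""
    lo, hi = -1.0, 1.0
    if not (vm_correlation(lo, coef1, coef2) <= target
            <= vm_correlation(hi, coef1, coef2)):
        raise ValueError("target correlation is not attainable")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if vm_correlation(mid, coef1, coef2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Both variables: skewness 1.75, excess kurtosis 3.75 (Fleishman b, c, d).
coef = (0.92966052, 0.39949667, -0.03646699)
rho = intermediate_correlation(0.5, coef, coef)
print(f"intermediate correlation for a target of 0.5: {rho:.4f}")
```

In this example the intermediate correlation is larger than the target, because the nonlinear transformation attenuates the correlation between the underlying normal variates.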
Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27, 179-182.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
Checking the Validity of Data Generation Procedures
Example: Multivariate non-normal sample data (three correlated variables)
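The numerical results of this example are not preserved in the text. As a hedged sketch of how such a check might be run, the code below generates one very large sample of three correlated non-normal variables and compares sample skewness and correlations with the intended population values. It assumes the same Fleishman coefficients for all three variables, a hypothetical target correlation matrix, and a Cholesky factor in place of principal component factorization.

```python
import math
import random

B, C, D = 0.92966052, 0.39949667, -0.03646699  # skew 1.75, excess kurtosis 3.75
TARGET_R = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]  # hypothetical
K = len(TARGET_R)


def implied_corr(rho):
    """Vale-Maurelli correlation after transformation (identical coefficients)."""
    return rho * (B*B + 6*B*D + 9*D*D) + rho**2 * 2*C*C + rho**3 * 6*D*D


def solve_intermediate(target):
    """Bisection for the normal-variate correlation that yields target."""
    lo, hi = -1.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if implied_corr(mid) < target else (lo, mid)
    return (lo + hi) / 2


def cholesky(r):
    """Lower-triangular factor f with f f' = r."""
    f = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = sum(f[i][m] * f[j][m] for m in range(j))
            f[i][j] = math.sqrt(r[i][i] - s) if i == j else (r[i][j] - s) / f[j][j]
    return f


def generate(n, seed=3):
    """Correlated normal variates, then the Fleishman transform per variable."""
    rng = random.Random(seed)
    inter = [[1.0 if i == j else solve_intermediate(TARGET_R[i][j])
              for j in range(K)] for i in range(K)]
    f = cholesky(inter)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(K)]
        z = [sum(f[i][m] * x[m] for m in range(i + 1)) for i in range(K)]
        data.append([-C + B*v + C*v*v + D*v**3 for v in z])
    return data


def sample_corr(data, i, j):
    n = len(data)
    mi = sum(row[i] for row in data) / n
    mj = sum(row[j] for row in data) / n
    sij = sum((row[i] - mi) * (row[j] - mj) for row in data)
    sii = sum((row[i] - mi) ** 2 for row in data)
    sjj = sum((row[j] - mj) ** 2 for row in data)
    return sij / math.sqrt(sii * sjj)


def sample_skew(data, i):
    n = len(data)
    m = sum(row[i] for row in data) / n
    m2 = sum((row[i] - m) ** 2 for row in data) / n
    m3 = sum((row[i] - m) ** 3 for row in data) / n
    return m3 / m2 ** 1.5


data = generate(50_000)
max_corr_error = max(abs(sample_corr(data, i, j) - TARGET_R[i][j])
                     for i in range(K) for j in range(i))
skews = [sample_skew(data, i) for i in range(K)]
print(f"largest correlation discrepancy: {max_corr_error:.4f}")
print("sample skewness:", [round(s, 3) for s in skews])
```

Small discrepancies are sampling error. Large or systematic ones would suggest a problem in the data generation procedure.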
From Simulation Design to Population Data Parameters
It may take much effort to obtain population parameters (t-test example)
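The worked values for this example are not preserved in the text, so the following is only a sketch of the kind of translation involved. It turns one hypothetical design cell (total N, group size ratio, variance ratio, standardized mean difference d) into the population parameters needed for data generation. Two conventions are assumptions made for this sketch: group 1 is fixed at mean 0 and SD 1, and d is standardized by the square root of the average of the two variances.

```python
import math


def cell_to_parameters(total_n, n_ratio, var_ratio, d):
    """Translate one design cell into group sizes, SDs, and means.

    n_ratio   = n2 / n1
    var_ratio = variance of group 2 / variance of group 1
    d         = (mu2 - mu1) / sqrt((var1 + var2) / 2)   # assumed definition
    """
    n1 = round(total_n / (1 + n_ratio))
    n2 = total_n - n1
    sd1 = 1.0
    sd2 = math.sqrt(var_ratio)
    mu1 = 0.0
    mu2 = mu1 + d * math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return {"n1": n1, "n2": n2, "sd1": sd1, "sd2": sd2, "mu1": mu1, "mu2": mu2}


params = cell_to_parameters(total_n=60, n_ratio=2, var_ratio=4, d=0.5)
print(params)
```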
Latent growth model example
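The parameter values of this example are not preserved in the text. As a sketch under stated assumptions, the code below computes the model-implied population mean vector and covariance matrix of a linear latent growth model from its parameters, using the standard relations means = Λα and covariance = ΛΨΛ' + Θ with uncorrelated residuals. All numeric values and names are hypothetical.

```python
def lgm_population(times, alpha, psi, theta):
    """Model-implied means and covariances for a linear latent growth model.

    times: measurement occasions (slope loadings); intercept loadings are 1
    alpha: (mean intercept, mean slope)
    psi:   2 x 2 covariance matrix of the intercept and slope factors
    theta: residual variances, one per occasion (residuals uncorrelated)
    """
    lam = [(1.0, float(t)) for t in times]
    means = [l0 * alpha[0] + l1 * alpha[1] for l0, l1 in lam]
    cov = []
    for i, (a0, a1) in enumerate(lam):
        row = []
        for j, (b0, b1) in enumerate(lam):
            v = (a0 * psi[0][0] * b0 + a0 * psi[0][1] * b1
                 + a1 * psi[1][0] * b0 + a1 * psi[1][1] * b1)
            if i == j:
                v += theta[i]  # residual variance on the diagonal only
            row.append(v)
        cov.append(row)
    return means, cov


# Hypothetical population: four occasions, intercept mean 50, slope mean 2.
means, cov = lgm_population(
    times=[0, 1, 2, 3],
    alpha=(50.0, 2.0),
    psi=[[100.0, 5.0], [5.0, 4.0]],
    theta=[25.0, 25.0, 25.0, 25.0],
)
print("means:", means)
for row in cov:
    print(row)
```

The resulting mean vector and covariance matrix can then be passed to a multivariate data generation procedure such as the matrix decomposition approach.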
Accumulation and Analysis of the Statistic(s) of Interest
Accumulation: Straightforward or Complicated
* Typically, not an automated process
* Statistical software used
* Analytical techniques involved
* Type of statistic(s) of interest, etc.
Analysis
* Follow-up data analysis may be simple or complicated
* Not different from many other data analysis situations
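As a small illustration of accumulation followed by analysis, the sketch below stores one record per replication, estimator, and design cell, then summarizes bias and RMSE per cell. The statistic (two variance estimators), the design cells, and all names are hypothetical choices for this sketch. In practice the statistics often have to be extracted from the output of the modeling software.

```python
import math
import random

TRUE_VARIANCE = 1.0  # population variance used for data generation


def one_replication(n, rng):
    """Generate one sample and return the statistics of interest."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    return {"divide by n": ss / n, "divide by n-1": ss / (n - 1)}


def run_study(sample_sizes, reps, seed=5):
    """Accumulate one record per replication and estimator in each cell."""
    rng = random.Random(seed)
    records = []
    for n in sample_sizes:
        for rep in range(reps):
            for estimator, value in one_replication(n, rng).items():
                records.append({"n": n, "rep": rep,
                                "estimator": estimator, "estimate": value})
    return records


def summarize(records):
    """Bias and RMSE of each estimator within each design cell."""
    cells = {}
    for r in records:
        cells.setdefault((r["n"], r["estimator"]), []).append(r["estimate"])
    summary = {}
    for key, values in cells.items():
        bias = sum(values) / len(values) - TRUE_VARIANCE
        rmse = math.sqrt(sum((v - TRUE_VARIANCE) ** 2 for v in values)
                         / len(values))
        summary[key] = {"bias": bias, "rmse": rmse}
    return summary


summary = summarize(run_study([10, 50], reps=5000))
for (n, estimator), stats in sorted(summary.items()):
    print(f"n={n:3d} {estimator:14s} "
          f"bias={stats['bias']:+.3f} rmse={stats['rmse']:.3f}")
```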
Presentation and Drawing Conclusions
Presentation
* Representativeness & Exceptions
* Graphic Presentations
* Typical: table after table of results – No one has the time to read the tables!
Drawing Conclusions
* Validity & generalizability depend on the adequacy & appropriateness of
simulation design
* Conclusions must be limited by the design conditions and levels.