
Using R With - Pelagicos · Thebook Using R With Multivariate Statistics was written to supple ment existing full textbooks on the various multivariate statistical methods. The multivariate

Jul 17, 2020


Using R With Multivariate Statistics

Randall E. Schumacker

University of Alabama

SAGE

Los Angeles | London | New Delhi

Singapore | Washington DC


SAGE

Los Angeles | London | New Delhi | Singapore | Washington DC

FOR INFORMATION:

SAGE Publications, Inc.

2455 Teller Road

Thousand Oaks, California 91320

E-mail: [email protected]

SAGE Publications Ltd.

1 Oliver's Yard

55 City Road

London EC1Y 1SP

United Kingdom

SAGE Publications India Pvt. Ltd.

B1/11 Mohan Cooperative Industrial Area

Mathura Road, New Delhi 110 044

India

SAGE Publications Asia-Pacific Pte. Ltd.

3 Church Street

#10-04 Samsung Hub

Singapore 049483

Acquisitions Editor: Vicki Knight

Editorial Assistant: Yvonne McDuffee

eLearning Editor: Katie Bierach

Production Editor: Kelly DeRosa

Copy Editor: QuADS Prepress (P) Ltd.

Typesetter: C&M Digitals (P) Ltd.

Proofreader: Jennifer Grubba

Indexer: Michael Ferreira

Cover Designer: Michelle Kenny

Marketing Manager: Nicole Elliott

Copyright © 2016 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or other image are included solely for the purpose of illustration and are the property of their respective holders. The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said trademarks. SPSS is a registered trademark of International Business Machines Corporation.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Schumacker, Randall E.

Using R with multivariate statistics : a primer / Randall E. Schumacker, University of Alabama, Tuscaloosa.

pages cm

Includes bibliographical references and index.

ISBN 978-1-4833-7796-4 (pbk. : alk. paper)

1. Multivariate analysis—Data processing. 2. R (Computer program language) 3. Statistics—Data processing. I. Title.

QA278.S37 2016

519.5'3502855133—dc23 2015011814

This book is printed on acid-free paper.

Certified Sourcing

www.sfiprogram.org

SFI-00453

SFI

15 16 17 18 19 10 9 8 7 6 5 4 3 2 1


Brief Contents

Preface xiii

Acknowledgments xix

About the Author xxi

1. Introduction and Overview 1

2. Multivariate Statistics: Issues and Assumptions 9

3. Hotelling's T2: A Two-Group Multivariate Analysis 27

4. Multivariate Analysis of Variance 57

5. Multivariate Analysis of Covariance 81

6. Multivariate Repeated Measures 99

7. Discriminant Analysis 131

8. Canonical Correlation 147

9. Exploratory Factor Analysis 171

10. Principal Components Analysis 207

11. Multidimensional Scaling 229

12. Structural Equation Modeling 255

Statistical Tables 315

Chapter Answers 325

R Installation and Usage 355

R Packages, Functions, Data Sets, and Script Files 367

Index 375


SAGE was founded in 1965 by Sara Miller McCune to support the dissemination of usable knowledge by publishing innovative and high-quality research and teaching content. Today, we publish more than 850 journals, including those of more than 300 learned societies, more than 800 new books per year, and a growing range of library products including archives, data, case studies, reports, conference highlights, and video. SAGE remains majority-owned by our founder, and after Sara's lifetime will become owned by a charitable trust that secures our continued independence.

Los Angeles | London | New Delhi | Singapore | Washington DC


Detailed Contents

Preface xiii

Acknowledgments xix

About the Author xxi

1. Introduction and Overview 1

Background 1

Persons of Interest 1

Factors Affecting Statistics 2

R Software 5

Web Resources 7

References 7

2. Multivariate Statistics: Issues and Assumptions 9

Issues 11

Assumptions 12

Normality 12

Determinant of a Matrix 16

Equality of Variance-Covariance Matrix 18

Box M Test 21

SPSS Check 23

Summary 23

Web Resources 24

References 24

3. Hotelling's T2: A Two-Group Multivariate Analysis 27

Overview 28

Assumptions 29

Univariate Versus Multivariate Hypothesis 30

Statistical Significance 32


Practical Examples Using R 33

Single Sample 33

Two Independent Group Mean Difference 36

Two Groups (Paired) Dependent Variable Mean Difference 42

Power and Effect Size 49

A Priori Power Estimation 50

Effect Size Measures 52

Reporting and Interpreting 54

Summary 54

Exercises 55

Web Resources 55

References 55

4. Multivariate Analysis of Variance 57

MANOVA Assumptions 58

Independent Observations 59

Normality 62

Equal Variance-Covariance Matrices 63

Summary 66

MANOVA Example: One-Way Design 66

MANOVA Example: Factorial Design 70

Effect Size 76

Reporting and Interpreting 78

Summary 79

Exercises 79

Web Resources 80

References 80

5. Multivariate Analysis of Covariance 81

Assumptions 82

Multivariate Analysis of Covariance 84

MANCOVA Example 85

Dependent Variable: Adjusted Means 87

Reporting and Interpreting 93

Propensity Score Matching 94

Summary 97

Web Resources 97

References 98


6. Multivariate Repeated Measures 99

Assumptions 101

Advantages of Repeated Measure Design 102

Multivariate Repeated Measure Examples 103

Single Dependent Variable 103

Several Dependent Variables: Profile Analysis 108

Doubly Multivariate Repeated Measures 114

Reporting and Interpreting Results 126

Summary 127

Exercises 128

Web Resources 128

References 128

7. Discriminant Analysis 131

Overview 133

Assumptions 133

Dichotomous Dependent Variable 134

Box M Test 135

Classification Summary 136

Chi-Square Test 137

Polytomous Dependent Variable 138

Box M Test 139

Classification Summary 141

Chi-Square Test 142

Effect Size 142

Reporting and Interpreting 143

Summary 144

Exercises 144

Web Resources 145

References 145

8. Canonical Correlation 147

Overview 148

Assumptions 149

R Packages 150

CCA Package 152

yacca Package 158

Canonical Correlation Example 158

Effect Size 165


Reporting and Interpreting 165

Summary 166

Exercises 167

Web Resources 168

References 168

9. Exploratory Factor Analysis 171

Overview 172

Types of Factor Analysis 173

Assumptions 173

Factor Analysis Versus Principal Components Analysis 176

EFA Example 178

R Packages 178

Data Set Input 179

Sample Size Adequacy 180

Number of Factors and Factor Loadings 183

Factor Rotation and Extraction: Orthogonal Versus Oblique Factors 190

Factor Scores 195

Graphical Display 201

Reporting and Interpreting 201

Summary 202

Exercises 203

Web Resources 203

References 204

Appendix: Attitudes Toward Educational Research Scale 205

10. Principal Components Analysis 207

Overview 208

Assumptions 209

Bartlett Test (Sphericity) 209

KMO Test (Sampling Adequacy) 210

Determinant of Correlation Matrix 210

Basics of Principal Components Analysis 211

Principal Component Scores 215

Principal Component Example 216

R Packages 216

Data Set 216

Assumptions 219

Number of Components 220


Reporting and Interpreting 226

Summary 227

Exercises 228

Web Resources 228

References 228

11. Multidimensional Scaling 229

Overview 231

Assumptions 232

Proximity Matrix 233

MDS Model 233

MDS Analysis 233

Sample Size 233

Variable Scaling 234

Number of Dimensions 234

R Packages 234

Goodness-of-Fit Index 236

MDS Metric Example 237

MDS Nonmetric Example 244

Reporting and Interpreting Results 251

Summary 252

Exercises 252

Web Resources 253

References 253

12. Structural Equation Modeling 255

Overview 257

Assumptions 258

Multivariate Normality 258

Positive Definite Matrix 261

Equal Variance-Covariance Matrices 263

Correlation Versus Covariance Matrix 264

Basic Correlation and Covariance Functions 265

Matrix Input Functions 267

Reference Scaling in SEM Models 270

R Packages 271

Finding R Packages and Functions 271

SEM Packages 273

CFA Models 275

Basic Model 275


Multiple Group Model 282

Structural Equation Models 290

Basic SEM Model 290

Longitudinal SEM Models 295

Reporting and Interpreting Results 310

Summary 311

Exercises 312

Web Resources 312

References 312

Statistical Tables 315

Table 1: Areas Under the Normal Curve (z Scores) 316

Table 2: Distribution of t for Given Probability Levels 317

Table 3: Distribution of r for Given Probability Levels 318

Table 4: Distribution of Chi-Square for Given Probability Levels 319

Table 5: The F Distribution for Given Probability Levels (.05 Level) 321

Table 6: The Distribution of F for Given Probability Levels (.01 Level) 322

Table 7: Distribution of Hartley F for Given Probability Levels 323

Chapter Answers 325

R Installation and Usage 355

R Packages, Functions, Data Sets, and Script Files 367

Index 375


Preface

The book Using R With Multivariate Statistics was written to supplement existing full textbooks on the various multivariate statistical methods. The multivariate statistics books provide more in-depth coverage of the methods presented in this book, but without the use of R software. The R code is provided for some of the data set examples in the multivariate statistics books listed below. It is hoped that students can run the examples in R and compare the results with those in the books that used SAS, IBM® SPSS® Statistics*, or STATA statistics packages. The advantage of R is that it is free and runs on Windows, Mac, and Linux operating systems.

The full textbooks also provide a more in-depth discussion of the assumptions and issues, as well as data analysis and interpretation of the results using SPSS, SAS, and/or STATA. The multivariate statistics books I consulted and referenced are as follows:

• Afifi, A., Clark, V., & May, S. (2004). Computer-aided multivariate analysis (4th ed.). Boca Raton, FL: Chapman & Hall/CRC Press.

• Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.

• Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and interpretation (2nd ed.). Thousand Oaks, CA: Sage.

• Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York, NY: Routledge (Taylor & Francis Group).

• Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY: Routledge (Taylor & Francis Group).

• Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn & Bacon.

*SPSS is a registered trademark of International Business Machines Corporation.


This book was written to provide researchers with access to the free R software when conducting multivariate statistical analysis. There are many packages and functions available, which can be overwhelming, so I have collected some of the widely used packages and functions for the multivariate methods in the book. Many of the popular multivariate statistics books provide a more complete treatment of the topics covered in this book along with SAS and/or SPSS solutions. I am hopeful that this book will provide good supplemental coverage of topics in multivariate books and permit faculty and students to run R software analyses. The R software permits end users to customize programs to provide the type of analysis and output they desire. The R commands can be saved in a script file for future use, can be readily shared, and give the user control over the analytic steps and algorithms used. The advantages of using R software are many, including the following:

• Free software

• The ability to customize statistical analysis

• Control over analytic steps and algorithms used

• Available on Windows, Mac, and Linux operating systems

• Multitude of packages and functions to conduct analytics

• Documentation and reference guides available

Data Sets

The multivariate textbooks listed above have numerous examples and data sets available either in the book or on the publishers' websites. There are also numerous data sets available for statistical analysis in R, which can be viewed by using the following R command(s):

> data() # alphabetical list of data sets

or

> data(package = .packages(all.available = TRUE)) # data sets listed in various R packages

or

> library(help = "datasets") # alphabetical list of data in the R datasets package


or, you can also enter the following URL to obtain a list:

http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

The type of data set we would generally want is one that contains a set of continuous dependent variables and a set of continuous independent variables. The correlation of the two linear sets of variables is the basis for conducting many of the multivariate statistics covered in the book.

The input and use of the data sets are generally provided with a brief explanation and example in R code. Overall, the use of the data sets can be enhanced by taking the time to study an R tutorial located at

http://ww2.coastal.edu/kingw/statistics/R-tutorials/dataframes.html

The following R commands are helpful in understanding the data set, where the data set name is specified for each function (in this example, iris):

> help(iris) # information on iris data set

> library(psych) # describe() is provided by the psych package

> describe(iris) # descriptive statistics on variables in iris data set

> iris # list the iris data

> head(iris, n = 10) # print first ten record lines in iris data set

> tail(iris, n = 10) # print last ten record lines in iris data set

Input Data Files

There are many ways to input data files, depending on how the data are coded (Schumacker, 2014). You may wish to use Notepad to initially view a data file. Commercial software packages have their own format (SPSS: *.sav; SAS: *.sas; EXCEL: *.xls; etc.). A data file may be formatted with commas, semicolons, tabs, or spaces between the data values. Each file type requires specifying the separation type between data values using the sep argument in one of the following R functions that read the data file:

read.csv # read a comma- or semicolon-separated data file

read.delim # read a tab-delimited data file

read.table # read a space-separated data file


The separation types in the sep argument are as follows:

sep = "," # comma separated file

sep = ";" # semicolon separated file

sep = "\t" # tab delimited file

sep = " " # space separated file

You can find out more about reading in data files with different separation types using > ?read.table.
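As a self-contained check of the sep settings above, the sketch below reads the same two-row table written four ways. It passes the data through the text argument of read.table() and its wrappers rather than a file on disk, and the tiny data values are made up purely for illustration.

```r
# The same 2 x 2 table written with four different separators (made-up values).
comma_txt <- "x,y\n1,2\n3,4"
semi_txt  <- "x;y\n1;2\n3;4"
tab_txt   <- "x\ty\n1\t2\n3\t4"
space_txt <- "x y\n1 2\n3 4"

# read.csv() defaults to sep = "," and read.delim() to sep = "\t";
# read.table() needs header = TRUE because its default is header = FALSE.
d_comma <- read.csv(text = comma_txt)
d_semi  <- read.csv(text = semi_txt, sep = ";")
d_tab   <- read.delim(text = tab_txt)
d_space <- read.table(text = space_txt, header = TRUE, sep = " ")

# Each call should produce the same data frame with columns x and y.
str(d_comma)
```

For a real file, replace text = ... with the file path (or file.choose()) and keep the same sep value.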

A useful approach for finding and reading data sets on your computer is to embed the file.choose() function in the read command. This opens a dialog window and permits a search of your folders for the data set. Click on the data set, and it is read into R. The R command would be as follows:

> mydata = read.table(file.choose(), header = TRUE, sep = " ")

This command would find a data file with variable names on the first line (header = TRUE) and a space between the data values.

Many statistical methods use a correlation or covariance matrix. Some use a partial correlation or partial covariance matrix. The correlation and covariance matrices are computed by using the following commands, respectively:

> cor(mydata)

> cov(mydata)
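One practical caution: cor() and cov() accept only numeric columns, so cor(iris) stops with an error because Species is a factor. The sketch below keeps the numeric columns first, then uses base R's cov2cor() to confirm that the correlation matrix is simply the rescaled covariance matrix.

```r
# Keep only the numeric variables; cor(iris) would fail on the Species factor.
num <- iris[sapply(iris, is.numeric)]

R <- cor(num)   # 4 x 4 correlation matrix
S <- cov(num)   # 4 x 4 covariance matrix

# cov2cor() divides each covariance by the product of the two standard deviations,
# which should reproduce the correlation matrix.
all.equal(cov2cor(S), R)
```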

The corpcor package has two useful functions that permit conversion in either direction: from correlation to partial correlation, or from partial correlation to correlation. This also applies to covariance matrices; in this example, the matrix is mymatrix.

> library(corpcor)

> cor2pcor(mymatrix) # compute partial correlation matrix from correlation/covariance matrix

> pcor2cor(mymatrix) # convert partial correlation or covariance matrix to a correlation matrix
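To show what cor2pcor() computes, here is a base-R sketch of the standard formula: invert the correlation matrix and rescale its off-diagonal entries. The helper name to_partial() is hypothetical, and because it uses solve() it assumes the matrix is invertible; in practice cor2pcor() is the more convenient choice.

```r
# Partial correlations from a correlation matrix via its inverse (precision matrix):
# pcor[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j]), where P = solve(R).
to_partial <- function(R) {
  P  <- solve(R)                              # assumes R is invertible
  pc <- -P / sqrt(outer(diag(P), diag(P)))    # standardize and flip the sign
  diag(pc) <- 1                               # a variable's partial with itself is 1
  pc
}

R  <- cor(iris[, 1:4])   # the four numeric iris measurements
pc <- to_partial(R)
round(pc, 3)
```

Each entry pc[i, j] should equal the correlation between the residuals of variables i and j after both are regressed on the remaining variables, which gives a quick way to check the result.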

A chi-square test of whether two correlation matrices are equal is conducted using the following R commands.

> library(psych)

> cortest(R1, R2, n1, n2, cor = FALSE) # input 2 correlation matrices and sample sizes


Also, this function permits testing whether a single correlation matrix is an identity matrix.

> cortest(R1, R2 = NULL, n1, n2 = NULL, cor = FALSE) # input 1 correlation matrix and sample size

You will find these functions very useful when running multivariate statistical analyses.
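The identity-matrix question can also be answered with Bartlett's test of sphericity, which the contents list under principal components analysis (psych also provides cortest.bartlett() for it). It is a different chi-square statistic from the one cortest() reports, so the two will not agree numerically. A minimal base-R sketch, where the helper name bartlett_sphericity() is hypothetical:

```r
# Bartlett's test of sphericity; H0: the population correlation matrix is an identity matrix.
# chi2 = -((n - 1) - (2p + 5) / 6) * log(det(R)),  df = p(p - 1) / 2
bartlett_sphericity <- function(R, n) {
  p    <- ncol(R)
  chi2 <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df   <- p * (p - 1) / 2
  list(chisq = chi2, df = df,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}

R <- cor(iris[, 1:4])
bartlett_sphericity(R, n = nrow(iris))
```

An identity matrix has determinant 1, so the statistic is 0; strongly correlated variables push det(R) toward 0 and the statistic upward.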

R Packages

The multivariate statistical analyses require the use of certain R packages. In the appendix, for each chapter, I have compiled a list of the R packages, functions, data sets, and R script files I used to conduct the analyses. This should provide a handy reference guide. You can also obtain a list of packages by

> ??packages

Information about a specific R package can be obtained by

> help(package="psych")

I recommend using the options in the pull-down menu whenever possible. The options include installing, loading, and updating packages. You can also issue individual commands for these options:

> install.packages()

> update.packages()

You may receive a notice that a particular package requires a newer version of R. When this occurs, simply uninstall your current version of R in the Control Panel, and then install the newer version of R from the website (http://www.r-project.org/).


[Figure: The Packages pull-down menu in the R GUI, with the options Load package..., Set CRAN mirror..., Select repositories..., Install package(s)..., Update packages..., and Install package(s) from local zip files...]

There are two very important additions to the R software package. After installing R, either of these can make your use of R much easier, especially in organizing files and packages. The two software products are R Commander and RStudio. You will need to decide which one fits your needs. These are considered graphical user interfaces, which means they come with pull-down menus and dialog windows displaying various types of information. They can be downloaded from the following websites:

http://www.rcommander.com/

http://www.rstudio.com/


Acknowledgments

The photographs of eminent statisticians who influenced the field of multivariate statistics were provided by living individuals and/or obtained from common sources on the Internet. The biographies were compiled from excerpts of common Internet source materials, comments in various textbooks, flyers, and conference pamphlets. I would like to suggest sources for additional information about eminent statisticians that may be of interest to scholars and academicians. First, Wikipedia (http://www.wikipedia.org/) provides contributed information on individuals in many different languages around the globe, including a list of many founders of statistics (http://en.wikipedia.org/wiki/Founders_of_statistics). The American Statistical Association (www.amstat.org) supports a website with biographies and links to many other statistical societies. The World of Statistics (www.worldofstatistics.org) provides a website with famous statisticians' biographies and/or links to reference sources. A list of famous statisticians can be found on Wikipedia (http://en.wikipedia.org/wiki/List_of_statisticians). A simple Google search will turn up many other websites about famous statisticians. Any errors or omissions in the biographies are unintentional and are my responsibility, not the publisher's.

SAGE Publications would like to thank the following reviewers:

Xiaofen Keating, The University of Texas at Austin

Richard Feinn, Southern Connecticut State University

James Alan Fox, Northeastern University

Thomas H. Short, John Carroll University

Jianmin Guan, University of Texas at San Antonio

Edward D. Gailey, Fairmont State University

Prathiba Natesan, University of North Texas

David E. Drew, Claremont Graduate University


Camille L. Bryant, Columbus State University

Darrell Rudmann, Shawnee State University

Jann W. MacInnes, University of Florida

Tamara A. Hamai, California State University, Dominguez Hills

Weihua Fan, University of Houston


About the Author

Randall E. Schumacker is Professor of Educational Research at The University of Alabama. He has written and coedited several books, including A Beginner's Guide to Structural Equation Modeling (4th ed.), Advanced Structural Equation Modeling: Issues and Techniques, Interaction and Non-Linear Effects in Structural Equation Modeling, New Developments and Techniques in Structural Equation Modeling, Understanding Statistical Concepts Using S-PLUS, Understanding Statistics Using R, and Learning Statistics Using R.

He was the founder and is now Emeritus Editor of Structural Equation Modeling: A Multidisciplinary Journal, and he established the Structural Equation Modeling Special Interest Group within the American Educational Research Association. He is also the Emeritus Editor of Multiple Linear Regression Viewpoints, the oldest journal sponsored by the American Educational Research Association (Multiple Linear Regression: General Linear Model Special Interest Group).

He has conducted international and national workshops, has served on the editorial board of several journals, and currently pursues his research interests in measurement, statistics, and structural equation modeling. He was the 1996 recipient of the Outstanding Scholar Award and the 1998 recipient of the Charn Oswachoke International Award. In 2010, he launched the DecisionKit App for the iPhone, iPad, and iTouch, which can assist researchers in making decisions about which measurement, research design, or statistic to use in their research projects. In 2011, he received the Apple iPad Award, and in 2012, he received the CIT Faculty Technology Award at the University of Alabama. In 2013, he received the McCrory Faculty Excellence in Research Award from the College of Education at the University of Alabama. In 2014, he was the recipient of the Structural Equation Modeling Service Award at the American Educational Research Association.
