
An introduction to the psych package: Part I: data entry and data description

William Revelle
Department of Psychology
Northwestern University

April 23, 2017

Contents

0.1 Jump starting the psych package–a guide for the impatient
0.2 Psychometric functions are summarized in the second vignette

1 Overview of this and related documents

2 Getting started

3 Basic data analysis
  3.1 Getting the data by using read.file
  3.2 Data input from the clipboard
  3.3 Basic descriptive statistics
    3.3.1 Outlier detection using outlier
    3.3.2 Basic data cleaning using scrub
    3.3.3 Recoding categorical variables into dummy coded variables
  3.4 Simple descriptive graphics
    3.4.1 Scatter Plot Matrices
    3.4.2 Density or violin plots
    3.4.3 Means and error bars
    3.4.4 Error bars for tabular data
    3.4.5 Two dimensional displays of means and errors
    3.4.6 Back to back histograms
    3.4.7 Correlational structure
    3.4.8 Heatmap displays of correlational structure
  3.5 Testing correlations
  3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

4 Multilevel modeling
  4.1 Decomposing data into within and between level correlations using statsBy
  4.2 Generating and displaying multilevel data
  4.3 Factor analysis by groups

5 Multiple Regression, mediation, moderation, and set correlations
  5.1 Multiple regression from data or correlation matrices
  5.2 Mediation and Moderation analysis
  5.3 Set Correlation

6 Converting output to APA style tables using LaTeX

7 Miscellaneous functions

8 Data sets

9 Development version and a users guide

10 Psychometric Theory

11 SessionInfo

0.1 Jump starting the psych package–a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?

1. Activate the psych package:

library(psych)

2. Input your data (section 3.1). There are two ways to do this:

   • Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .csv, .data, .sav, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, or .RData, then the file will be opened and the data will be read in.

myData <- read.file()   # find the appropriate file using your normal operating system

   • Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

   Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.

3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # what are the dimensions of the data?
describe(myData)   # or
describeBy(myData, groups=mygroups)   # for descriptive statistics by groups
headTail(myData)   # show the first and last n lines of a file

4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)
outlier(myData)

5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)   # e.g., change anything greater than 9 to NA

6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)


7. Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

   • Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)   # the correlation matrix, rounded to 2 decimals

   • Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)   # examine the many options for this function

   • Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)

8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix.

   • setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

myData <- sat.act
colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")
setCor(y = c("y1", "y2"), x = c("x1", "x2"), data = myData)

   • mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", data = myData)

   • mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", mod = "mod1", data = myData)

0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at http://personality-project.org/r.


• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)
vss(myData)

• Factor analyze the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   # score all of the items as part of one scale
myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8,10), second = c(2,4,-6,11,15,-16)))
my.scores <- scoreItems(myKeys, myData)   # form several scales
my.scores   # show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette helpful to get a broader understanding of what can be done in R using psych. Remember that the help command (?) is available for every function. Try running the examples for each help page.


1 Overview of this and related documents

The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
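As a minimal, hedged sketch of that work flow (the choice of the built-in bfi items, a subset of rows, and five factors is only for illustration and is not part of the text above):

items <- bfi[1:500, 1:25]           # 25 personality items from the bfi data set
nfactors(items)                     # compare VSS, MAP and related criteria for the number of factors
f5 <- fa(items, nfactors = 5, fm = "minres", rotate = "oblimin")   # default factoring method and rotation
irt <- irt.fa(items[, 1:5])         # IRT parameters from a polychoric based factor analysis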

Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
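A brief, hedged sketch of how these reliability functions might be called; treating the first five bfi items as a single "scale" (and as a set of raters for ICC) is an assumption made purely for illustration, and splitHalf is used here to report the Guttman style estimates:

agree <- bfi[, 1:5]                 # five Agreeableness items from the bfi data
alpha(agree, check.keys = TRUE)     # coefficient alpha, reverse keying items if needed
splitHalf(agree)                    # Guttman lambdas and the range of split half reliabilities
omega(bfi[, 1:25], nfactors = 5)    # omega_h and omega_t from a bifactor solution
ICC(agree[1:50, ])                  # the six Shrout and Fleiss intraclass correlations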

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.
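A hedged sketch of how these might be called; "my.wide", the "id" and "time" column names, and the assumption that columns 3 through 5 hold the repeated items are all placeholders for your own repeated-measures data:

long <- mlArrange(my.wide, grp = "id", Time = "time", items = 3:5)   # wide -> long format
mlr(my.wide, grp = "id", Time = "time", items = 3:5)                 # generalizability statistics over time and items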

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf). The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf).

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:


install.packages(c("GPArotation", "mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))   # read in a file without a header row and 36 fields, the first of which is 4 columns wide, the rest 1 column each

If the file is an RData file (with suffix of RData, Rda, rda, Rdata, or rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (xpt or XPT), it will be read. csv files (comma separated files), normal txt or text files, and data or dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)   # this will keep the value labels for .sav files

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group 
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------ 
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
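A short, hedged sketch of this work flow; the use of the sat.act data and the choice to inspect the five most extreme cases are assumptions made only for illustration:

d2 <- outlier(sat.act)                         # Q-Q plot of Mahalanobis D2 against chi square
sat.act[order(d2, decreasing = TRUE)[1:5], ]   # look at the five most extreme cases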

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

> png( 'outlier.png' )
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device 
          1 

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial (regular Pearson r) correlation to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
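A minimal, hedged sketch of both functions; the "major" vector and the small data frame below are hypothetical examples, not data from this vignette:

major <- c("psych", "biology", "psych", "econ")
dummy.code(major)                 # one 0/1 column for each category
df <- data.frame(gender = c("Male", "Female", "Female"), score = c(10, 12, 9))
char2numeric(df)                  # character/factor columns become numeric codes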

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).


> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device 
          1 

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device 
          1 

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main = "Density distributions of four measures of affect")
> dev.off()
null device 
          1 

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σ_p = sqrt(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although that is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png( 'bibars.png' )
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device 
          1 

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device 
          1 

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)

Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
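For example, a minimal sketch of how that might look (the object name ct is just a placeholder):

ct <- corr.test(sat.act)
print(ct, short = FALSE)   # adds the lower and upper confidence limits for each correlation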



1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18    with probability < 0.034 
 and confidence interval 0.02   0.53 

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99    with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests 
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations 
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2    with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df = 15   with probability < 1.8e-273 
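cortest can also compare two matrices. A hedged sketch, reusing the lower (male) and upper (female) correlation matrices computed in section 3.4.7, with the group sample sizes taken from the describeBy output above:

cortest(lower, upper, n1 = 247, n2 = 453)   # do the male and female correlation matrices differ?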

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
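A brief, hedged sketch of these functions; the use of the first five bfi items, a subset of 500 rows, and an arbitrary cut at 3 to create dichotomous items are all choices made only for illustration:

poly <- polychoric(bfi[1:500, 1:5])                       # polychoric correlations and thresholds
tetra <- tetrachoric(ifelse(bfi[1:500, 1:5] > 3, 1, 0))   # tetrachoric correlations after a cut at 3
smoothed <- cor.smooth(tetra$rho)                         # force a positive semi-definite matrix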

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.


> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.



4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x_wg * η_y_wg * r_xy_wg + η_x_bg * η_y_bg * r_xy_bg        (1)

where r_xy is the normal correlation which may be decomposed into a within group and a between group correlation, r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
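A minimal sketch of such a call (the "education" grouping follows the sat.act example just mentioned; rwg and rbg are the within and between group correlation matrices returned by statsBy):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg, 2)   # pooled within group correlations
round(sb$rbg, 2)   # between group (group means) correlations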

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input 

Beta weights 
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.69              0.63              0.50              0.58              0.48 

multiple R2 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.48              0.40              0.25              0.34              0.23 

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
      Sentences      Vocabulary Sent.Completion   First.Letters 
           3.69            3.88            3.00            1.35 

Unweighted multiple R 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.59              0.58              0.49              0.58              0.45 

Unweighted multiple R2 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.34              0.34              0.24              0.33              0.20 

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2 
Cohen's Set Correlation R2  =  0.69 
Unweighted correlation between the two sets =  0.73 

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input 

Beta weights 
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.58              0.46              0.21              0.18              0.30 

multiple R2 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
            0.331             0.210             0.043             0.032             0.092 

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
Sent.Completion   First.Letters 
           1.02            1.02 

Unweighted multiple R 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.44              0.35              0.17              0.14              0.26 

Unweighted multiple R2 
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group 
             0.19              0.12              0.03              0.02              0.07 

Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

Average squared canonical correlation =  0.21 
Cohen's Set Correlation R2  =  0.42 
Unweighted correlation between the two sets =  0.48 

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33 
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31 

To see the longer output, specify short = FALSE in the print statement.

Full output 

Total effect estimates (c) 
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c') 
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates 
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates 
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates 
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of bootstrap iterations was set to 50 for speed; the default number of bootstrap iterations is 5000. A sketch of the call appears below.
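A call of roughly this form (the same one echoed in the output accompanying Figure 18; the assignment name mod.med and the spelling of the iteration argument are assumptions for the sketch) produces the moderated model, with the ACT by gender interaction term created automatically:

mod.med <- mediate(y = c("SATQ"), x = c("ACT"), m = "education", mod = "gender",
                   data = sat.act, n.iter = 50, std = TRUE)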


> mediate.diagram(preacher)

[Figure: Mediation model diagram. THERAPY predicts SATIS directly and through ATTRIB, with a = 0.82, b = 0.4, c = 0.76, and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2:3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: Regression model diagram. THERAPY and ATTRIB predict SATIS, with regression paths 0.43 and 0.4 and a correlation of 0.21 between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
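As a check on the arithmetic, the squared canonical correlations reported in the first setCor example above were 0.405 and 0.023, and 1 - (1 - 0.405)(1 - 0.023) ≈ 0.42, which matches the Cohen Set Correlation R2 printed there.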

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the covariance matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58   S.E. = 0.03   t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03   t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0

Total Direct effect (c) of gender on SATQ = -0.14   S.E. = 0.03   t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03   t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0

Total Direct effect (c) of ACTXgndr on SATQ = 0   S.E. = 0.03   t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03   t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: Moderation model path diagram. ACT, gender, and the ACTXgndr interaction predict SATQ, with education as the mediator; the path values correspond to the a, b, c, and c' estimates in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation:  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.
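Because the statistic is symmetric, swapping the two sets should reproduce the same value. A minimal sketch using the covariance matrix C computed above (the object names are arbitrary; compare the printed Cohen Set Correlation R2 from the two calls):

sc.xy <- setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # SATs/ACT predicted from demographics
sc.yx <- setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # roles of the two sets reversed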

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
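A minimal sketch of the most common calls, assuming the psych package is loaded (the three-factor solution of the Thurstone correlations is the one shown in Table 2; each call prints LaTeX source that can be pasted into a manuscript):

f3 <- fa(Thurstone, 3)        # the three factor solution shown in Table 2
fa2latex(f3)                  # LaTeX source for an APA style factor table
cor2latex(Thurstone)          # lower diagonal correlation table
df2latex(describe(sat.act))   # any data frame, e.g., descriptive statistics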

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion    0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters      0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords     -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
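A few of these helpers in action (a quick sketch with the psych package loaded; the particular values and items are arbitrary):

fisherz(0.30)                                  # Fisher r to z transformation
geometric.mean(c(1, 2, 8))                     # 2.52; compare harmonic.mean()
headTail(sat.act)                              # first and last few rows of the data frame
reverse.code(c(-1, 1), bfi[, c("A1", "A4")])   # reverse the first of the two items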

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
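These data sets are lazy-loaded with the package and can be used directly; a quick look at a few of them (a sketch):

dim(bfi)                        # 2800 rows: 25 items plus gender, education, and age
describe(sat.act)               # the demonstration set used throughout this vignette
lowerMat(Thurstone[1:5, 1:5])   # part of the 9 x 9 Thurstone correlation matrix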

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Page 2: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

36 Polychoric tetrachoric polyserial and biserial correlations 34

4 Multilevel modeling 3441 Decomposing data into within and between level correlations using statsBy 3742 Generating and displaying multilevel data 3743 Factor analysis by groups 38

5 Multiple Regression mediation moderation and set correlations 3851 Multiple regression from data or correlation matrices 3852 Mediation and Moderation analysis 4053 Set Correlation 44

6 Converting output to APA style tables using LATEX 47

7 Miscellaneous functions 49

8 Data sets 50

9 Development version and a users guide 51

10 Psychometric Theory 52

11 SessionInfo 52

2

01 Jump starting the psych packagendasha guide for the impatient

You have installed psych (section 2) and you want to use it without reading much moreWhat should you do

1 Activate the psych package

library(psych)

2 Input your data (section 31) There are two ways to do this

bull Find and read standard files using readfile This will open a search windowfor your operating system which you can use to find the file If the file has asuffix of text txt csv data sav r R rds Rds rda Rda rdata orRData then the file will be opened and the data will be read in

myData lt- readfile() find the appropriate file using your normal operating system

bull Alternatively go to your friendly text editor or data manipulation program(eg Excel) and copy the data to the clipboard Include a first line that has thevariable labels Paste it into psych using the readclipboardtab command

myData lt- readclipboardtab() if on the clipboard

Note that there are number of options for readclipboard for reading in Excelbased files lower triangular files etc

3 Make sure that what you just read is right Describe it (section 33) and perhapslook at the first and last few lines If you have multiple groups try describeBy

dim(myData) What are the dimensions of the data

describe(myData) or

descrbeBy(myDatagroups=mygroups) for descriptive statistics by groups

headTail(myData) show the first and last n lines of a file

4 Look at the patterns in the data If you have fewer than about 12 variables lookat the SPLOM (Scatter Plot Matrix) of the data using pairspanels (section 341)Then use the outlier function to detect outliers

pairspanels(myData)

outlier(myData)

5 Note that you might have some weird subjects probably due to data entry errorsEither edit the data by hand (use the edit command) or just scrub the data (section332)

cleaned lt- scrub(myData max=9) eg change anything great than 9 to NA

6 Graph the data with error bars for each variable (section 343)

errorbars(myData)

3

7 Find the correlations of all of your data lowerCor will by default find the pairwisecorrelations round them to 2 decimals and display the lower off diagonal matrix

bull Descriptively (just the values) (section 347)

r lt- lowerCor(myData) The correlation matrix rounded to 2 decimals

bull Graphically (section 348) Another way is to show a heat map of the correla-tions with the correlation values included

corPlot(r) examine the many options for this function

bull Inferentially (the values the ns and the p values) (section 35)

corrtest(myData)

8 Apply various regression models

Several functions are meant to do multiple regressions either from the raw data orfrom a variancecovariance matrix or a correlation matrix

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

myData lt- satact

colnames(myData) lt- c(mod1med1x1x2y1y2)

setCor(y = c( y1 y2) x = c(x1x2) data = myData)

bull mediate will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables mediatedthrough a mediation variable It then tests the mediation effect using a bootstrap

mediate(y = c( y1 y2) x = c(x1x2) m= med1 data = myData)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple xvariables mediated through a mediation variable It then tests the mediationeffect using a boot strap

mediate(y = c( y1 y2) x = c(x1x2) m= med1 mod = mod1 data = myData)

02 Psychometric functions are summarized in the second vignette

Many additional functions particularly designed for basic and advanced psychomet-rics are discussed more fully in the Overview Vignette A brief review of the functionsavailable is included here In addition there are helpful tutorials for Finding omegaHow to score scales and find reliability and for Using psych for factor analysis athttppersonality-projectorgr

4

bull Test for the number of factors in your data using parallel analysis (faparallelsection ) or Very Simple Structure (vss )

faparallel(myData)

vss(myData)

bull Factor analyze (see section ) the data with a specified number of factors(the default is 1) the default method is minimum residual the default rotationfor more than one factor is oblimin There are many more possibilities (seesections -) Compare the solution to a hierarchical cluster analysis using theICLUST algorithm (Revelle 1979) (see section ) Also consider a hierarchicalfactor solution to find coefficient ω (see )

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis you may use the principalfunction The default is one component

principal(myData)

bull Some people like to find coefficient α as an estimate of reliability This may bedone for a single scale using the alpha function (see ) Perhaps more usefulis the ability to create several scales as unweighted averages of specified itemsusing the scoreItems function (see ) and to find various estimates of internalconsistency for these scales find their intercorrelations and find scores for allthe subjects

alpha(myData) score all of the items as part of one scale

myKeys lt- makekeys(nvar=20list(first = c(1-35-7810)second=c(24-61115-16)))

myscores lt- scoreItems(myKeysmyData) form several scales

myscores show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to dosome basic (and advanced) data analysis You might find reading this entire vignette aswell as the Overview Vignette to be helpful to get a broader understanding of what can bedone in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

5

1 Overview of this and related documents

The psych package (Revelle 2015) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httppersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includesfive methods of factor analysis (minimum residual principal axis weighted least squaresgeneralized least squares and maximum likelihood factor analysis) Principal ComponentsAnalysis (PCA) is also available through the use of the principal or pca functions De-termining the number of factors or components to extract may be done by using the VerySimple Structure (Revelle and Rocklin 1979) (vss) Minimum Average Partial correlation(Velicer 1976) (MAP) or parallel analysis (faparallel) criteria These and several othercriteria are included in the nfactors function Two parameter Item Response Theory(IRT) models for dichotomous or polytomous items may be found by factoring tetra-

choric or polychoric correlation matrices and expressing the resulting parameters interms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects over

6

time and items mlPlot will graph items over for each subject mlArrange converts widedata frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at http

personality-projectorgrpsych_manualpdf) The vignette psych for sem atpsych_for_sempdf discusses how to use psych as a front end to the sem package of JohnFox (Fox et al 2012) (The vignette is also available at httppersonality-project

orgrbookpsych_for_sempdf)

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttppersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httppersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwlsor principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

7

installpackages(list(c(GPArotationmnormt)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be doneusing the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of txttext r rds rda csv xpt or sav the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (comma

8

separated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

myspss lt- readfile(usevaluelabels=TRUE) this will keep the value labels for sav files

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below the

9

first variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew

gender 1 700 165 048 2 168 000 1 2 1 -061

education 2 700 316 143 3 331 148 0 5 5 -068

age 3 700 2559 950 22 2386 593 13 65 52 164

ACT 4 700 2855 482 29 2884 445 3 36 33 -066

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059

kurtosis se

gender -162 002

education -007 005

age 242 036

ACT 053 018

SATV 033 427

SATQ -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 6) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

10

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns

11

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

12

3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidence

13

limits (Figure 6) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 5) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.'), it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+    EA  = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+            "lively", "-sleepy", "-tired", "-drowsy"),
+    TA  = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+            "-placid", "-calm", "-at.rest"),
+    PA  = c("active", "excited", "strong", "inspired", "determined", "attentive",
+            "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size (a minimal sketch follows below).
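A minimal sketch of the first and last of these, using the sat.act data set (the particular choice of variables is illustrative only, not an example from the vignette):

> error.bars(sat.act[5:6], main="95% confidence intervals for SATV and SATQ")  # cats eyes by default
> desc <- describe(sat.act)                          # means and standard errors of each variable
> error.crosses(desc["SATV", ], desc["SATQ", ],      # error bars in both the x and y directions
+    xlab="SAT Verbal", ylab="SAT Quantitative")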

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi, bibars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
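For example, because lowerCor invisibly returns the full correlation matrix, the adjusted probabilities can also be found from that matrix with corr.p (a sketch; the sample size of 700 is taken from the sat.act example below):

> r <- lowerCor(sat.act)    # invisibly returns the correlation matrix
> corr.p(r, n=700)          # raw (below diagonal) and Holm adjusted (above diagonal) probabilities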

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18  with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2  with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42  with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix based upon a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
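As an illustrative sketch (not an example from the vignette), one might dichotomize a few bfi items purely for demonstration, find their tetrachoric correlations, and smooth the resulting matrix:

> a.items <- na.omit(bfi[1:5])                                # five agreeableness items (1-6 ratings)
> a.dich <- apply(a.items, 2, function(x) as.numeric(x > 3))  # dichotomize at the midpoint (demonstration only)
> tet <- tetrachoric(a.dich)                                  # correlations ($rho) and thresholds ($tau)
> smoothed <- cor.smooth(tet$rho)                             # force a positive (semi-)definite matrix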

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups ($r_{wg}$) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
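A sketch of one way to display these data (this assumes the grouping column of withinBetween is named Group, as suggested by its help file; check the help page before running):

> data(withinBetween)
> wb.stats <- statsBy(withinBetween, group="Group", cors=TRUE)  # assumed grouping column name
> round(wb.stats$rwg, 2)    # pooled within group correlations
> round(wb.stats$rbg, 2)    # correlations of the group means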

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
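A sketch of the first of these (grouping the ability and SAT scores by education level):

> sb.ed <- statsBy(sat.act, group="education", cors=TRUE)   # within and between education levels
> sb.ed$rwg      # pooled within group correlations
> sb.ed$rbg      # between group correlations (based upon the group means)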

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 a effect estimates
        THERAPY  se    t   Prob
ATTRIB     0.82 0.3 2.74 0.0106

 b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
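These two statistics can be verified directly from the squared canonical correlations reported in the first setCor example above:

> cc2 <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations from the output above
> round(1 - prod(1 - cc2), 2)                # Cohen's set correlation R2 = 0.69
> round(mean(cc2), 2)                        # average squared canonical correlation = 0.2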

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
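A related helper is partial.r (see the miscellaneous functions section), which finds the correlations of one set of variables with a second set partialled out; a minimal sketch using the sat.act data:

> partial.r(sat.act, c(4:6), c(1:3))   # ACT, SATV, SATQ partialling out gender, education, age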

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
           SATQ   se     t     Prob
ACT        0.58 0.03 19.25 0.00e+00
gender    -0.14 0.03 -4.78 2.10e-06
ACTXgndr   0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se     t     Prob
ACT        0.59 0.03 19.26 0.00e+00
gender    -0.14 0.03 -4.63 4.37e-06
ACTXgndr   0.00 0.03  0.01 9.92e-01

 a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

 b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

 ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

             ACT  SATV  SATQ
gender     -0.05 -0.03 -0.18
education   0.14  0.10  0.10
age         0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation = 0.03
 Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
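A quick way to see this symmetry is to reverse the roles of the two sets; the set correlation (although not the individual β weights) is unchanged:

> setCor(c(1:3), c(4:6), C, n.obs=700)   # same Cohen's set correlation R2 (0.09) as above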

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
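For instance (a sketch; the particular arguments shown are illustrative and the output is not reproduced in this vignette):

> cor2latex(Thurstone, digits=2)          # a lower diagonal correlation table in LaTeX
> df2latex(describe(sat.act), digits=2)   # any data frame (here the describe output) as a LaTeX table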

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.5

MR1              1.00  0.59  0.54
MR2              0.59  1.00  0.52
MR3              0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
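A few of these helpers in action (a sketch; the numerical comments are approximate):

> fisherz(0.5)                 # Fisher z transformation of r = .5 (about 0.55)
> harmonic.mean(c(1, 2, 4))    # about 1.71
> geometric.mean(c(1, 2, 4))   # exactly 2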

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components, an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



0.1 Jump starting the psych package–a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?

1 Activate the psych package

library(psych)

2 Input your data (section 3.1). There are two ways to do this:

• Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .csv, .data, .sav, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, or .RData, then the file will be opened and the data will be read in.

myData <- read.file()   # find the appropriate file using your normal operating system

• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.

3 Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # what are the dimensions of the data?

describe(myData)   # or

describeBy(myData, groups=mygroups)   # for descriptive statistics by groups

headTail(myData)   # show the first and last n lines of a file

4 Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)

outlier(myData)

5 Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)   # e.g., change anything greater than 9 to NA

6 Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)

7 Find the correlations of all of your data. lowerCor will by default find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)   # the correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)   # examine the many options for this function

• Inferentially (the values, the ns and the p values) (section 3.5)

corr.test(myData)

8 Apply various regression models

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

myData <- sat.act

colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")

setCor(y = c("y1", "y2"), x = c("x1", "x2"), data = myData)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", data = myData)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("y1", "y2"), x = c("x1", "x2"), m = "med1", mod = "mod1", data = myData)

0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at http://personality-project.org/r.

• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze the data with a specified number of factors (the default is 1); the default method is minimum residual, the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   # score all of the items as part of one scale

myKeys <- make.keys(nvar=20, list(first = c(1, -3, 5, -7, 8, 10), second = c(2, 4, -6, 11, 15, -16)))

myscores <- scoreItems(myKeys, myData)   # form several scales

myscores   # show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.

1 Overview of this and related documents

The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
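For instance, one might compare several of these "how many factors" criteria on the bfi data set included with psych (a brief sketch, not an example taken from this document):

fa.parallel(bfi[1:25])   # parallel analysis of the 25 bfi personality items
nfactors(bfi[1:25])      # VSS, MAP, and several other criteria in one call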

Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from, e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

install.packages("ctv")
library(ctv)
task.views("Psychometrics")

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:

install.packages(c("GPArotation", "mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))   # will read in a file without a header row and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending upon what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)   # this will keep the value labels for .sav files

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or

> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

         item group1 group2 vars   n   mean     sd    se
gender1     1      1      0    1  27      1      0     0
gender2     2      2      0    1  30      2      0     0
gender3     3      1      1    1  20      1      0     0
gender4     4      2      1    1  25      2      0     0
...       ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9      69      1      4    6  51  635.9 104.12 14.58
SATQ10     70      2      4    6  86 597.59 106.24 11.46
SATQ11     71      1      5    6  46 657.83  89.61 13.21
SATQ12     72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
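For instance (a minimal sketch, not taken from this document), the most extreme cases can be listed by sorting the distances that outlier returns:

d2 <- outlier(sat.act)              # squared Mahalanobis distance for each case; also draws the Q-Q plot
head(sort(d2, decreasing = TRUE))   # the most extreme cases, in descending order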

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120.

> png( 'outlier.png' )
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30, 40, 50), max=70, isvalue=45, newvalue=NA)

> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
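A minimal sketch of both steps; the variable major and the data frame my.data are hypothetical and only for illustration:

major <- c("psych", "biology", "econ", "psych", "econ")   # a made-up categorical variable
dummy.code(major)                   # a matrix of 0/1 dummy variables, one column per category
# new.data <- char2numeric(my.data) # would convert character/factor columns of my.data to numeric codes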

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The colors represent four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main = "Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A brief sketch of these options follows this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
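As referenced in the error.bars entry above, a brief sketch of those display options (assuming the eyes and sd arguments; not an example taken from this document):

error.bars(sat.act[5:6])                          # 95% confidence intervals, shown with "cats eyes"
error.bars(sat.act[5:6], eyes = FALSE, sd = TRUE) # +/- one standard deviation, without the cats eyes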

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or shown by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+       xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+       cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+       ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png( 'bibars.png' )
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
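A minimal sketch of these calls (the adjust argument is assumed from the corr.test documentation; the short=FALSE print option is the one mentioned in the output shown in Table 1):

ct <- corr.test(sat.act)             # Pearson rs, sample sizes, and Holm-adjusted probabilities
print(ct, short = FALSE)             # also show the confidence intervals
corr.test(sat.act, adjust = "none")  # report the raw, unadjusted probabilities instead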

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
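A minimal sketch of the polychoric and tetrachoric functions (not from this document; dichotomizing the bfi items at 3 is an arbitrary choice made only for illustration):

r.poly <- polychoric(bfi[1:5])                     # polychoric correlations of five 6-point items
r.poly$rho                                         # the correlation matrix (thresholds are in r.poly$tau)
r.tet <- tetrachoric(ifelse(bfi[1:5] > 3, 1, 0))   # the same items, dichotomized at 3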

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
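For instance (a brief sketch, not an example taken from this document):

data(burt)                          # Burt's early correlation matrix, which is not positive definite
burt.smoothed <- cor.smooth(burt)   # adjust the offending eigenvalues and return a smoothed matrix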

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations).

> draw.tetra()

[Figure 14 plot: a bivariate normal with rho = .5 cut at thresholds τ (for X) and Τ (for Y) into the four cells X < τ/Y < Τ, X > τ/Y < Τ, X < τ/Y > Τ, and X > τ/Y > Τ; the resulting φ = .33.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0, 0))

[Figure 15 plot: the bivariate density surface for rho = .5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
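A minimal sketch of this decomposition with statsBy (assuming the cors option and the rwg and rbg elements of the returned list):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group correlations (within each education level)
round(sb$rbg, 2)   # correlations of the group means (between education levels)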

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

 Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

 multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

 Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45

 Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

 Average squared canonical correlation = 0.2
 Cohen's Set Correlation R2 = 0.69
 Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

 Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30

 multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

 Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26

 Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

 Average squared canonical correlation = 0.21
 Cohen's Set Correlation R2 = 0.42
 Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
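A call of roughly the following form reproduces this output (a sketch only: it mirrors the Call: line below and the object name preacher that Figure 16 later uses, and it assumes the sobel data frame from the mediate help example has been created first):

> preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
> preacher   # printing the result gives the summary shown below

Note that the indirect (ab) effect reported below is just the product of the a and b paths: 0.82 * 0.40 = 0.33 (within rounding).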

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

  mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. The call, its output, and the resulting diagram (Figure 18) appear below, after Figures 16 and 17.

41

> mediate.diagram(preacher)

[Figure 16 about here. Panel title: Mediation model. Paths: THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, THERAPY -> SATIS c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 17 about here. Panel title: Regression Models. Paths: THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, with a value of 0.21 between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

43

The moderated regression example predicts SATQ from ACT, gender, and their interaction, with education as the mediator:

Call: mediate(y = "SATQ", x = c("ACT"), m = "education", data = sat.act, mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 about here. Panel title: Moderation model. ACT, gender, and ACTXgndr predict SATQ directly and through education; a paths = 0.16, 0.09, -0.01; total effects c = 0.58, -0.14, 0; direct effects c' = 0.59, -0.14, 0; b path = -0.04.]

Figure 18: Moderated multiple regression requires the raw data.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
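To connect this formula with the setCor output shown in section 5.1 (x = variables 1:4 and y = variables 5:9 of the Thurstone correlation matrix), here is a minimal numerical sketch; the indexing follows that example, and this illustrates the formula rather than the internal code of setCor:

library(psych)
Rxx <- Thurstone[1:4, 1:4]                # correlations within the x set
Ryy <- Thurstone[5:9, 5:9]                # correlations within the y set
Rxy <- Thurstone[1:4, 5:9]                # cross-set correlations
M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- eigen(M)$values                 # squared canonical correlations: 0.628, 0.148, 0.008, 0.005
1 - prod(1 - lambda)                      # Cohen's set correlation R2, about 0.69, as reported above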

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the overall degree of relationship between the two sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
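One way to see the symmetry is to swap the two sets and compare the between-set statistics; a minimal sketch using the covariance matrix C from above (the object names forward and backward are ours):

> forward  <- setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
> backward <- setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)
> # both runs report the same Cohen's set correlation R2 (0.09 here), although the
> # individual multiple Rs and beta weights naturally depend on the direction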

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
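Minimal sketches of the latter converters, called with their default formatting and no optional arguments:

cor2latex(Thurstone)          # LaTeX table of the lower diagonal of a correlation matrix
df2latex(describe(sat.act))   # LaTeX table of the describe() output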

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex — A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
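Table 2 can be produced with something like the following; the three-factor call and the sample size are assumptions based on the Thurstone example used throughout this vignette rather than the exact chunk:

f3 <- fa(Thurstone, nfactors = 3, n.obs = 213)   # 3-factor solution of the Thurstone correlations
fa2latex(f3)                                     # writes the APA-style LaTeX table; see ?fa2latex for formatting options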

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

49

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
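A few of these helpers in action (a minimal illustration using a built-in R data set; the values in the comments are approximate):

> fisherz(0.5)                   # Fisher r-to-z transformation, about 0.55
> geometric.mean(c(1, 2, 4, 8))  # about 2.83
> harmonic.mean(c(2, 4, 4))      # 3
> headTail(attitude)             # first and last few rows of a data frame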

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

50

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
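Installing that development version from within R is typically a one-line call; the repos string below simply restates the URL given above and is offered as a sketch:

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")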

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0     foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4    grid_3.4.0
[9] lattice_0.20-34

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

Page 4: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

7 Find the correlations of all of your data lowerCor will by default find the pairwisecorrelations round them to 2 decimals and display the lower off diagonal matrix

bull Descriptively (just the values) (section 347)

r lt- lowerCor(myData) The correlation matrix rounded to 2 decimals

bull Graphically (section 348) Another way is to show a heat map of the correla-tions with the correlation values included

corPlot(r) examine the many options for this function

bull Inferentially (the values the ns and the p values) (section 35)

corrtest(myData)

8 Apply various regression models

Several functions are meant to do multiple regressions either from the raw data orfrom a variancecovariance matrix or a correlation matrix

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

myData lt- satact

colnames(myData) lt- c(mod1med1x1x2y1y2)

setCor(y = c( y1 y2) x = c(x1x2) data = myData)

bull mediate will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables mediatedthrough a mediation variable It then tests the mediation effect using a bootstrap

mediate(y = c( y1 y2) x = c(x1x2) m= med1 data = myData)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple xvariables mediated through a mediation variable It then tests the mediationeffect using a boot strap

mediate(y = c( y1 y2) x = c(x1x2) m= med1 mod = mod1 data = myData)

02 Psychometric functions are summarized in the second vignette

Many additional functions particularly designed for basic and advanced psychomet-rics are discussed more fully in the Overview Vignette A brief review of the functionsavailable is included here In addition there are helpful tutorials for Finding omegaHow to score scales and find reliability and for Using psych for factor analysis athttppersonality-projectorgr

4

bull Test for the number of factors in your data using parallel analysis (faparallelsection ) or Very Simple Structure (vss )

faparallel(myData)

vss(myData)

bull Factor analyze (see section ) the data with a specified number of factors(the default is 1) the default method is minimum residual the default rotationfor more than one factor is oblimin There are many more possibilities (seesections -) Compare the solution to a hierarchical cluster analysis using theICLUST algorithm (Revelle 1979) (see section ) Also consider a hierarchicalfactor solution to find coefficient ω (see )

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis you may use the principalfunction The default is one component

principal(myData)

bull Some people like to find coefficient α as an estimate of reliability This may bedone for a single scale using the alpha function (see ) Perhaps more usefulis the ability to create several scales as unweighted averages of specified itemsusing the scoreItems function (see ) and to find various estimates of internalconsistency for these scales find their intercorrelations and find scores for allthe subjects

alpha(myData) score all of the items as part of one scale

myKeys lt- makekeys(nvar=20list(first = c(1-35-7810)second=c(24-61115-16)))

myscores lt- scoreItems(myKeysmyData) form several scales

myscores show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to dosome basic (and advanced) data analysis You might find reading this entire vignette aswell as the Overview Vignette to be helpful to get a broader understanding of what can bedone in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

5

1 Overview of this and related documents

The psych package (Revelle 2015) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httppersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includesfive methods of factor analysis (minimum residual principal axis weighted least squaresgeneralized least squares and maximum likelihood factor analysis) Principal ComponentsAnalysis (PCA) is also available through the use of the principal or pca functions De-termining the number of factors or components to extract may be done by using the VerySimple Structure (Revelle and Rocklin 1979) (vss) Minimum Average Partial correlation(Velicer 1976) (MAP) or parallel analysis (faparallel) criteria These and several othercriteria are included in the nfactors function Two parameter Item Response Theory(IRT) models for dichotomous or polytomous items may be found by factoring tetra-

choric or polychoric correlation matrices and expressing the resulting parameters interms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects over

6

time and items mlPlot will graph items over for each subject mlArrange converts widedata frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at http

personality-projectorgrpsych_manualpdf) The vignette psych for sem atpsych_for_sempdf discusses how to use psych as a front end to the sem package of JohnFox (Fox et al 2012) (The vignette is also available at httppersonality-project

orgrbookpsych_for_sempdf)

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttppersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httppersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwlsor principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

7

installpackages(list(c(GPArotationmnormt)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be doneusing the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of txttext r rds rda csv xpt or sav the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (comma

8

separated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

myspss lt- readfile(usevaluelabels=TRUE) this will keep the value labels for sav files

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below the

9

first variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew

gender 1 700 165 048 2 168 000 1 2 1 -061

education 2 700 316 143 3 331 148 0 5 5 -068

age 3 700 2559 950 22 2386 593 13 65 52 164

ACT 4 700 2855 482 29 2884 445 3 36 33 -066

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059

kurtosis se

gender -162 002

education -007 005

age 242 036

ACT 053 018

SATV 033 427

SATQ -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 6) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

10

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns

11

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

12

3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidence

13

limits (Figure 6) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 5) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25thand 75th percentiles A powerful alternative is to show the density distribution using theviolinBy function (Figure 5)

14

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

15

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions


> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+       main ="Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size (a small sketch follows this list).
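A minimal sketch of error.crosses, assuming that it accepts the output of describe for its x and y sets; the particular variables chosen here are arbitrary and purely illustrative:

desc <- describe(sat.act)                  #basic descriptive statistics
x <- desc[c("education","age"),]           #an x set
y <- desc[c("SATV","SATQ"),]               #a y set of the same size
error.crosses(x, y, xlab="Education and age", ylab="SAT scores")   #95% confidence crosses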

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+      labels=c("Male","Female"), ylab="SAT score", xlab="")

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+      main="Proportion of sample by education level")

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function, which actually does four different tests (based upon an article by Steiger (1980)), depending upon the input.


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
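For example:

ct <- corr.test(sat.act)
print(ct, short=FALSE)    #the long form adds the confidence intervals of each correlation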


The r.test function does one of four different tests, depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18    with probability < 0.034
and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89   with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices

Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
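A minimal sketch of the two estimators applied to item level data (the items chosen and the cut point used for dichotomization are arbitrary and purely illustrative):

items <- bfi[1:10]                  #ten polytomous (1-6) personality items
pc <- polychoric(items)             #polychoric correlations and estimated thresholds
dichot <- ifelse(items > 3, 1, 0)   #cut each item at its midpoint to form 0/1 data
tc <- tetrachoric(dichot)           #tetrachoric correlations of the dichotomized items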

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
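A brief check of the problem and the fix, using the burt matrix included in psych:

min(eigen(burt)$values)       #the smallest eigen value is (slightly) negative
burt.s <- cor.smooth(burt)    #adjust the offending eigen values and rescale
min(eigen(burt.s)$values)     #now non-negative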

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
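A minimal sketch (rwg and rbg are assumed to be the names of the pooled within group and weighted between group correlation matrices returned by statsBy):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg,2)    #the pooled within group correlations
round(sb$rbg,2)    #the weighted between group correlations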

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31


Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30


multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. The call is sketched below.
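A sketch of the call behind the Figure 18 output, reconstructed from the Call line shown there (the exact argument spelling is an assumption):

mediate(y=c("SATQ"), x=c("ACT"), m="education", mod="gender",
        data=sat.act, n.iter=50, std=TRUE)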


> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02


age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohens Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohens Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
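A sketch of the kinds of calls involved (the heading text mirrors the caption of Table 2; the particular objects are illustrative):

f3 <- fa(Thurstone, 3)                  #a three factor solution of the Thurstone correlations
fa2latex(f3, heading="A factor analysis table from the psych package in R")
cor2latex(sat.act)                      #lower diagonal correlation table in APA style
df2latex(describe(sat.act))             #any data frame, e.g., describe output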

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A few of them are illustrated in a short sketch following the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
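A few of these in action (the particular values are arbitrary):

fisherz(.5)                       #Fisher r to z transformation
geometric.mean(c(1,2,4,8))        #compare with harmonic.mean(c(1,2,4,8))
headtail(sat.act)                 #first and last lines of the data frame
mardia(sat.act[5:6])              #multivariate skew and kurtosis for SATV and SATQ
superMatrix(diag(2), diag(3))     #a 5 x 5 block diagonal "super matrix"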

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book, An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   #score all of the items as part of one scale

myKeys <- make.keys(nvar=20,list(first = c(1,-3,5,-7,8:10),second=c(2,4,-6,11:15,-16)))

my.scores <- scoreItems(myKeys,myData)   #form several scales

my.scores   #show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.


1 Overview of this and related documents

The psych package (Revelle, 2015) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at http://personality-project.org/r/book/.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.

Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (It is also available at http://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (That vignette is also available at http://personality-project.org/r/book/psych_for_sem.pdf.)

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at http://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at http://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just a few packages:

7

install.packages(c("GPArotation","mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .txt, .text, .r, .rds, .rda, .csv, .xpt, or .sav, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4,rep(1,35)))  #will read in a file without a header row and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

my.spss <- read.file(use.value.labels=TRUE)  #this will keep the value labels for .sav files

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
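As a minimal sketch of the base R route just described (assuming the copied data include a header row; which line applies depends on the platform):

my.data <- read.table("clipboard", header = TRUE)      #PCs: read directly from the Windows clipboard
my.data <- read.table(pipe("pbpaste"), header = TRUE)  #Macs: read the clipboard via pbpaste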

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command:

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or

> my.tab.data <- read.clipboard.tab()   #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)
Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+                      skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120.

11

> png('outlier.png')
> d2 <- outlier(sat.act,cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.


All values of columns 3 - 5 that are less than 30, 40, or 50, respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x
       V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
 [1,]   1   2  NA  NA  NA   6   7   8   9  10
 [2,]  11  12  NA  NA  NA  16  17  18  19  20
 [3,]  21  22  NA  NA  NA  26  27  28  29  30
 [4,]  31  32  33  NA  NA  36  37  38  39  40
 [5,]  41  42  43  44  NA  46  47  48  49  50
 [6,]  51  52  53  54  55  56  57  58  59  60
 [7,]  61  62  63  64  65  66  67  68  69  70
 [8,]  71  72  NA  NA  NA  76  77  78  79  80
 [9,]  81  82  NA  NA  NA  86  87  88  89  90
[10,]  91  92  NA  NA  NA  96  97  98  99 100
[11,] 101 102  NA  NA  NA 106 107 108 109 110
[12,] 111 112  NA  NA  NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
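A minimal sketch of both steps; the data frame my.df and its columns major and gender are hypothetical stand-ins for your own data:

my.majors <- dummy.code(my.df$major)   #one 0/1 column per level of the categorical variable
my.df.num <- char2numeric(my.df)       #convert character/factor columns of a data frame to numeric codes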

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+              main = "Density distributions of four measures of affect")
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N); a small worked example follows this list).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
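To make the σp formula concrete (the numbers here are purely illustrative, not taken from the sat.act data): a cell containing a proportion of p = .2 (so q = .8) based on N = 100 observations has a standard error of σp = √(.2 × .8/100) = .04.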

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, more conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+               labels=c("Male","Female"),ylab="SAT score",xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+      cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+      ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
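As a minimal sketch of those options (the argument names method and adjust are as I believe corr.test defines them; see ?corr.test):

ct <- corr.test(sat.act, method = "spearman", adjust = "bonferroni")   #Spearman correlations, Bonferroni adjustment
print(ct, short = FALSE)   #the long form also shows confidence intervals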

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18  with probability < 0.034
and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2  with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.
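A minimal sketch of both functions using simulated (hypothetical) data, assuming the basic two-argument calls biserial(continuous, dichotomous) and polyserial(continuous, polytomous):

set.seed(17)
x <- rnorm(500)                              #a continuous variable
y <- ifelse(x + rnorm(500) > 0, 1, 0)        #a noisy dichotomization of x
p <- cut(x + rnorm(500), 6, labels = FALSE)  #a noisy polytomization of x into 6 levels
biserial(x, y)     #biserial correlation of x with the dichotomous y
polyserial(x, p)   #polyserial correlation of x with the polytomous p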

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
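A minimal sketch of that repair on the burt correlation matrix (included with psych); inspecting the eigenvalues before and after is my own addition rather than part of the vignette's example:

data(burt)
eigen(burt)$values            #the smallest eigenvalue is negative, so burt is not positive semi-definite
burt.smoothed <- cor.smooth(burt)
eigen(burt.smoothed)$values   #all eigenvalues are now positive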

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
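A minimal sketch of the first of these analyses; the element names pulled out of the result ($rwg and $rbg for the pooled within group and between group correlations) are what I believe statsBy returns, so check str(sb) in your version:

sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg   #pooled within group correlations
sb$rbg   #correlations of the group means (between group)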

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)

faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input
Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31
Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input
Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31
Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30
multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092
Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02
Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26
Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.
Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31
To see the longer output, specify short = FALSE in the print statement

Full output
Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50


> mediate.diagram(preacher)


Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)


Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
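A minimal sketch of that computation in R, using the sat.act correlations with the first three variables as the x set and the last three as the y set (this is my own illustration of the formula, not the internal setCor code):

R   <- lowerCor(sat.act)            #full correlation matrix, returned invisibly
Rxx <- R[1:3,1:3]; Ryy <- R[4:6,4:6]; Rxy <- R[1:3,4:6]
E   <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)   #its eigenvalues are the squared canonical correlations
lambda <- eigen(E)$values
1 - prod(1 - lambda)                #Cohen's set correlation R2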

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)
The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.
Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01   Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01   Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01   Lower CI = 0   Upper CI = 0
R2 of model = 0.37
To see the longer output, specify short = FALSE in the print statement

Full output
Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: "Moderation model" path diagram. ACT, gender, and the ACTXgndr interaction predict SATQ with education as the mediator. The a paths (to education) are 0.16, 0.09, and -0.01; the b path (education to SATQ) is -0.04; the total effects are c = 0.58, c = -0.14, and c = 0, with direct effects c' = 0.59, c' = -0.14, and c' = 0.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation  7.26  9  1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
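The symmetry is easy to check by simply exchanging the two sets in the call shown above (a quick sketch, reusing the matrix C):

setCor(c(1:3), c(4:6), C, n.obs = 700)   # Cohen's set correlation R2 is again 0.09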

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex — A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
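A minimal sketch of how such tables are produced (the factor analysis here mirrors the one summarized in Table 2; all optional arguments are left at their defaults):

f3 <- fa(Thurstone, 3)          # three factor solution of the Thurstone correlations
fa2latex(f3)                    # LaTeX source for a table like Table 2
cor2latex(sat.act)              # lower diagonal correlation table in APA style
df2latex(describe(sat.act))     # any data frame, e.g. the output of describe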


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
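A few of these helpers in action (a brief sketch; each has further options documented on its help page):

fisherz(0.5)                    # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))   # mean appropriate for ratio scaled data
harmonic.mean(c(2, 3, 6))       # mean appropriate for averaging rates
headTail(sat.act)               # first and last few rows of the data frame
mardia(sat.act[4:6])            # multivariate skew and kurtosis of the ability measures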

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
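All of these are loaded with the package and can be examined directly, for example:

data(sat.act)                   # the demonstration data set used throughout this vignette
dim(bfi)                        # 2800 rows, 28 columns (25 items plus gender, education, and age)
lowerMat(Thurstone[1:4, 1:4])   # a peek at part of the Thurstone correlation matrix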

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
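From within R, the development version can typically be installed straight from that repository; a sketch, assuming the repository layout described above:

install.packages("psych", repos = "http://personality-project.org/r", type = "source")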

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

1 Overview of this and related documents

The psych package (Revelle 2015) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httppersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includesfive methods of factor analysis (minimum residual principal axis weighted least squaresgeneralized least squares and maximum likelihood factor analysis) Principal ComponentsAnalysis (PCA) is also available through the use of the principal or pca functions De-termining the number of factors or components to extract may be done by using the VerySimple Structure (Revelle and Rocklin 1979) (vss) Minimum Average Partial correlation(Velicer 1976) (MAP) or parallel analysis (faparallel) criteria These and several othercriteria are included in the nfactors function Two parameter Item Response Theory(IRT) models for dichotomous or polytomous items may be found by factoring tetra-

choric or polychoric correlation matrices and expressing the resulting parameters interms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects over

6

time and items mlPlot will graph items over for each subject mlArrange converts widedata frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at http

personality-projectorgrpsych_manualpdf) The vignette psych for sem atpsych_for_sempdf discusses how to use psych as a front end to the sem package of JohnFox (Fox et al 2012) (The vignette is also available at httppersonality-project

orgrbookpsych_for_sempdf)

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttppersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httppersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwlsor principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

7

installpackages(list(c(GPArotationmnormt)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be doneusing the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of txttext r rds rda csv xpt or sav the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (comma

8

separated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

myspss lt- readfile(usevaluelabels=TRUE) this will keep the value labels for sav files

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below the

9

first variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew

gender 1 700 165 048 2 168 000 1 2 1 -061

education 2 700 316 143 3 331 148 0 5 5 -068

age 3 700 2559 950 22 2386 593 13 65 52 164

ACT 4 700 2855 482 29 2884 445 3 36 33 -066

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059

kurtosis se

gender -162 002

education -007 005

age 242 036

ACT 053 018

SATV 033 427

SATQ -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 6) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

10

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns

11

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

12

3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidence

13

limits (Figure 6) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 5) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25thand 75th percentiles A powerful alternative is to show the density distribution using theviolinBy function (Figure 5)

14

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

15

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

16

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

+ main =Density distributions of four measures of affect )

gt devoff()

null device

1

Figure 4 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The variablesare four measures of motivational state for 3896 participants Each scale is the averagescore of 10 items measuring motivational state Compare this a plot with smoother set toFALSE

17

gt data(satact)

gt violinBy(satact[56]satact$gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

Density Plot by gender for SAT V and Q

Obs

erve

d

SATV M SATV F SATQ M SATQ F

200

300

400

500

600

700

800

Figure 5 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 6) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 8) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in presentingdata this way at httpemdbolkerwikidotcomblogdynamite In the example shownnote that the graph starts at 0 although is out of the range This is a function of usingbars which always are assumed to start at zero Consider other ways of showing yourdata

344 Error bars for tabular data

However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab

function

19

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)

095 confidence limits

Independent Variable

Dep

ende

nt V

aria

ble

bfagree bfcon bfext bfneur bfopen

050

100

150

Figure 6 Using the errorbarsby function shows that self reported personality scales onthe Big Five Inventory vary as a function of the Lie scale on the EPI The ldquocats eyesrdquo showthe distribution of the confidence

20

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)

Male Female

095 confidence limits

SAT

sco

re

200

300

400

500

600

700

800

200

300

400

500

600

700

800

Figure 7 A ldquoDynamite plotrdquo of SAT scores as a function of gender is one way of misleadingthe reader By using a bar graph the range of scores is ignored Bar graphs start from 0

21

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)

Proportion of sample by education level

Level of Education

Pro

port

ion

of E

duca

tion

Leve

l

000

005

010

015

020

025

030

M 0 M 1 M 2 M 3 M 4 M 5

000

005

010

015

020

025

030

Figure 8 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

22

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses func-tion For instance the effect of various movies on both ldquoEnergetic Arousalrdquo and ldquoTenseArousalrdquo can be seen in one graph and compared to the same movie manipulations onldquoPositive Affectrdquo and ldquoNegative Affectrdquo Note how Energetic Arousal is increased by threeof the movie manipulations but that Positive Affect increases following the Happy movieonly

23

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))

8 12 16 20

1012

1416

1820

22

Movies effect on arousal

Energetic Arousal

Tens

e A

rous

al

SadHorror

NeutralHappy

6 8 10 12

24

68

10

Movies effect on affect

Positive Affect

Neg

ativ

e A

ffect

Sad

Horror

NeutralHappy

Figure 9 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)

24

346 Back to back histograms

The bibars function summarize the characteristics of two groups (eg males and females)on a second variable (eg age) by drawing back to back histograms (see Figure 10)

25

data(bfi)gt png( bibarspng )

gt with(bfibibars(agegenderylab=Agemain=Age by males and females))

gt devoff()

null device

1

Figure 10 A bar plot of the age distribution for males and females shows the use ofbibars The data are males and females from 2800 cases collected using the SAPAprocedure and are available as part of the bfi data set

26

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

> both <- lowerUpper(lower,upper)
> round(both,2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
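As a small illustration (reusing the sat.act data described earlier; the adjustment shown is just the default), the adjusted probabilities and the confidence intervals may be obtained by:

ct <- corr.test(sat.act, adjust="holm")  # Pearson correlations with Holm adjusted p values
print(ct, short=FALSE)                   # short=FALSE also displays the confidence intervals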

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option


depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103  r12 = 0.4  r23 = 0.1  r13 = 0.5 )"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
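A hypothetical sketch of these functions, using a few bfi items that ship with psych (the particular items and the cut point are chosen purely for illustration):

pc <- polychoric(bfi[1:5])                      # polychoric correlations of 5 polytomous items
pc$rho                                          # the correlations; pc$tau holds the thresholds
tc <- tetrachoric(ifelse(bfi[1:5] > 3, 1, 0))   # tetrachorics after dichotomizing at 3
rs <- cor.smooth(tc$rho)                        # smooth the matrix if it is not positive semi-definite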

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 appears here: a bivariate normal distribution with rho = 0.5, cut at thresholds τ (on X) and Τ (on Y) to give the four cells X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ; the resulting φ is 0.33. A marginal normal density with its cut point is also shown.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 appears here: a perspective plot of the bivariate normal density with rho = 0.5, cut at 0 on both x and y.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation which may be decomposed into within group and between group correlations, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.
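For example, a minimal sketch of this decomposition for the sat.act data grouped by education (rwg and rbg are the names of the within and between group correlation matrices returned by statsBy; the choice of data and grouping variable is just an illustration):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg, 2)   # pooled within group correlations
round(sb$rbg, 2)   # correlations of the group means (between groups)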

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031


Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1/(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030


multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual,2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
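The call that produces the output below can be reconstructed from the printed Call line; treat the following as an illustrative sketch (n.iter sets the number of bootstrap samples):

preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel, n.iter = 5000)
mediate.diagram(preacher)   # the graphic shown in Figure 16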

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c')
          SATIS  se    t  Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50


> mediate.diagram(preacher)

[Figure 16 appears here: the mediation model diagram, with paths THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), and THERAPY → SATIS (c = 0.76, c' = 0.43).]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 appears here: the conventional regression model diagram, with THERAPY and ATTRIB predicting SATIS (path values of 0.43 and 0.4, with 0.21 also shown).]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of bootstraps is 5000.
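The moderated regression example referred to in the last bullet (its output and Figure 18 appear a little further below, interleaved with the set correlation example because of figure placement) used a call of roughly the following form; this is reconstructed from its printed Call line and offered only as a sketch:

mod.med <- mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
                   mod = "gender", n.iter = 50, std = TRUE)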

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT, gender, ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c')
          SATQ  se    t  Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

[Figure 18 appears here: the moderation model diagram, with ACT, gender, and the ACT x gender interaction (ACTXgndr) predicting SATQ directly and through education.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002


age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
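A quick (illustrative) way to see this symmetry is to exchange the two sets, reusing the covariance matrix C computed above; Cohen's set correlation R2 should be unchanged:

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # same set correlation R2 as before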

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
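A minimal sketch of these converters follows; the particular calls (a three factor solution of the Thurstone matrix, the sat.act correlations, and a describe table) are illustrative assumptions rather than the exact code used to produce Table 2:

f3 <- fa(Thurstone, nfactors=3)    # factor a correlation matrix
fa2latex(f3)                       # APA style LaTeX factor table
R <- lowerCor(sat.act)             # find (and return) the correlation matrix
cor2latex(R)                       # lower diagonal correlation table
df2latex(describe(sat.act))        # convert a generic data frame (here, describe output)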

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a few of these are illustrated in the short sketch after the list). Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
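The short sketch promised above shows a few of these helpers in action; the inputs are arbitrary and purely illustrative:

fisherz(0.5)                   # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))  # geometric mean
harmonic.mean(c(2, 3, 6))      # harmonic mean
headTail(sat.act)              # first and last lines of a data frame
mardia(sat.act[4:6])           # Mardia's multivariate skew and kurtosis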

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition


Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York


Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang


for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47



time and items mlPlot will graph items over for each subject mlArrange converts widedata frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at http

personality-projectorgrpsych_manualpdf) The vignette psych for sem atpsych_for_sempdf discusses how to use psych as a front end to the sem package of JohnFox (Fox et al 2012) (The vignette is also available at httppersonality-project

orgrbookpsych_for_sempdf)

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttppersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httppersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwlsor principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

7

installpackages(list(c(GPArotationmnormt)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be doneusing the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of txttext r rds rda csv xpt or sav the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (comma

8

separated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

myspss lt- readfile(usevaluelabels=TRUE) this will keep the value labels for sav files

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   #define the tab option, or
> my.tab.data <- read.clipboard.tab()   #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000


education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+            skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

<NA> <NA> <NA>

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
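For instance, a minimal sketch of this workflow (the choice of ten cases to inspect is arbitrary and purely for illustration):

> d2 <- outlier(sat.act)                       #squared Mahalanobis distances, shown as a Q-Q plot
> worst <- order(d2, decreasing = TRUE)[1:10]  #indices of the ten most extreme cases
> sat.act[worst, ]                             #inspect these cases before deciding how to treat them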

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
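A minimal sketch (the major vector and the my.df data frame here are hypothetical, made up purely for illustration):

> major <- c("psych", "bio", "psych", "econ", "bio", "psych")   #a made-up categorical variable
> dummy.code(major)                      #one binary (0/1) column per category
> #for a data frame with character columns, char2numeric converts them to numeric codes
> #my.df.numeric <- char2numeric(my.df)  #my.df is a hypothetical data frame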

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses; a brief sketch of the most basic of these calls follows the list below.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
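As a minimal sketch of the most basic of these calls, applied to the two SAT scores in the sat.act data set (the labels are arbitrary):

> error.bars(sat.act[5:6], ylab="SAT score")      #"cats eyes" showing the 95% confidence limits
> error.bars(sat.act[5:6], eyes=FALSE, sd=TRUE)   #plain bars showing +/- one standard deviation instead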

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+                 xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+                 cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+              ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
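As a minimal sketch (assuming the classic lsat6 dichotomous ability items, which are made available by loading the bock data set):

> data(bock)           #provides the lsat6 and lsat7 item data
> tetrachoric(lsat6)   #tetrachoric correlations and the estimated thresholds (tau)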

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
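A minimal sketch of that smoothing step, using the burt data set just mentioned:

> data(burt)                         #the historic Burt correlation matrix
> round(eigen(burt)$values, 3)       #the smallest eigen value is (slightly) negative
> burt.smoothed <- cor.smooth(burt)  #adjust the offending eigen values and rescale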

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$    (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
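A minimal sketch (the number of trials and the choice of ICC2 as the variable to summarize are arbitrary here, chosen only for illustration):

> sb.boot <- statsBy.boot(sat.act, group="education", ntrials=100, cors=TRUE)
> statsBy.boot.summary(sb.boot, var="ICC2")   #summarize the bootstrapped ICC2 values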


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
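For example, a minimal sketch of the first of these analyses:

> sb <- statsBy(sat.act, group="education", cors=TRUE)   #means, sds, ns, and correlations by group
> sb$rwg    #the pooled within group correlations
> sb$rbg    #the between group correlations (of the group means)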

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031


Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030


multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call mediate(y = SATIS x = THERAPY m = ATTRIB data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total Direct effect(c) of THERAPY on SATIS = 076 SE = 031 t direct = 25 with probability = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t direct = 135 with probability = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 004 Upper CI = 069

R2 of model = 031

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

> mediate.diagram(preacher)


Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

mod = gender niter = 50 std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0


Figure 18: Moderated multiple regression requires the raw data.

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002


age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
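A call along the following lines produces such a table (a sketch only; the exact options used to generate Table 2 are not shown in this vignette):

> f3 <- fa(Thurstone, 3)   #a three factor solution of the Thurstone correlations
> fa2latex(f3, heading="A factor analysis table from the psych package in R")
> cor2latex(Thurstone)     #similarly, an APA style correlation table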

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2    u2   com
Sentences         0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary        0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion   0.83  0.04  0.00  0.73  0.27  1.00
First.Letters     0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes          0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees         0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings       2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index of psych for a list of all of the functions. (A brief sketch of a few of these helpers follows the list.)

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
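A brief sketch of a few of these helpers (the input values are arbitrary and chosen only for illustration):

> fisherz(.5)                    #the Fisher z transformation of r = .5
> geometric.mean(c(1, 2, 4, 8))  #the geometric mean
> harmonic.mean(c(1, 2, 4, 8))   #the harmonic mean
> headTail(sat.act)              #the first and last lines of the sat.act data set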

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems), are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56



install.packages(c("GPArotation","mnormt"))

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions, it is necessary to make the package active by using the library command:

library(psych)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)}

3.1 Getting the data by using read.file

Although many find it helpful to copy the data to the clipboard and then use the read.clipboard functions (see below), an alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of txt, text, r, rds, rda, csv, xpt, or sav the file will be read correctly.

mydata <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

mydata <- read.file(widths = c(4,rep(1,35)))   #will read in a file without a header row and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

If the file is a RData file (with suffix of RData, Rda, rda, Rdata, or rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (xpt or XPT) it will be read. csv files (comma separated files), normal txt or text files, data or dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

myspss <- read.file(use.value.labels=TRUE)   #this will keep the value labels for sav files
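If the name (and path) of the file is already known, it may be given directly as the first argument of read.file instead of choosing it interactively. The file name below is purely hypothetical and is shown only to illustrate the form of the call:

mydata <- read.file("myproject/mydata.csv")   #hypothetical path: read a comma separated file with a header row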

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

mydata <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> mydata <- read.clipboard(sep="\t")   #define the tab option, or
> mytabdata <- read.clipboard.tab()    #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> mydata <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> data(sat.act)
> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59
          kurtosis   se
gender       -1.62 0.02
education    -0.07 0.05
age           2.42 0.36
ACT           0.53 0.18
SATV          0.33 4.27
SATQ         -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 6). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act,sat.act$gender,skew=FALSE,ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> samat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+            skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(samat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
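The same squared Mahalanobis distances may be found by hand with base R functions, which may help make clear what outlier is computing; this is only a sketch of the calculation described above (outlier itself also draws the Q-Q plot and labels the most extreme points):

X <- na.omit(sat.act)                            #complete cases are needed for the distances
d2.byhand <- mahalanobis(X,colMeans(X),cov(X))   #squared distance of each case from the centroid
expected <- qchisq(ppoints(nrow(X)),df=ncol(X))  #expected chi square quantiles for the Q-Q plot
head(sort(d2.byhand,decreasing=TRUE))            #the most extreme cases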

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120.

> png('outlier.png')
> d2 <- outlier(sat.act,cex=.8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> newx <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> newx

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
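A small sketch of both functions follows; the data frame in the commented line is hypothetical and is shown only to illustrate the form of the call:

ed.codes <- dummy.code(sat.act$education)   #one 0/1 column for each level of education
headTail(ed.codes)                          #show the first and last few rows
#  my.df <- char2numeric(my.df)             #convert the character columns of a (hypothetical) data frame to numeric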

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> satd2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(satd2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+       main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+    main ="Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+       labels=c("Male","Female"),ylab="SAT score",xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+       main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affectstats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+       xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+       cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affectstats,labels=films,xlab="Positive Affect",
+       ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits, and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal, and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
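For example, the correlations, sample sizes, and raw and adjusted probabilities for the sat.act data (shown in Table 1) may be saved and then printed with the short=FALSE option to also see the confidence intervals of the correlations:

ct <- corr.test(sat.act)   #find the correlations and probability values
print(ct,short=FALSE)      #also display the confidence intervals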

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> rcirc <- cor(circ)
> corPlot(rcirc,main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=rcirc,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
      r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
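A brief sketch of the smoothing step applied to the burt matrix just mentioned (the particular eigen values are not reproduced here):

data(burt)
min(eigen(as.matrix(burt))$values)            #the smallest eigen value is negative, so the matrix is not positive definite
burt.smoothed <- cor.smooth(burt)             #rescale the eigen values and rebuild the matrix
min(eigen(as.matrix(burt.smoothed))$values)   #no longer negative after smoothing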

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg ∗ ηywg ∗ rxywg + ηxbg ∗ ηybg ∗ rxybg      (1)

where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times, and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
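A minimal sketch of the first of these analyses; rwg and rbg are the pooled within group and between group correlation matrices returned by statsBy:

sb.ed <- statsBy(sat.act,group="education",cors=TRUE)   #two level statistics, grouped by education
round(sb.ed$rwg,2)   #the pooled within group correlations
round(sb.ed$rbg,2)   #the correlations of the group means (between group)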

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31   t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32   t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation.


> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram with THERAPY, ATTRIB, and SATIS; path values a = 0.82, b = 0.4, c = 0.76, c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2:3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" diagram with THERAPY and ATTRIB predicting SATIS; values 0.43, 0.4, and 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstrap iterations is 5000.
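A sketch of the kind of call behind the moderation output and Figure 18 shown below (argument names follow the printed Call; note that although only ACT is given as x, the output also includes gender and the ACTXgndr product term, which the function adds for the moderated model):

mod.med <- mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
                   data = sat.act, std = TRUE, n.iter = 50)
mediate.diagram(mod.med)   # a diagram of the kind shown in Figure 18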

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}
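A minimal numerical sketch of this formula, assuming that the λi are the squared canonical correlations (the eigenvalues of the matrix above) and using the sat.act data that are analyzed below (predictors = gender, education, age; criteria = ACT, SATV, SATQ):

library(psych)
R   <- cor(sat.act, use = "pairwise")
Rxx <- R[1:3, 1:3]; Ryy <- R[4:6, 4:6]; Rxy <- R[1:3, 4:6]
# eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx; Re() just drops a possible 0i part
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
1 - prod(1 - lambda)   # roughly 0.09, matching the Cohen's Set Correlation R2 reported below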

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act, mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT, gender, ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect (c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect (c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: "Moderation model" path diagram with ACT, gender, and ACTXgndr predicting SATQ through education; a paths 0.16, 0.09, -0.01; b path -0.04; c = 0.58 and c' = 0.59 for ACT; c = -0.14 and c' = -0.14 for gender; c = 0 and c' = 0 for ACTXgndr]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation:  7.26  9  1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
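This symmetry is easy to check with the same covariance matrix C used above: the Cohen's Set Correlation R2 reported in the output is the same whichever set is treated as the predictors (a quick sketch):

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)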

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
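A minimal sketch of the kind of call that produces such a table, assuming the Thurstone correlation matrix supplied with psych (the caption argument is an assumption; see the fa2latex help page for the full set of options):

f3 <- fa(Thurstone, 3)   # a three factor solution of the 9 ability variables
fa2latex(f3, caption = "fa2latex: A factor analysis table from the psych package in R")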

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
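A few of these helpers in action (a small illustration; the values are easy to verify by hand):

library(psych)
fisherz(0.5)              # the Fisher r-to-z transformation of a correlation of 0.5
geometric.mean(c(2, 8))   # 4
harmonic.mean(c(2, 6))    # 3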

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
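All of these data sets are installed with the package and may be examined directly, for example:

library(psych)
data(sat.act)
dim(sat.act)   # 700 observations on 6 variables
data(bfi)
dim(bfi)       # 2800 observations on 28 variables (25 items plus gender, education, and age)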

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
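One way to install the development version from that repository is sketched below (the repos value is an assumption based on the address given above; the CRAN release is installed with install.packages("psych") as usual):

install.packages("psych", repos = "http://personality-project.org/r", type = "source")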

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.
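From within R, the same documentation can also be reached directly, for example:

help(package = "psych")    # an index of all documented functions and data sets
browseVignettes("psych")   # opens the package vignettes, including this one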

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0     foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4    grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. Cluster analysis, 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Page 9: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

separated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

myspss lt- readfile(usevaluelabels=TRUE) this will keep the value labels for sav files

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below the

9

first variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew

gender 1 700 165 048 2 168 000 1 2 1 -061

education 2 700 316 143 3 331 148 0 5 5 -068

age 3 700 2559 950 22 2386 593 13 65 52 164

ACT 4 700 2855 482 29 2884 445 3 36 33 -066

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059

kurtosis se

gender -162 002

education -007 005

age 242 036

ACT 053 018

SATV 033 427

SATQ -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 6) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

10

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns

11

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

12

3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidence

13

limits (Figure 6) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 5) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25thand 75th percentiles A powerful alternative is to show the density distribution using theviolinBy function (Figure 5)

14

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

15

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

16

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

+ main =Density distributions of four measures of affect )

gt devoff()

null device

1

Figure 4 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The variablesare four measures of motivational state for 3896 participants Each scale is the averagescore of 10 items measuring motivational state Compare this a plot with smoother set toFALSE

17

gt data(satact)

gt violinBy(satact[56]satact$gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

Density Plot by gender for SAT V and Q

Obs

erve

d

SATV M SATV F SATQ M SATQ F

200

300

400

500

600

700

800

Figure 5 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 6) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 8) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in presentingdata this way at httpemdbolkerwikidotcomblogdynamite In the example shownnote that the graph starts at 0 although is out of the range This is a function of usingbars which always are assumed to start at zero Consider other ways of showing yourdata

344 Error bars for tabular data

However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab

function

19

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)

095 confidence limits

Independent Variable

Dep

ende

nt V

aria

ble

bfagree bfcon bfext bfneur bfopen

050

100

150

Figure 6 Using the errorbarsby function shows that self reported personality scales onthe Big Five Inventory vary as a function of the Lie scale on the EPI The ldquocats eyesrdquo showthe distribution of the confidence

20

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)

Male Female

095 confidence limits

SAT

sco

re

200

300

400

500

600

700

800

200

300

400

500

600

700

800

Figure 7 A ldquoDynamite plotrdquo of SAT scores as a function of gender is one way of misleadingthe reader By using a bar graph the range of scores is ignored Bar graphs start from 0

21

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)

Proportion of sample by education level

Level of Education

Pro

port

ion

of E

duca

tion

Leve

l

000

005

010

015

020

025

030

M 0 M 1 M 2 M 3 M 4 M 5

000

005

010

015

020

025

030

Figure 8 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

22

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses func-tion For instance the effect of various movies on both ldquoEnergetic Arousalrdquo and ldquoTenseArousalrdquo can be seen in one graph and compared to the same movie manipulations onldquoPositive Affectrdquo and ldquoNegative Affectrdquo Note how Energetic Arousal is increased by threeof the movie manipulations but that Positive Affect increases following the Happy movieonly

23

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))

8 12 16 20

1012

1416

1820

22

Movies effect on arousal

Energetic Arousal

Tens

e A

rous

al

SadHorror

NeutralHappy

6 8 10 12

24

68

10

Movies effect on affect

Positive Affect

Neg

ativ

e A

ffect

Sad

Horror

NeutralHappy

Figure 9 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)

24

346 Back to back histograms

The bibars function summarize the characteristics of two groups (eg males and females)on a second variable (eg age) by drawing back to back histograms (see Figure 10)

25

data(bfi)gt png( bibarspng )

gt with(bfibibars(agegenderylab=Agemain=Age by males and females))

gt devoff()

null device

1

Figure 10 A bar plot of the age distribution for males and females shows the use ofbibars The data are males and females from 2800 cases collected using the SAPAprocedure and are available as part of the bfi data set

26

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

27

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)

28

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
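As a concrete sketch (the 2 x 2 table of frequencies here is made up for illustration; burt is the correlation matrix mentioned above):

tab <- matrix(c(40, 10,
                15, 35), 2, 2, byrow = TRUE)   # hypothetical cell frequencies for two dichotomous items
phi(tab)           # the Pearson (phi) correlation of the observed dichotomies
tetrachoric(tab)   # the estimated latent correlation, assuming bivariate normality
cor.smooth(burt)   # smooth a correlation matrix that is not positive semi-definite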

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

(Plot: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at the thresholds τ for X and Y; the four quadrants are labeled by whether X and Y exceed their thresholds, with the marginal normal densities drawn along the axes.)

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

(Plot: the bivariate normal density surface for rho = 0.5, dichotomized at the cut points.)

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
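A minimal sketch of the first of these analyses (cors = TRUE requests the correlation matrices; rwg and rbg are the within group and between group correlations returned by statsBy):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within group correlations
round(sb$rbg, 2)   # correlations of the education group means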

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.69      0.63           0.50       0.58          0.48

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.48      0.40           0.25       0.34          0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences  Vocabulary  Sent.Completion  First.Letters
           3.69        3.88             3.00           1.35

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.59      0.58           0.49       0.58          0.45

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.34      0.34           0.24       0.33          0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.58      0.46           0.21       0.18          0.30

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
            0.331     0.210          0.043      0.032         0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion  First.Letters
           1.02           1.02

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.44      0.35           0.17       0.14          0.26

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.19      0.12           0.03       0.02          0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
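For reference, the call that generates the output below can be read from the Call line that mediate prints (this sketch assumes the sobel data frame from the mediate help page has already been constructed):

mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)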

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31 2.50 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only, and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

(Path diagram: Mediation model. THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, and THERAPY → SATIS with c = 0.76 and c' = 0.43.)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

(Path diagram: Regression Models. THERAPY and ATTRIB predict SATIS, with paths 0.43 and 0.4; the remaining value shown is 0.21.)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
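A quick numerical check of this formula (a sketch; the λ values are the squared canonical correlations copied from the Thurstone setCor output above):

lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations from setCor(y=5:9, x=1:4, data=Thurstone)
R2.set <- 1 - prod(1 - lambda)               # Cohen's set correlation R2
round(R2.set, 2)                             # 0.69, matching the value printed by setCor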

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

(Path diagram: Moderation model. ACT, gender, and the ACTXgndr interaction predict SATQ directly and through education; the path values shown correspond to the a, b, c, and c' estimates in the output above.)

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
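A minimal sketch of the calls involved (fa will warn that it is working from a correlation matrix without n.obs; the default arguments are used throughout):

f3 <- fa(Thurstone, 3)        # the three factor solution shown in Table 2
fa2latex(f3)                  # write the LaTeX source for a table like Table 2
cor2latex(Thurstone)          # an APA style correlation table
df2latex(describe(sat.act))   # any data frame, e.g., the output of describe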

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A few of these helpers are illustrated after the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
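A few of these helpers in action (a sketch; the numeric inputs are arbitrary):

fisherz(0.5)                   # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))  # geometric mean
harmonic.mean(c(1, 2, 4, 8))   # harmonic mean
headTail(sat.act)              # first and last lines, separated by an ellipsis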

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
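In R itself, pointing install.packages at that repository is typically enough (a sketch; the exact repository options may differ by platform):

install.packages("psych", repos = "http://personality-project.org/r", type = "source")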

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

Page 10: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

first variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew

gender 1 700 165 048 2 168 000 1 2 1 -061

education 2 700 316 143 3 331 148 0 5 5 -068

age 3 700 2559 950 22 2386 593 13 65 52 164

ACT 4 700 2855 482 29 2884 445 3 36 33 -066

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059

kurtosis se

gender -162 002

education -007 005

age 242 036

ACT 053 018

SATV 033 427

SATQ -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 6) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

10

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns

11

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

12

3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidence

13

limits (Figure 6) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 5) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).


> png( 'pairs.panels.png' )
> sat.d2 <- data.frame(sat.act,d2)  #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+    main="Affect varies by movies ")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75],list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+    main ="Density distributions of four measures of affect" )
> dev.off()
null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A small sketch follows this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
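A minimal sketch of error.bars, assuming the sat.act data set used throughout this vignette (the labels and options are illustrative choices, not the vignette's own call):

error.bars(sat.act[5:6], eyes=FALSE, ylab="SAT score",
           main="95% confidence intervals for SAT V and SAT Q")
#error.bars(sat.act[5:6], sd=TRUE)   would show +/- one standard deviation instead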

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+     labels=c("Male","Female"),ylab="SAT score",xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+     main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png( 'bibars.png' )
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18  with probability < 0.034
and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2  with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
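A hedged sketch (not the vignette's own example): artificially dichotomize the SAT scores from sat.act at an arbitrary cut of 500 and compare the φ and tetrachoric estimates with the Pearson correlation of the continuous scores (about 0.64 in these data).

sats <- sat.act[5:6]                      #SATV and SATQ
cut.sats <- data.frame(SATV = as.numeric(sats$SATV > 500),
                       SATQ = as.numeric(sats$SATQ > 500))
lowerCor(sats)           #Pearson r of the continuous scores
phi(table(cut.sats))     #phi coefficient of the dichotomized scores
tetrachoric(cut.sats)    #the tetrachoric estimate should be closer to the continuous Pearson r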

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
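A brief sketch using the burt data set mentioned above (the specific comments describe what is expected, not results reported in this vignette):

data(burt)
round(min(eigen(burt)$values), 3)   #expected to be (very slightly) negative, hence the need to smooth
burt.s <- cor.smooth(burt)          #rescale the eigenvalues and rebuild the matrix
round(max(abs(burt.s - burt)), 3)   #the adjustment should be tiny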

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)

where rxy is the normal correlation which may be decomposed into a within group and between group correlations, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
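A hedged sketch of one way to display that structure (it assumes "Group" is the grouping column of the withinBetween data set):

data(withinBetween)
wb <- statsBy(withinBetween, group = "Group", cors = TRUE)
round(wb$rwg, 1)   #the within group correlations (the 1, 0, -1 pattern described above)
round(wb$rbg, 1)   #the between group correlations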

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
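A minimal sketch of the first of these (the choice of "education" as the grouping column and cors=TRUE are illustrative, not the vignette's own call):

sb.ed <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb.ed$rwg, 2)   #pooled within group (within education level) correlations
round(sb.ed$rbg, 2)   #correlations of the group means (between level)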

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.69       0.63            0.50        0.58           0.48

multiple R2
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.48       0.40            0.25        0.34           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences   Vocabulary   Sent.Completion   First.Letters
           3.69         3.88              3.00            1.35

Unweighted multiple R
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.59       0.58            0.49        0.58           0.45

Unweighted multiple R2
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.34       0.34            0.24        0.33           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2 =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.58       0.46            0.21        0.18           0.30

multiple R2
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
            0.331      0.210           0.043       0.032          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.44       0.35            0.17        0.14           0.26

Unweighted multiple R2
Four.Letter.Words   Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.19       0.12            0.03        0.02           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2 =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000. The call used is shown below.
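The call for this moderated example, echoed in the Call line of the output that follows on a later page, is of this form:

mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
        mod = "gender", n.iter = 50, std = TRUE)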


> mediate.diagram(preacher)


Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)


Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏_{i=1}^{n} (1 − λi)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx^{-1} Rxy Ryy^{-1} Ryx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0


Figure 18 Moderated multiple regression requires the raw data


      Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation =  0.03
Cohen's Set Correlation R2 =  0.09
Shrunken Set Correlation R2 =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
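As a quick check of that symmetry (a sketch reusing the covariance matrix C and n.obs = 700 from the example above), reversing the roles of the two sets should report the same Set Correlation R2 of 0.09:

setCor(c(1:3), c(4:6), C, n.obs = 700)   #predictor and criterion sets reversed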

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
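A tiny sketch of superMatrix with two arbitrary matrices (not from the vignette):

A <- matrix(1, 2, 2)
B <- matrix(2, 3, 3)
superMatrix(A, B)   #a 5 x 5 matrix: A in the top left, B in the lower right, 0s elsewhere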

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iq.items). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iq.items 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62


education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act,list(sat.act$gender,sat.act$education),
+                      skew=FALSE,ranges=FALSE,mat=TRUE)
> headTail(sa.mat)

          item group1 group2 vars   n   mean     sd    se
gender1      1      1      0    1  27      1      0     0
gender2      2      2      0    1  30      2      0     0
gender3      3      1      1    1  20      1      0     0
gender4      4      2      1    1  25      2      0     0
...        ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9       69      1      4    6  51  635.9 104.12 14.58
SATQ10      70      2      4    6  86 597.59 106.24 11.46
SATQ11      71      1      5    6  46 657.83  89.61 13.21
SATQ12      72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or if only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 to 120.

> png('outlier.png')
> d2 <- outlier(sat.act,cex=.8)
> dev.off()

null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

All values of columns 3 to 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120,ncol=10,byrow=TRUE)
> colnames(x) <- paste("V",1:10,sep="")
> new.x <- scrub(x,3:5,min=c(30,40,50),max=70,isvalue=45,newvalue=NA)
> new.x

        V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]    1   2 NA NA NA   6   7   8   9  10
 [2,]   11  12 NA NA NA  16  17  18  19  20
 [3,]   21  22 NA NA NA  26  27  28  29  30
 [4,]   31  32 33 NA NA  36  37  38  39  40
 [5,]   41  42 43 44 NA  46  47  48  49  50
 [6,]   51  52 53 54 55  56  57  58  59  60
 [7,]   61  62 63 64 65  66  67  68  69  70
 [8,]   71  72 NA NA NA  76  77  78  79  80
 [9,]   81  82 NA NA NA  86  87  88  89  90
[10,]   91  92 NA NA NA  96  97  98  99 100
[11,]  101 102 NA NA NA 106 107 108 109 110
[12,]  111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums gone down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
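A minimal sketch of both steps, using a small made-up data frame (the column names major and gpa are purely illustrative and are not part of the psych examples):

majors <- data.frame(major = c("psych","biology","psych","econ","biology","econ"),
                     gpa   = c(3.2, 3.7, 3.5, 3.1, 3.9, 3.4),
                     stringsAsFactors = FALSE)
major.codes <- dummy.code(majors$major)          #one 0/1 indicator column per category
lowerCor(data.frame(major.codes, gpa = majors$gpa))  #dummy codes may be correlated with other variables
majors.numeric <- char2numeric(majors)           #convert the character column to numeric codes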

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as for communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act,d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21,stars=TRUE)
> dev.off()

null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+    main="Affect varies by movies ")
> dev.off()

null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75],list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+    PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+    NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+    main ="Density distributions of four measures of affect" )
> dev.off()

null device
          1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

> data(sat.act)
> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M", "F"),main="Density Plot by gender for SAT V and Q")


Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)


Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+      labels=c("Male","Female"),ylab="SAT score",xlab="")


Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+      cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+      ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:[1] "r.test(n = 103 ,  r12 =  0.4 ,  r23 =  0.1 ,  r13 =  0.5 )"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
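A rough sketch using the polytomous bfi items that come with psych (the cut point of 3 used for the artificial dichotomization is arbitrary and purely illustrative, not part of the vignette examples):

lower.poly <- polychoric(bfi[1:5])     #polychoric correlations of five 6-point items
lower.poly$rho                         #the estimated latent correlations
lower.poly$tau                         #the estimated thresholds (cut points)
tet <- tetrachoric(ifelse(bfi[1:5] > 3, 1, 0))   #tetrachorics after dichotomizing at 3
tet$rho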

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
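A minimal sketch of that repair (burt is the 11 x 11 correlation matrix included in psych; the smallest eigen value should be slightly negative before smoothing):

eigen(burt)$values              #the smallest eigen value of the burt matrix is negative
burt.smoothed <- cor.smooth(burt)
eigen(burt.smoothed)$values     #after smoothing, all eigen values are positive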

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
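A minimal sketch of the first of these, assuming only the sat.act data supplied with psych (cors=TRUE requests the within group correlations as well):

sb.edu <- statsBy(sat.act, group="education", cors=TRUE)
sb.edu          #group means, ICCs, and the pooled within and between group correlations
names(sb.edu)   #the full list of returned statistics (within group, between group, etc.)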

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 Beta weights
                 4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                  0.09     0.07          0.25      0.21         0.20
Vocabulary                 0.09     0.17          0.09      0.16        -0.02
Sent.Completion            0.02     0.05          0.04      0.21         0.08
First.Letters              0.58     0.45          0.21      0.08         0.31

 Multiple R
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.69           0.63           0.50           0.58           0.48

 multiple R2
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.48           0.40           0.25           0.34           0.23

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

 Unweighted multiple R
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.59           0.58           0.49           0.58           0.45

 Unweighted multiple R2
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.34           0.34           0.24           0.33           0.20

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

 Average squared canonical correlation =  0.2
 Cohen's Set Correlation R2  =  0.69
 Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 Beta weights
                 4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion            0.02     0.05          0.04      0.21         0.08
First.Letters              0.58     0.45          0.21      0.08         0.31

 Multiple R
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.58           0.46           0.21           0.18           0.30

 multiple R2
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
         0.331          0.210          0.043          0.032          0.092

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

 Unweighted multiple R
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.44           0.35           0.17           0.14           0.26

 Unweighted multiple R2
4.Letter.Words       Suffixes  Letter.Series      Pedigrees   Letter.Group
          0.19           0.12           0.03           0.02           0.07

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023

 Average squared canonical correlation =  0.21
 Cohen's Set Correlation R2  =  0.42
 Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)

               4.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
4.Letter.Words           0.52     0.11          0.09      0.06         0.13
Suffixes                 0.11     0.60         -0.01      0.01         0.03
Letter.Series            0.09    -0.01          0.75      0.28         0.37
Pedigrees                0.06     0.01          0.28      0.66         0.20
Letter.Group             0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)


Figure 16: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)


Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

 'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

 'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

 'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0


Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

 Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

 multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

 Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

 Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

 SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

 t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

 Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

 Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

 Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

 F
 ACT SATV SATQ
6.49 2.26 8.63

 Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

 degrees of freedom of regression
[1]   3 696

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation   7.26  9  1681.86
 Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
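The kind of call that produces such a table looks roughly like the following sketch (the particular heading and caption strings are just illustrative):

f3 <- fa(Thurstone, nfactors=3, n.obs=213)   #a 3 factor solution of the Thurstone correlations
fa2latex(f3, heading="A factor analysis table from the psych package in R",
         caption="fa2latex")                 #prints the LaTeX code for an APA style table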

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

        MR1   MR2   MR3
MR1    1.00  0.59  0.54
MR2    0.59  1.00  0.52
MR3    0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A short demonstration of a few of these helpers follows the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimated effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
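A brief demonstration of a few of these helpers (the numerical values in the comments are approximate and given only for orientation):

fisherz(.5)                  #the Fisher r to z transform of r = .5 (about 0.55)
geometric.mean(c(1,2,4,8))   #about 2.83, the appropriate mean for multiplicative data
harmonic.mean(c(1,2,4,8))    #about 2.13, useful when averaging rates
headTail(sat.act)            #the first and last few lines of the sat.act data set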

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: macOS Sierra 10.12.4

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:

[1] C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] psych_1.7.4.21

loaded via a namespace (and not attached):

[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67

[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0

[9] lattice_0.20-34

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

53

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

54

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

55

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62


> png( 'outlier.png' )

> d2 <- outlier(sat.act, cex=.8)

> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

12

3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120, ncol=10, byrow=TRUE)

> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)

> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial (regular Pearson r) to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric. A brief sketch of both functions follows.
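A minimal sketch of both functions; the major vector and the small data frame here are made up purely for illustration and are not part of the psych data sets:

> major <- c("psych", "biology", "psych", "economics", "biology")
> dummy.code(major)      # one 0/1 column per category
> tiny <- data.frame(gender = c("Male", "Female", "Female"),
+                    education = c("HS", "college", "HS"))
> char2numeric(tiny)     # categorical columns become numeric codes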

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as for communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

13

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

14

> png( 'pairspanels.png' )

> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame

> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)

> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

15

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")

> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

16

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")) )

> scores <- scoreItems(keys, msq[1:75])

> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")

> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

17

> data(sat.act)

> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")

[violin plot: "Density Plot by gender for SAT V and Q"; y axis: Observed score (200-800); groups: SATV M, SATV F, SATQ M, SATQ F]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although this is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

19

> data(epi.bfi)

> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

[error bars plot: 0.95 confidence limits; x axis: Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis: Dependent Variable]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

20

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")

[bar graph: 0.95 confidence limits; groups Male and Female; y axis: SAT score (200-800)]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

21

> T <- with(sat.act, table(gender, education))

> rownames(T) <- c("M", "F")

> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")

[bar graph: "Proportion of sample by education level"; x axis: Level of Education (M 0 through M 5 and the corresponding F levels); y axis: Proportion of Education Level (0.00-0.30)]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

22

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

23

> op <- par(mfrow=c(1,2))

> data(affect)

> colors <- c("black", "red", "white", "blue")

> films <- c("Sad", "Horror", "Neutral", "Happy")

> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")

> op <- par(mfrow=c(1,1))

[two panels: "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with points labeled Sad, Horror, Neutral, Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

24

346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

25

> data(bfi)
> png( 'bibars.png' )

> with(bfi, bibars(age, gender, ylab="Age", main="Age by males and females"))

> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

26

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)

> male <- subset(sat.act, sat.act$gender==1)

> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)

> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

27

> diffs <- lowerUpper(lower, upper, diff=TRUE)

> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
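A brief sketch of the call that produces Table 1 (the adjust value shown is the default; printing with short=FALSE also displays the confidence intervals mentioned below the table):

> ct <- corr.test(sat.act, adjust="holm")
> print(ct, short=FALSE)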

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

28

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")

> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

29

> png('circplot.png')
> circ <- sim.circ(24)

> r.circ <- cor(circ)

> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

30

> png('spider.png')
> op <- par(mfrow=c(2,2))

> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")

> op <- par(mfrow=c(1,1))

> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

31

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option

32

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

33

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
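cortest can also compare two correlation matrices. A minimal sketch, reusing the male and female subsets created earlier (the sample sizes are taken from those objects rather than typed in):

> cortest(lower, upper, n1=nrow(male), n2=nrow(female))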

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
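A minimal sketch of these functions using the first five bfi items; the cut point of 4 used to dichotomize the items is arbitrary and purely for illustration:

> items <- na.omit(bfi[1:5])                      # five items on a 1-6 response scale
> tet <- tetrachoric(ifelse(items >= 4, 1, 0))    # tetrachoric on dichotomized items
> poly <- polychoric(items)                       # polychoric on the polytomous items
> smoothed <- cor.smooth(poly$rho)                # force a positive semi-definite matrix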

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

34

> draw.tetra()

[draw.tetra figure: a bivariate normal distribution (rho = 0.5, phi = 0.33) cut at τ on X and Τ on Y, labeling the four quadrants X > τ / Y > Τ, etc.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

> draw.cor(expand=20, cuts=c(0,0))

[draw.cor figure: bivariate density, rho = 0.5]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

36

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg ∗ ηywg ∗ rxywg + ηxbg ∗ ηybg ∗ rxybg                    (1)

where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

37

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)). A sketch of the first of these follows.
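A minimal sketch; grouping by the education variable is an assumption here, and the rwg and rbg elements hold the pooled within group and between group correlations:

> sb.ability <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb.ability$rwg, 2)    # pooled within group correlations
> round(sb.ability$rbg, 2)    # correlations of the group means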

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE (cors=TRUE), and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

38

Multiple R
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.69        0.63            0.50        0.58           0.48

multiple R2
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.48        0.40            0.25        0.34           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.59        0.58            0.49        0.58           0.45

Unweighted multiple R2
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.34        0.34            0.24        0.33           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.58        0.46            0.21        0.18           0.30

39

multiple R2
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
            0.331       0.210           0.043       0.032          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.44        0.35            0.17        0.14           0.26

Unweighted multiple R2
Four.Letter.Words    Suffixes   Letter.Series   Pedigrees   Letter.Group
             0.19        0.12            0.03        0.02           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate; a sketch of the call appears below.
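A brief sketch of the call that produces the output below (the object name preacher is arbitrary; quoted variable names are one accepted way of specifying x, y, and m):

> preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
> preacher     # printing the object gives the output shown here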

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

41

> mediate.diagram(preacher)

[mediation diagram: THERAPY → ATTRIB (a = 0.82), ATTRIB → SATIS (b = 0.4), THERAPY → SATIS (c = 0.76, c' = 0.43)]

Figure 16: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[regression diagram: THERAPY → SATIS (0.43), ATTRIB → SATIS (0.4), with THERAPY and ATTRIB correlated (0.21)]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set, solved using the setCor function. Compare this to the previous figure.

43

for speed. The default number of boot straps is 5000.

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏(i=1..n) (1 − λi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx^−1 Rxy Ryy^−1 Ryx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

44

Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Moderation model path diagram (see Figure 18): ACT, gender, and ACTXgndr predict SATQ both directly and through the mediator education; the path coefficients shown correspond to the a, b, c, and c' estimates reported above.]

Figure 18 Moderated multiple regression requires the raw data

45

      Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation:  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
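One way to see this symmetry is to reverse the roles of the two sets and compare the reported set correlations. The following is a minimal sketch, not part of the original example; it reuses the covariance matrix C created above.

> # Sketch: swap the predictor and criterion sets.  The Cohen's Set Correlation R2
> # reported by the two calls should agree, even though the regressions differ.
> sc.xy <- setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
> sc.yx <- setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)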

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient

47

LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

        MR1   MR2   MR3
MR1    1.00  0.59  0.54
MR2    0.59  1.00  0.52
MR3    0.54  0.52  1.00
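A table such as Table 2 can be generated along the following lines. This is a minimal sketch based on the help pages for these functions rather than the exact code used to produce Table 2; defaults may differ slightly across psych versions.

> f3 <- fa(Thurstone, nfactors = 3)   # a 3 factor (minres) solution of the Thurstone matrix
> fa2latex(f3)                        # loadings, h2, u2, and complexity as a LaTeX table
> cor2latex(Thurstone)                # lower-diagonal correlation table
> df2latex(describe(sat.act))         # any tabular output, e.g., describe output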

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; a short sketch at the end of the list illustrates a few of them. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

49

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
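A brief sketch of a few of these helpers follows; it is not part of the original text and the values are arbitrary.

> fisherz(0.5)                  # Fisher z transformation of r = .5
> geometric.mean(c(2, 4, 8))    # geometric mean of a small vector
> headtail(sat.act)             # first and last lines of the sat.act data set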

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
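Because the data sets are lazy-loaded with the package, they can be examined directly. A minimal sketch, not in the original text:

> library(psych)
> dim(bfi)            # 2800 cases; 25 items plus gender, education, and age
> describe(sat.act)   # basic descriptive statistics for the sat.act example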

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
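One way to install directly from that repository is sketched below; this is based on the repository layout described above rather than on this vignette, and details may vary by platform and version.

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")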

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: macOS Sierra 10.12.4

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons: Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

53

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

54

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

                      • SessionInfo
Page 13:

3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

 [1,]   1   2  NA  NA  NA   6   7   8   9  10
 [2,]  11  12  NA  NA  NA  16  17  18  19  20
 [3,]  21  22  NA  NA  NA  26  27  28  29  30
 [4,]  31  32  33  NA  NA  36  37  38  39  40
 [5,]  41  42  43  44  NA  46  47  48  49  50
 [6,]  51  52  53  54  55  56  57  58  59  60
 [7,]  61  62  63  64  65  66  67  68  69  70
 [8,]  71  72  NA  NA  NA  76  77  78  79  80
 [9,]  81  82  NA  NA  NA  86  87  88  89  90
[10,]  91  92  NA  NA  NA  96  97  98  99 100
[11,] 101 102  NA  NA  NA 106 107 108 109 110
[12,] 111 112  NA  NA  NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use the biserial or point biserial correlation (a regular Pearson r) to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
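A minimal sketch of both conversions follows; the variable major is hypothetical and is not part of the psych data sets.

> major <- c("psych", "econ", "psych", "bio", "econ")   # a hypothetical categorical variable
> dummy.code(major)                                     # one binary (0/1) column per category
> char2numeric(data.frame(major))                       # or recode character columns as numeric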

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).

14

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

15

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+      main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

16

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+      main="Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this with a plot with smoother set to FALSE.

17

> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")

[Figure: violin plots of observed SAT V and SAT Q scores for males and females (y axis 200 to 800); plot title: "Density Plot by gender for SAT V and Q".]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

18

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σ_p = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 8) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

19

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie<4)

[Figure: 0.95 confidence limits ("cat's eye" plots) for the Dependent Variables bfagree, bfcon, bfext, bfneur, and bfopen, grouped by the Independent Variable (lie-scale group); y axis 50 to 150.]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

20

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+      labels=c("Male","Female"), ylab="SAT score", xlab="")

[Figure: bar graph of SAT score (200 to 800) with 0.95 confidence limits, shown separately for Male and Female.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

21

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+      main="Proportion of sample by education level")

[Figure: "Proportion of sample by education level" — Proportion of Education Level (0.00 to 0.30) plotted against Level of Education (0 to 5), shown separately for males and females.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

22

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

23

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels, "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with error crosses for the Sad, Horror, Neutral, and Happy film groups.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

24

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

25

> data(bfi)
> png('bibars.png')
> with(bfi, bibars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

26

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
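lowerMat may be used in the same way on a correlation matrix that has already been computed. A minimal sketch (not from the original text):

> R <- cor(sat.act, use = "pairwise")
> lowerMat(R)    # rounds to 2 digits and displays the lower off diagonal matrix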

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this.

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal.

27

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
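The Holm correction itself is available in core R as p.adjust. A minimal sketch with arbitrary p values (not from the original text):

> p <- c(0.01, 0.02, 0.03, 0.20)   # arbitrary unadjusted probabilities
> p.adjust(p, method = "holm")     # Holm (1979) adjusted probabilities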

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

28

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+      main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

29

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

30

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

31

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations, print with the short=FALSE option.

32

depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

33

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
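A minimal sketch of smoothing the burt matrix follows; the claim that its smallest eigen value is slightly negative follows the cor.smooth help page rather than this vignette.

> eigen(burt)$values          # the smallest eigen value is reported to be slightly negative
> burt.s <- cor.smooth(burt)  # rescale the eigen values and rebuild the matrix
> eigen(burt.s)$values        # all eigen values are now positive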

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

34

> draw.tetra()

[Figure: a bivariate normal distribution with rho = .5, cut at τ on X and Τ on Y, yielding phi = .33; the four quadrants (X and Y above or below their cut points) and the marginal normal density dnorm(x) are shown.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

> draw.cor(expand=20, cuts=c(0,0))

[Figure: a bivariate normal density surface with rho = .5, cut at (0,0).]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

36

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = η_{x_{wg}} · η_{y_{wg}} · r_{xy_{wg}} + η_{x_{bg}} · η_{y_{bg}} · r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

37

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
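A minimal sketch of the first of these analyses follows; the rwg and rbg element names are taken from the statsBy help page and are assumptions here rather than output shown in this vignette.

> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> round(sb$rwg, 2)   # pooled within-group correlations
> round(sb$rbg, 2)   # between-group correlations (of the group means)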

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031


Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030


multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function, and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total Direct effect (c) of THERAPY on SATIS = 0.76   SE = 0.31   t direct = 2.5   with probability = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32   t direct = 1.35   with probability = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69

R2 of model = 0.31

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstrap iterations is 5000. (A call of this form is sketched below; its output appears with Figure 18.)
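A sketch of such a moderated call (this mirrors the call whose output is shown later with Figure 18; the mod argument names the moderator):

mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
        data = sat.act, n.iter = 50, std = TRUE)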


> mediate.diagram(preacher)

[Figure 16 diagram: mediation model with paths THERAPY -> ATTRIB (0.82), ATTRIB -> SATIS (0.4), and THERAPY -> SATIS (c = 0.76, c' = 0.43).]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std = FALSE)

> setCor.diagram(preacher)

[Figure 17 diagram: regression model predicting SATIS from THERAPY (0.43) and ATTRIB (0.4); the remaining coefficient shown is 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$ R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i) $$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$ R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}. $$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the overall degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation, covariance, or raw data matrix, and thus will report standardized or raw β weights accordingly. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

[Figure 18 diagram: moderated regression model predicting SATQ from ACT, gender, and the ACT x gender interaction (ACTXgndr), with education as the mediating variable; the path values are those reported in the output above.]

Figure 18 Moderated multiple regression requires the raw data


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients

             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002


age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
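A quick way to see this symmetry (a sketch reusing the covariance matrix C computed above) is to reverse the roles of the two sets and compare the reported set correlations:

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ACT and SATs from demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics from ACT and SATs
# Cohen's set correlation R2 is the same in both directions.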

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
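A minimal sketch of the calls involved (run at the console; the LaTeX code is printed and may be pasted into a document):

f3 <- fa(Thurstone, 3)    # three factor solution of the Thurstone correlations
fa2latex(f3)              # LaTeX code for a table like Table 2
cor2latex(Thurstone)      # lower-diagonal correlation table in LaTeX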

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2    com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a few of these helpers are briefly illustrated after the list); look at the Index for psych for a list of all of the functions.

block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz  Convert a correlation to the corresponding Fisher z score.

geometric.mean  also harmonic.mean: find the appropriate mean for working with different kinds of data.

ICC  and cohen.kappa are typically used to find the reliability for raters.

headtail  combines the head and tail functions to show the first and last lines of a data set or output.

topBottom  Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep  finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection  will correct correlations for restriction of range.

reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
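A few quick illustrations of the helpers listed above (the input values are arbitrary):

fisherz(0.5)                     # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))    # geometric mean of a vector
harmonic.mean(c(1, 2, 4, 8))     # harmonic mean of a vector
headtail(sat.act)                # first and last rows of the sat.act data frame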

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone  Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi  25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act  Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi  A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iqitems  14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton  Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer  Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous  cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62


limits (Figure 6). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 5). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 5).


> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()

null device

1

Figure 4: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.


> data(sat.act)
> violinBy(sat.act[5:6], sat.act$gender, grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")

[Figure 5 plot: "Density Plot by gender for SAT V and Q"; violin plots of observed SATV and SATQ scores for males (M) and females (F), y axis from 200 to 800.]

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars  show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by  does the same, but grouping the data by some condition.

error.bars.tab  draws bar graphs from tabular data with error bars based upon the standard error of a proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses  draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although that is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

[Figure 6 plot: 0.95 confidence limits (cats-eye intervals) for bfagree, bfcon, bfext, bfneur, and bfopen (Dependent Variable, roughly 50 to 150) plotted by group (Independent Variable).]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")

[Figure 7 plot: bar graph with 0.95 confidence limits of SAT score for Male and Female, y axis from 200 to 800.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")

[Figure 8 plot: "Proportion of sample by education level"; proportions (with error bars) of each Level of Education (0 to 5) by gender, y axis (Proportion of Education Level) from 0.00 to 0.30.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or plotted by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 plots: left panel "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal), right panel "Movies effect on affect" (Positive Affect vs. Negative Affect), with points for the Sad, Horror, Neutral, and Happy films.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png("bibars.png")
> with(bfi, bibars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
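For example (a minimal sketch; Table 1 shows the resulting output for the sat.act data):

ct <- corr.test(sat.act)    # correlations, cell sizes, raw and Holm-adjusted probabilities
print(ct, short = FALSE)    # also displays the confidence intervals of each correlation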

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png("circplot.png")
> circ <- sim.circ(24)
> rcirc <- cor(circ)
> corPlot(rcirc, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=rcirc, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call: corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests

Call: r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call: r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call: r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call: r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df = 15   with probability < 1.8e-273
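A back-of-the-envelope version of this chi square, a sketch that assumes a common n of 700 and therefore only approximates the value above, scales the squared Fisher z transforms of the 15 distinct correlations by n - 3:

R <- lowerCor(sat.act)          #lowerCor invisibly returns the full correlation matrix
z <- fisherz(R[lower.tri(R)])   #Fisher z of the off diagonal elements
sum((700 - 3) * z^2)            #roughly 1325, with df = 6*5/2 = 15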

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
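A minimal sketch of these functions (the lsat6 dichotomous ability items are loaded with data(bock), the bfi items are polytomous, and burt is the correlation matrix mentioned above):

data(bock)                      #makes the lsat6 items available
r.tet  <- tetrachoric(lsat6)    #tetrachoric correlations and thresholds
r.poly <- polychoric(bfi[1:5])  #polychoric correlations for polytomous items
r.fix  <- cor.smooth(burt)      #smooth the non-positive-definite burt matrix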

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

34

> draw.tetra()

[Figure 14 shows the bivariate normal scatter plot (ρ = 0.5, φ = 0.33) divided at the cut points τ (on X) and Τ (on Y) into the four quadrants X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; and X < τ, Y < Τ, with the marginal normal densities showing each cut.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 shows the bivariate density surface (ρ = 0.5) cut at x = 0 and y = 0.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

36

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = η_{x_wg} · η_{y_wg} · r_{xy_wg} + η_{x_bg} · η_{y_bg} · r_{xy_bg}        (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_wg} and r_{xy_bg}, and η (eta) is the correlation of the data with the within group values or the group means.
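A minimal sketch of this decomposition for the sat.act data grouped by education; when cors=TRUE the statsBy output includes the pooled within group correlations (rwg) and the between group correlations (rbg):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   #pooled within group correlations
round(sb$rbg, 2)   #correlations of the group means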

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
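For example (a sketch; the grouping column of withinBetween is assumed to be named "Group", as in its help file), statsBy recovers both crossed structures:

wb <- statsBy(withinBetween, group = "Group", cors = TRUE)
round(wb$rwg, 2)   #the 1, 0, -1 within group pattern
round(wb$rbg, 2)   #the 1, 0, -1 between group pattern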

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

37

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
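The call that produces the output below is a single line (a sketch; the sobel data frame is built in the mediate help example, the object name preacher matches the one passed to mediate.diagram in Figure 16, and n.iter, which defaults to 5000, controls the bootstrap):

preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)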

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

41

> mediate.diagram(preacher)

[Mediation model diagram: THERAPY → ATTRIB = 0.82, ATTRIB → SATIS = 0.4, and THERAPY → SATIS with c = 0.76 and c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Regression Models diagram: THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, with the curved path between THERAPY and ATTRIB = 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

43

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 − ∏_{i=1}^{n} (1 − λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'
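Because the λ_i are the squared canonical correlations, the set correlation reported earlier for the Thurstone example can be reproduced by hand (a minimal sketch; Re() simply drops negligible imaginary parts returned by eigen for the non-symmetric product):

Rxx <- Thurstone[1:4, 1:4]   #the x set (first four variables)
Ryy <- Thurstone[5:9, 5:9]   #the y set (last five variables)
Rxy <- Thurstone[1:4, 5:9]
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
lambda                       #the squared canonical correlations (0.628, 0.148, ...)
1 - prod(1 - lambda)         #Cohen's set correlation R2, about 0.69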

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

44

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Moderation model path diagram: ACT, gender, and the ACTXgndr interaction predict SATQ directly (c = 0.58, -0.14, 0; c' = 0.59, -0.14, 0) and through education (paths to education = 0.16, 0.09, -0.01; education to SATQ = -0.04).]

Figure 18: Moderated multiple regression requires the raw data.

45

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
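A quick check of this symmetry (a sketch reusing the covariance matrix C computed above): reversing the roles of the two sets leaves Cohen's Set Correlation R2 at 0.09.

setCor(c(4:6), c(1:3), C, n.obs = 700)   #predict ACT, SATV, SATQ from the demographics
setCor(c(1:3), c(4:6), C, n.obs = 700)   #the reverse direction gives the same set correlation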

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
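Table 2 can be produced along these lines (a sketch; fa2latex writes the LaTeX to the console unless a file argument is given, and the three factor solution of the Thurstone correlations is assumed to be the analysis shown):

f3 <- fa(Thurstone, nfactors = 3)   #the factor analysis reported in Table 2
fa2latex(f3)                        #APA style LaTeX factor table
cor2latex(sat.act)                  #lower diagonal correlation table
df2latex(describe(sat.act))         #any data frame or matrix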

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
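A few of these helpers in action (a minimal sketch):

fisherz(.5)                 #0.55, the Fisher z of r = .5
harmonic.mean(c(2, 4, 8))   #3.43, the appropriate mean for rates
headtail(sat.act)           #the first and last four lines of the data frame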

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

Page 15: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

15

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

16

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

+ main =Density distributions of four measures of affect )

gt devoff()

null device

1

Figure 4 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The variablesare four measures of motivational state for 3896 participants Each scale is the averagescore of 10 items measuring motivational state Compare this a plot with smoother set toFALSE

17

gt data(satact)

gt violinBy(satact[56]satact$gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

Density Plot by gender for SAT V and Q

Obs

erve

d

SATV M SATV F SATQ M SATQ F

200

300

400

500

600

700

800

Figure 5 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 6) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 8) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in presentingdata this way at httpemdbolkerwikidotcomblogdynamite In the example shownnote that the graph starts at 0 although is out of the range This is a function of usingbars which always are assumed to start at zero Consider other ways of showing yourdata

344 Error bars for tabular data

However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab

function

19

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)

095 confidence limits

Independent Variable

Dep

ende

nt V

aria

ble

bfagree bfcon bfext bfneur bfopen

050

100

150

Figure 6 Using the errorbarsby function shows that self reported personality scales onthe Big Five Inventory vary as a function of the Lie scale on the EPI The ldquocats eyesrdquo showthe distribution of the confidence

20

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)

Male Female

095 confidence limits

SAT

sco

re

200

300

400

500

600

700

800

200

300

400

500

600

700

800

Figure 7 A ldquoDynamite plotrdquo of SAT scores as a function of gender is one way of misleadingthe reader By using a bar graph the range of scores is ignored Bar graphs start from 0

21

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)

Proportion of sample by education level

Level of Education

Pro

port

ion

of E

duca

tion

Leve

l

000

005

010

015

020

025

030

M 0 M 1 M 2 M 3 M 4 M 5

000

005

010

015

020

025

030

Figure 8 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

22

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses func-tion For instance the effect of various movies on both ldquoEnergetic Arousalrdquo and ldquoTenseArousalrdquo can be seen in one graph and compared to the same movie manipulations onldquoPositive Affectrdquo and ldquoNegative Affectrdquo Note how Energetic Arousal is increased by threeof the movie manipulations but that Positive Affect increases following the Happy movieonly

23

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))

8 12 16 20

1012

1416

1820

22

Movies effect on arousal

Energetic Arousal

Tens

e A

rous

al

SadHorror

NeutralHappy

6 8 10 12

24

68

10

Movies effect on affect

Positive Affect

Neg

ativ

e A

ffect

Sad

Horror

NeutralHappy

Figure 9 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)

24

346 Back to back histograms

The bibars function summarize the characteristics of two groups (eg males and females)on a second variable (eg age) by drawing back to back histograms (see Figure 10)

25

data(bfi)gt png( bibarspng )

gt with(bfibibars(agegenderylab=Agemain=Age by males and females))

gt devoff()

null device

1

Figure 10 A bar plot of the age distribution for males and females shows the use ofbibars The data are males and females from 2800 cases collected using the SAPAprocedure and are available as part of the bfi data set

26

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

27

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

28

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")

> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

29

> png('circplot.png')
> circ <- sim.circ(24)

> r.circ <- cor(circ)

> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

30

> png('spider.png')
> op <- par(mfrow=c(2,2))

> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")

> op <- par(mfrow=c(1,1))

> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

31

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option
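A minimal sketch of doing so (the object name ct is arbitrary):

> ct <- corr.test(sat.act)
> print(ct, short = FALSE)    #adds the confidence intervals for each correlation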

32

depending upon the input

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests

Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

33

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
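cortest can also be used to compare two correlation matrices, for example the male and female subsets formed earlier. This is only a sketch; the argument names n1 and n2 are my assumption and should be checked against the cortest help page:

> R.male <- cor(male[-1], use = "pairwise")
> R.female <- cor(female[-1], use = "pairwise")
> cortest(R.male, R.female, n1 = nrow(male), n2 = nrow(female))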

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
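A brief sketch of these functions in use (the dichotomization here is artificial and purely illustrative):

> ability <- ifelse(bfi[1:2] > 3, 1, 0)    #artificially dichotomize two bfi items
> tetrachoric(ability)                     #tetrachoric correlation of the 0/1 items
> polychoric(bfi[1:5])                     #polychoric correlations of the 6 point items
> cor.smooth(burt)                         #smooth the (improper) burt correlation matrix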

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

34

> draw.tetra()

[Figure 14 appears here: a bivariate normal distribution with rho = 0.5, dichotomized at tau on x and Tau on y; the resulting two by two table of quadrants (X < tau or X > tau by Y < Tau or Y > Tau) yields phi = 0.33.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 appears here: the bivariate density surface for rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

36

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

37

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
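A sketch of the first of these calls, also asking for the correlations (as I recall, the rwg and rbg elements of the output hold the pooled within group and between group correlations; check the statsBy help page):

> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> round(sb$rwg, 2)    #pooled within group correlations
> round(sb$rbg, 2)    #correlations of the group means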

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

38

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030

39

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual,2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69

R2 of model = 0.31

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069
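Note that these estimates are internally consistent: the indirect effect is the product of the two paths (ab = 0.82 * 0.40 ≈ 0.33), and the total effect is the direct effect plus the indirect effect (c = c' + ab = 0.43 + 0.33 = 0.76).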

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, niter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

41

> mediate.diagram(preacher)

[Figure 16 appears here: the mediation model diagram, with THERAPY predicting SATIS through ATTRIB; a = 0.82, b = 0.4, c = 0.76, c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure 17 appears here: the regression model diagram, with THERAPY and ATTRIB predicting SATIS; path values 0.43, 0.4, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

43

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
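As a check on this formula, the set correlation for the earlier Thurstone example (x = variables 1-4, y = variables 5-9) can be found directly from the correlation submatrices. This is only a sketch of the algebra, not of the setCor internals:

> R <- Thurstone
> Rxx <- R[1:4, 1:4]
> Ryy <- R[5:9, 5:9]
> Rxy <- R[1:4, 5:9]
> lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)  #the squared canonical correlations
> 1 - prod(1 - lambda)    #matches the Set Correlation R2 of .69 reported above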

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals

44

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

[Figure 18 appears here: the moderation model diagram, with ACT, gender, and the ACTXgndr interaction predicting SATQ through education; the path values correspond to the effect estimates listed above.]

Figure 18 Moderated multiple regression requires the raw data

45

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom

Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301

F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002

46

age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
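A sketch of that symmetry, reversing the roles of the two sets used above (the Set Correlation R2 should be unchanged, although the individual regressions of course differ):

> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   #as above
> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #x and y reversed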

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient

47

LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
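A sketch of how such a table can be generated (the heading argument name is my assumption; see the fa2latex help page for the full set of options):

> f3 <- fa(Thurstone, nfactors = 3)
> fa2latex(f3, heading = "A factor analysis table from the psych package in R")
> #cor2latex(Thurstone) and df2latex(describe(sat.act)) work in the same spirit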

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 0.91 -0.04 0.04 0.82 0.18 1.01
Vocabulary 0.89 0.06 -0.03 0.84 0.16 1.01
Sent.Completion 0.83 0.04 0.00 0.73 0.27 1.00
First.Letters 0.00 0.86 0.00 0.73 0.27 1.00
4.Letter.Words -0.01 0.74 0.10 0.63 0.37 1.04
Suffixes 0.18 0.63 -0.08 0.50 0.50 1.20
Letter.Series 0.03 -0.01 0.84 0.72 0.28 1.00
Pedigrees 0.37 -0.05 0.47 0.50 0.50 1.93
Letter.Group -0.06 0.21 0.64 0.53 0.47 1.23

SS loadings 2.64 1.86 1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

49

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems. (A brief sketch using several of these helper functions follows.)
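A brief sketch of a few of the helpers listed above (the values are arbitrary):

> fisherz(.5)                      #Fisher r to z transformation
> harmonic.mean(c(1, 2, 4))        #the harmonic mean
> headtail(sat.act)                #first and last lines of the data frame
> superMatrix(diag(2), diag(3))    #a 5 x 5 super matrix with 0s off the blocks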

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iq.items). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book, An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psychometrika 26(4)405–432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multivariate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regression/correlation analysis for the behavioral sciences L Erlbaum Associates Mahwah NJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure Scandinavian Journal of Statistics 6(2) pp 65–70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating indirect effects in simple mediation models Behavior Research Methods Instruments & Computers 36(4)717–731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix Psychological Bulletin 87(2)245–251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

Page 16: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

16

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

+ main =Density distributions of four measures of affect )

gt devoff()

null device

1

Figure 4 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The variablesare four measures of motivational state for 3896 participants Each scale is the averagescore of 10 items measuring motivational state Compare this a plot with smoother set toFALSE

17

gt data(satact)

gt violinBy(satact[56]satact$gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

Density Plot by gender for SAT V and Q

Obs

erve

d

SATV M SATV F SATQ M SATQ F

200

300

400

500

600

700

800

Figure 5 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 6) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 8) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in presentingdata this way at httpemdbolkerwikidotcomblogdynamite In the example shownnote that the graph starts at 0 although is out of the range This is a function of usingbars which always are assumed to start at zero Consider other ways of showing yourdata

344 Error bars for tabular data

However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab

function

19

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)

095 confidence limits

Independent Variable

Dep

ende

nt V

aria

ble

bfagree bfcon bfext bfneur bfopen

050

100

150

Figure 6 Using the errorbarsby function shows that self reported personality scales onthe Big Five Inventory vary as a function of the Lie scale on the EPI The ldquocats eyesrdquo showthe distribution of the confidence

20

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)

Male Female

095 confidence limits

SAT

sco

re

200

300

400

500

600

700

800

200

300

400

500

600

700

800

Figure 7 A ldquoDynamite plotrdquo of SAT scores as a function of gender is one way of misleadingthe reader By using a bar graph the range of scores is ignored Bar graphs start from 0

21

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)

Proportion of sample by education level

Level of Education

Pro

port

ion

of E

duca

tion

Leve

l

000

005

010

015

020

025

030

M 0 M 1 M 2 M 3 M 4 M 5

000

005

010

015

020

025

030

Figure 8 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

22

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses func-tion For instance the effect of various movies on both ldquoEnergetic Arousalrdquo and ldquoTenseArousalrdquo can be seen in one graph and compared to the same movie manipulations onldquoPositive Affectrdquo and ldquoNegative Affectrdquo Note how Energetic Arousal is increased by threeof the movie manipulations but that Positive Affect increases following the Happy movieonly

23

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))

8 12 16 20

1012

1416

1820

22

Movies effect on arousal

Energetic Arousal

Tens

e A

rous

al

SadHorror

NeutralHappy

6 8 10 12

24

68

10

Movies effect on affect

Positive Affect

Neg

ativ

e A

ffect

Sad

Horror

NeutralHappy

Figure 9 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bi.bars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.
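For example, a minimal sketch of the two display functions (using the sat.act data that runs through this section):

R <- cor(sat.act, use = "pairwise")   # the full (square) correlation matrix
lowerMat(R)                           # rounds to 2 digits and prints the lower triangle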

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
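To see what the default Holm adjustment does, a minimal sketch (the p values here are made up for illustration):

p.raw <- c(0.01, 0.02, 0.30, 0.45)   # hypothetical raw probabilities
p.adjust(p.raw, method = "holm")     # 0.04 0.06 0.60 0.60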

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
+         main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
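A minimal sketch of the underestimation and its correction (the latent correlation of .5, the cut at the mean, and the sample size are arbitrary choices here; MASS is used only to simulate correlated latent scores):

library(psych)
library(MASS)
set.seed(17)
latent <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2))
observed <- 1 * (latent > 0)        # dichotomize both variables at their means
cor(observed)[1, 2]                 # the phi coefficient, noticeably less than .5
tetrachoric(observed)$rho[1, 2]     # recovers a value close to the latent .5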

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
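A minimal sketch using the burt matrix just mentioned (this assumes, as described above, that its smallest eigenvalue is slightly negative, which is what cor.smooth repairs):

library(psych)
round(tail(eigen(burt)$values, 3), 3)    # the last eigenvalue is (slightly) negative
burt.s <- cor.smooth(burt)               # adjust and rescale the eigenvalues
round(tail(eigen(burt.s)$values, 3), 3)  # now all eigenvalues are non-negative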

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function that gives some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into the within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
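A minimal sketch of the first of these analyses (the grouping variable is passed by name, and the rwg and rbg elements hold the pooled within group and between group correlations of the decomposition above, assuming cors=TRUE is set):

library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group (within education level) correlations
round(sb$rbg, 2)   # between-group correlations, based on the group means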

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary  SentCompletion    FirstLetters
           3.69            3.88            3.00            1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
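The output below and Figure 16 come from a call of roughly this form, a sketch reconstructed from the Call line that follows (the sobel data set is part of psych; the bootstrap settings are left at their defaults):

library(psych)
preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
print(preacher, short = FALSE)   # the longer output reproduced below
mediate.diagram(preacher)        # the path diagram shown in Figure 16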

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap (a sketch of the call appears just before its output below). The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes, 2004, data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$
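As a quick numeric check of this formula, the squared canonical correlations reported for the Thurstone example in section 5.1 (0.6280, 0.1478, 0.0076, 0.0049) reproduce the Cohen's Set Correlation R2 of 0.69 reported there:

lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations from setCor
1 - prod(1 - lambda)                         # about 0.687, printed by setCor as 0.69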

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus if using the correlation matrix will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
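The moderated regression example shown next (and in Figure 18) was produced by a call of roughly this form, a sketch reconstructed from the Call line below (mediate builds the ACTXgndr interaction term itself):

mod.med <- mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
                   data = sat.act, std = TRUE, n.iter = 50)
print(mod.med, short = FALSE)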

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37

 To see the longer output, specify short = FALSE in the print statement

 Full output

 Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

 a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

 b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

 ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
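Table 2 was produced along these lines, a sketch in which the 3 factor (minres) solution and the heading text are taken from the table itself (the heading argument is optional):

library(psych)
f3 <- fa(Thurstone, 3)   # the 3 factor solution of the 9 Thurstone ability variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")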

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

                   MR1    MR2    MR3
MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
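A few of these are one liners; a minimal sketch with arbitrary input values:

library(psych)
fisherz(0.5)                    # Fisher z transform of r = .5 (about 0.55)
geometric.mean(c(1, 10, 100))   # 10
harmonic.mean(c(2, 4, 4))       # 3
headTail(sat.act)               # first and last few lines of the sat.act data set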

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
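The development version can also be installed from within R itself; a sketch, assuming the repository above serves source packages in the usual way:

install.packages("psych", repos = "http://personality-project.org/r", type = "source")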

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0      foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4     grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, 122 pp., Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors." Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 17: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

+ main =Density distributions of four measures of affect )

gt devoff()

null device

1

Figure 4 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The variablesare four measures of motivational state for 3896 participants Each scale is the averagescore of 10 items measuring motivational state Compare this a plot with smoother set toFALSE

17

gt data(satact)

gt violinBy(satact[56]satact$gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

Density Plot by gender for SAT V and Q

Obs

erve

d

SATV M SATV F SATQ M SATQ F

200

300

400

500

600

700

800

Figure 5 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

18

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 6) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 8) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in presentingdata this way at httpemdbolkerwikidotcomblogdynamite In the example shownnote that the graph starts at 0 although is out of the range This is a function of usingbars which always are assumed to start at zero Consider other ways of showing yourdata

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)

[Figure: error bars with cats eyes, "0.95 confidence limits", for bfagree, bfcon, bfext, bfneur, and bfopen; x axis: Independent Variable; y axis: Dependent Variable, 50 to 150]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure: bar graph ("dynamite plot") with "0.95 confidence limits" of SAT score (200 to 800) for Male and Female]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")

[Figure: "Proportion of sample by education level" bar graph with standard error bars; x axis: Level of Education (M 0 through M 5, F 0 through F 5); y axis: Proportion of Education Level, 0.00 to 0.30]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a dataframe.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two errorCircles panels, "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with the groups labeled Sad, Horror, Neutral, and Happy]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV SATQ
education 1.00
age       0.61  1.00
ACT       0.16  0.15  1.00
SATV      0.02 -0.06  0.61 1.00
SATQ      0.08  0.04  0.60 0.68 1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV SATQ
education 1.00
age       0.52  1.00
ACT       0.16  0.08  1.00
SATV      0.07 -0.03  0.53 1.00
SATQ      0.03 -0.09  0.58 0.63 1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
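For example, a minimal sketch (the object name ct is an arbitrary choice for illustration):

ct <- corr.test(sat.act)   # correlations, sample sizes, and adjusted probabilities
print(ct, short=FALSE)     # additionally displays the confidence interval for each correlation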


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
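A minimal sketch of these functions (the particular data sets are chosen here only for illustration; both ability and bfi are distributed with psych):

library(psych)
tet <- tetrachoric(ability)       # tetrachoric correlations and thresholds for dichotomous items
poly <- polychoric(bfi[1:5])      # polychoric correlations for polytomous items
smoothed <- cor.smooth(tet$rho)   # smooth the matrix if it is not positive semi-definite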

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure: a bivariate normal distribution with rho = .5 cut at thresholds τ (for X) and Τ (for Y); the four quadrants (X > τ, Y > Τ, etc.) are labeled and the resulting φ = .33]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

[Figure: the bivariate normal density surface, "Bivariate density rho = 0.5", cut at x = 0 and y = 0]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
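A minimal sketch of such an analysis (the grouping variable and the elements printed are assumptions based upon the description above):

library(psych)
sb <- statsBy(sat.act, group="education", cors=TRUE)  # two level descriptive statistics
round(sb$rwg,2)   # the pooled within-group correlations
round(sb$rbg,2)   # the correlations of the group means (between groups)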

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031


Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030


multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual,2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c')
        SATIS   se    t  Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50


> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram with THERAPY -> ATTRIB (a = 0.82), ATTRIB -> SATIS (b = 0.4), and THERAPY -> SATIS (c = 0.76, c' = 0.43)]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram with THERAPY (0.43) and ATTRIB (0.4) predicting SATIS, and 0.21 between the two predictors]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{xy}'.$$
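For readers who want to see this definition in action, here is a small illustrative sketch (not part of the original vignette) that computes the set correlation of the last three sat.act variables with the first three directly from the correlation matrix; the particular variable split is an assumption chosen to match the setCor example later in this section.

R <- cor(sat.act, use="pairwise")
Rxx <- R[1:3,1:3]                     # gender, education, age
Ryy <- R[4:6,4:6]                     # ACT, SATV, SATQ
Rxy <- R[1:3,4:6]
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)  # squared canonical correlations
1 - prod(1 - lambda)                  # about .09, matching the set correlation reported by setCor below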

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0
R2 of model = 0.37

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c')
          SATQ   se     t  Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

[Figure: "Moderation model" path diagram in which ACT, gender, and the ACTXgndr interaction predict SATQ both directly and through education]

Figure 18: Moderated multiple regression requires the raw data.


Min 1Q Median 3Q Max

-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom

Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301

F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002


age 015 -002 -002

Probability of t <

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1   MR2   MR3    h2    u2   com
Sentences       0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary      0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion  0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters    0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes        0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees       0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings     2.64  1.86  1.50

                 MR1   MR2   MR3
MR1             1.00  0.59  0.54
MR2             0.59  1.00  0.52
MR3             0.54  0.52  1.00
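A hedged sketch of the kind of calls that produce such tables (the three factor solution and the choice of inputs are assumptions for illustration):

f3 <- fa(Thurstone,3)          # a three factor solution of the Thurstone correlations
fa2latex(f3)                   # a LaTeX factor table like Table 2
cor2latex(Thurstone)           # a lower diagonal correlation matrix in LaTeX
df2latex(describe(sat.act))    # any data frame, here the describe output, as a LaTeX table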


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A short sketch illustrating a few of them appears after the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean, also harmonic.mean, find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
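A brief sketch of a few of these helpers (the particular input values are chosen purely for illustration):

library(psych)
fisherz(.5)                    # the Fisher z transformation of r = .5
geometric.mean(c(1,2,4,8))     # an appropriate mean for ratio-like data
headtail(sat.act)              # the first and last few lines of the data frame
mardia(sat.act[1:4])           # Mardia's multivariate skew and kurtosis
superMatrix(diag(2),diag(3))   # a 5 x 5 super matrix with 0s in the off-diagonal blocks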

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

> data(sat.act)

> violinBy(sat.act[5:6],sat.act$gender,grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")

Figure 5: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
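Because these functions share a common interface, a single call is usually enough to get a useful plot. The following is a minimal sketch using the sat.act data set that runs through this tutorial; the sd argument (requesting standard deviations rather than standard errors) is an assumption based on the description above and should be checked against the help page.

> library(psych)
> data(sat.act)
> error.bars(sat.act[5:6], main="95% confidence intervals for SAT V and Q")   #cats eyes by default
> error.bars(sat.act[5:6], sd=TRUE, main="Means +/- 1 sd")                    #one standard deviation instead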

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)

> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,

+ labels=c("Male","Female"),ylab="SAT score",xlab="")

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act,table(gender,education))

> rownames(T) <- c("M","F")

> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",

+ main="Proportion of sample by education level")

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.
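As a variant of the call shown above, the column wise percentages and raw counts described in the caption can be requested directly. This is a sketch only; the way and raw argument values are taken from the caption, and the remaining defaults should be checked in the error.bars.tab help page.

> T <- with(sat.act,table(gender,education))
> error.bars.tab(T,way="columns",main="Proportions within each education level")   #column wise percentages
> error.bars.tab(T,way="columns",raw=TRUE,main="Counts within each education level") #raw counts instead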

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))

> data(affect)

> colors <- c("black","red","white","blue")

> films <- c("Sad","Horror","Neutral","Happy")

> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,

+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,

+ cex=2,colors=colors, main = "Movies effect on arousal")

> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",

+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")

> op <- par(mfrow=c(1,1))

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png("bibars.png")

> with(bfi,bibars(age,gender,ylab="Age",main="Age by males and females"))

> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)

> male <- subset(sat.act,sat.act$gender==1)

> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)

> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences, and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)

> round(diffs,2)

          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

> png("corplot.png")
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")

> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png("circplot.png")
> circ <- sim.circ(24)

> r.circ <- cor(circ)

> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png("spider.png")
> op <- par(mfrow=c(2,2))

> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")

> op <- par(mfrow=c(1,1))

> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
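For instance, a minimal sketch of that longer form of the output (nothing here beyond the print option just mentioned):

> ct <- corr.test(sat.act)
> print(ct, short=FALSE)   #adds the confidence intervals for each correlation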


1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15  with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
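A minimal sketch of these functions, using data sets already introduced in this tutorial; the choice of the first five bfi items and the dichotomization cut point are arbitrary illustrations, not part of the original examples:

> library(psych)
> pc <- polychoric(bfi[1:5])        #polychoric correlations of five polytomous items
> pc$rho                            #the correlation matrix
> pc$tau                            #the estimated thresholds (cut points)
> d <- ifelse(bfi[1:5] > 3, 1, 0)   #dichotomize the items to show the tetrachoric case
> tetrachoric(d)$rho
> cor.smooth(burt)                  #smooth a matrix that is not positive semi-definite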

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into the within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.
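A minimal sketch of this decomposition for the sat.act data, grouped by education. The element names rwg, rbg, etawg, and etabg are assumptions based on the statsBy documentation and should be verified against its help page:

> library(psych)
> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rwg, 2)    #pooled within group correlations
> round(sb$rbg, 2)    #weighted correlations of the group means
> round(sb$etawg, 2)  #correlations of the data with the within group deviations
> round(sb$etabg, 2)  #correlations of the data with the group means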

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.69      0.63           0.50       0.58          0.48

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.48      0.40           0.25       0.34          0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.59      0.58           0.49       0.58          0.45

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.34      0.34           0.24       0.33          0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.58      0.46           0.21       0.18          0.30

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
            0.331     0.210          0.043      0.032         0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.44      0.35           0.17       0.14          0.26

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.19      0.12           0.03       0.02          0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables, mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example (see the sketch below) is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000.
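A sketch of that moderated call; the argument values are taken from the Call line echoed in the output shown with Figure 18 below, and the quoting of the variable names is an assumption:

> mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", niter = 50, std = TRUE)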

> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2:3),sobel,std=FALSE)

> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
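The eigenvalues in this formula are just the squared canonical correlations, so the set correlation can be computed directly from a correlation matrix. A minimal sketch for the sat.act data, splitting the six variables into x = columns 1-3 and y = columns 4-6 (the same split used in the setCor example later in this section):

> R   <- lowerCor(sat.act)                    #the full correlation matrix, returned invisibly
> Rxx <- R[1:3,1:3]; Ryy <- R[4:6,4:6]; Rxy <- R[1:3,4:6]
> M   <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
> lambda <- eigen(M)$values                   #the squared canonical correlations
> 1 - prod(1 - lambda)                        #Cohen's set correlation R2; compare with setCor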

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom

Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301

F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> #compare with setCor

> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation:  7.26  9  1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
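A quick way to see that symmetry is to reverse the roles of the two sets in the call just shown; the reported Set Correlation R2 (0.09) is unchanged, even though the individual regressions differ:

> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #swap the predictor and criterion sets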

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
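A minimal sketch of the calls behind such tables. The three factor analysis of the Thurstone matrix is an assumption chosen to match the variables shown in Table 2, and the formatting arguments of fa2latex should be checked in its help page:

> f3 <- fa(Thurstone, 3)
> fa2latex(f3)                 #a factor analysis table, as in Table 2
> cor2latex(Thurstone)         #a lower diagonal correlation table
> df2latex(describe(sat.act))  #any data frame, e.g., the output of describe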

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
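A few of these helpers in action; this is a sketch only, and the input values are arbitrary illustrations rather than results reported in this tutorial:

> fisherz(0.5)                    #Fisher r to z transform
> geometric.mean(c(1, 2, 4, 8))
> harmonic.mean(c(1, 2, 4, 8))
> headtail(sat.act)               #first and last lines of the data set
> mardia(sat.act[1:4])            #multivariate skew and kurtosis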

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A minimal example appears after this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
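As a minimal sketch (not one of the original examples; it assumes only the sat.act data set used throughout this vignette and the sd and eyes arguments of error.bars), the basic call and its standard deviation option might look like this:

> error.bars(sat.act[3:6])                      #95% "cats eyes" confidence intervals of the means
> error.bars(sat.act[3:6], sd=TRUE, eyes=FALSE) #show +/- one standard deviation instead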

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 6). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at http://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+   labels=c("Male","Female"),ylab="SAT score",xlab="")

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+   main="Proportion of sample by education level")

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+   cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+   ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bi.bars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
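A minimal sketch of that option (not shown in the original output) is simply to save the corr.test object and print it with short=FALSE:

> ct <- corr.test(sat.act)
> print(ct, short=FALSE)    #adds the confidence intervals for each correlation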


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
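The same function can compare two matrices. As a sketch (not part of the original text, reusing the male and female objects created earlier), the two gender-specific correlation matrices found above as lower and upper could be tested against each other:

> cortest(lower, upper, n1=nrow(male), n2=nrow(female))   #do the two correlation matrices differ?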

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.
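As a small sketch of the idea (the cell counts here are hypothetical and not taken from any data set in this vignette), tetrachoric will accept a two by two table of frequencies directly:

> tetrachoric(matrix(c(44,23,18,35),2,2))   #a hypothetical 2 x 2 table of counts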

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
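A minimal sketch of that repair, using only the burt data set just mentioned:

> data(burt)
> eigen(burt)$values                 #the smallest eigen value turns out to be (slightly) negative
> burt.smoothed <- cor.smooth(burt)  #a positive definite, "smoothed" version of the matrix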

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations).

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
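A hedged sketch of the first of these (the exact grouping call and the sb.ed name are assumptions; only the sat.act data set and the statsBy output components are used):

> sb.ed <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb.ed$rbg,2)    #the weighted correlations of the (education) group means
> round(sb.ed$rwg,2)    #the pooled within group correlations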

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
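A sketch of the call that produces the output below (the sobel data frame is assumed to have been created from the example in the mediate help page, as noted above; the result is saved as preacher, the object later passed to mediate.diagram in Figure 16):

> preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
> preacher    #print the mediation summary shown below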

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R^{-1}_{xx} R_{xy} R^{-1}_{xx} R^{-1}_{xy}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.


      Min        1Q    Median        3Q       Max
-25.2458   -3.2133    0.7769    3.5921    9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,   Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
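A sketch of the calls that generate such a table (the saved object name f3 is only illustrative):

> f3 <- fa(Thurstone, 3)   #the 3 factor solution shown in Table 2
> fa2latex(f3)             #writes the LaTeX code for an APA style factor table
> cor2latex(sat.act)       #an APA style correlation table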

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1    MR2    MR3    h2    u2   com
Sentences       0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary      0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion  0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters    0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes        0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees       0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings     2.64   1.86   1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
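A few of these helpers in action, as a quick sketch (the values in the comments are approximate):

> fisherz(0.5)                  #about 0.55
> harmonic.mean(c(1,2,4))       #about 1.71
> headtail(sat.act)             #the first and last four rows of the data set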

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
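On any platform, a sketch of installing directly from that repository (assuming the repository layout has not changed) is:

> install.packages("psych", repos="http://personality-project.org/r", type="source")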

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0     foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4    grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

[Figure 6 plot: 0.95 confidence limits; x axis = Independent Variable (bfagree, bfcon, bfext, bfneur, bfopen); y axis = Dependent Variable.]

Figure 6: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")

[Figure 7 plot: bar graph with 0.95 confidence limits; x axis = Male, Female; y axis = SAT score (200 to 800).]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")

[Figure 8 plot: "Proportion of sample by education level"; x axis = Level of Education; y axis = Proportion of Education Level (0.00 to 0.30).]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a dataframe.
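A minimal sketch of the way option described in this caption (the table T is rebuilt so the lines can be run on their own; only way="columns" differs from the example above):

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="columns", ylab="Proportion within education level",
+     xlab="Level of Education", main="Gender proportions within each education level")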

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases only following the Happy movie.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 plots: left panel "Movies effect on arousal" (x = Energetic Arousal, y = Tense Arousal); right panel "Movies effect on affect" (x = Positive Affect, y = Negative Affect); points labeled Sad, Horror, Neutral, Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png("bibars.png")
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()

null device

1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
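The adjusted and raw probabilities can also be found directly from a correlation matrix with corr.p. A minimal sketch (the sample size of 700 and the Holm default are taken from the sat.act example shown below):

> library(psych)
> R <- lowerCor(sat.act)            # returns the full correlation matrix invisibly
> corr.p(R, n=700, adjust="holm")   # raw p below the diagonal, adjusted p above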

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
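A minimal sketch of the polychoric case (the choice of the first five bfi items is an assumption made only for illustration; they are polytomous 6-point items):

> library(psych)
> data(bfi)
> pc <- polychoric(bfi[1:5])   # polychoric correlations of five polytomous items
> pc$rho                       # the estimated latent correlations
> pc$tau                       # the estimated thresholds
> # mixed.cor(x = ..., p = ..., d = ...)   # continuous, polytomous, and dichotomous
> #   blocks, respectively (argument names assumed; see the mixed.cor help page)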

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
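A minimal sketch of smoothing the burt correlation matrix (the eigen value checks are included only to show the before/after effect described above):

> library(psych)
> data(burt)
> round(eigen(burt)$values, 3)     # the smallest eigen value should be slightly negative
> burt.s <- cor.smooth(burt)       # adjust the eigen values and rescale
> round(eigen(burt.s)$values, 3)   # all eigen values are now non-negative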

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

[Figure 14 plot: a bivariate normal distribution with ρ = 0.5 (φ = 0.33), cut at τ on X and at Τ on Y into the four quadrants X < τ / X > τ by Y < Τ / Y > Τ, with the marginal density dnorm(x) shown below.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 plot: the bivariate normal density surface (x, y, z) with ρ = 0.5, cut at 0 on both x and y.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \, \eta_{y_{wg}} \, r_{xy_{wg}} + \eta_{x_{bg}} \, \eta_{y_{bg}} \, r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.
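A minimal sketch of recovering this structure with statsBy (it is an assumption, taken from the help file, that the grouping variable in withinBetween is named "Group"):

> library(psych)
> data(withinBetween)
> wb <- statsBy(withinBetween, group="Group", cors=TRUE)
> round(wb$rwg, 2)   # pooled within group correlations
> round(wb$rbg, 2)   # correlations of the group means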

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
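A minimal sketch of the first of these (requesting the decomposed correlations with cors=TRUE, as described above):

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rbg, 2)   # correlations of the means of the education groups
> round(sb$rwg, 2)   # correlations pooled within education levels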

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30
multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26
Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

> mediate.diagram(preacher)

[Figure 16 diagram: "Mediation model" with paths THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), and THERAPY → SATIS (c = 0.76, c' = 0.43).]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 17 diagram: "Regression Models" with THERAPY and ATTRIB predicting SATIS; the coefficients 0.43, 0.4, and 0.21 are shown on the paths.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where λ_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
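The relation between these quantities can be checked directly from the Thurstone output shown earlier; the squared canonical correlations reported there combine into the set correlation and the average squared canonical correlation (a small worked check, not new functionality):

> lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations from above
> 1 - prod(1 - lambda)                          # 0.69, Cohen's set correlation R2
> mean(lambda)                                  # 0.20, the average squared canonical correlation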

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
           SATQ   se     t     Prob
ACT        0.58 0.03 19.25 0.00e+00
gender    -0.14 0.03 -4.78 2.10e-06
ACTXgndr   0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se     t     Prob
ACT        0.59 0.03 19.26 0.00e+00
gender    -0.14 0.03 -4.63 4.37e-06
ACTXgndr   0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

[Figure 18 diagram: "Moderation model" with ACT, gender, and ACTXgndr predicting SATQ through education; a paths 0.16, 0.09, -0.01; b path -0.04; total effects c = 0.58, -0.14, 0; direct effects c' = 0.59, -0.14, 0.]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19
multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11
Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
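A quick way to see this symmetry is to reverse the roles of the two sets in the call above (a sketch using the covariance matrix C already defined; by the symmetry just described, the reported set correlation R2 should be unchanged):

> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # Cohens Set Correlation R2 = 0.09
> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # the same set correlation, with x and y exchanged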

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; a few of these helpers are illustrated in the short sketch after the list. Look at the Index for psych for a list of all of the functions.

block.random   Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex   is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex   Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor   One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz   Convert a correlation to the corresponding Fisher z score.

geometric.mean   also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa   are typically used to find the reliability for raters.

headtail   combines the head and tail functions to show the first and last lines of a data set or output.

topBottom   Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia   calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep   finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r   partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection   will correct correlations for restriction of range.

reverse.code   will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix   Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
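A few of these helpers in a minimal sketch (input values chosen only for illustration):

> library(psych)
> headtail(sat.act)               # first and last lines of the data set
> fisherz(0.5)                    # Fisher z of r = .5 (about 0.55)
> fisherz2r(fisherz(0.5))         # and back again
> geometric.mean(c(1, 2, 4, 8))   # 2.83
> harmonic.mean(c(1, 2, 4, 8))    # 2.13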

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone   Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi   25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act   Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi   A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems   14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton   Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer   Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous   cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+    labels=c("Male","Female"),ylab="SAT score",xlab="")

[Figure 7 shows a bar graph of SAT score (200 to 800) for Males and Females, with 0.95 confidence limits.]

Figure 7: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+    main="Proportion of sample by education level")

[Figure 8 shows a bar plot titled "Proportion of sample by education level": Proportion of Education Level (0.00 to 0.30) plotted against Level of Education for males (M 0 to M 5) and females.]

Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 shows two panels, "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with circles for the Sad, Horror, Neutral, and Happy films.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bibars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower,upper)
> round(both,2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
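If a different correction is preferred, the adjust argument of corr.test may be set accordingly (a one line sketch; "bonferroni" is just one of the methods passed through to p.adjust):

> corr.test(sat.act, adjust="bonferroni")   #Bonferroni rather than Holm adjusted probabilities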

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option:
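For example (a one line sketch):

> print(corr.test(sat.act), short=FALSE)   #also show the confidence intervals of each correlation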


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)

Correlation tests
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
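A minimal sketch of this repair, using the burt correlation matrix supplied with psych (the eigen value check is added here purely for illustration):

> round(eigen(burt)$values,3)          #inspect the eigen values of the original matrix
> burt.smooth <- cor.smooth(burt)      #adjust the offending eigen values and rescale
> round(eigen(burt.smooth)$values,3)   #the smoothed matrix should have no negative eigen values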

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 shows a bivariate normal distribution with rho = 0.5 cut at thresholds τ (for X) and Τ (for Y), yielding phi = 0.33 and the four cells X > τ or X < τ crossed with Y > Τ or Y < Τ, together with the marginal normal densities.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 shows the bivariate density surface ("Bivariate density, rho = 0.5") dichotomized at the cut points.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_(x_wg) * η_(y_wg) * r_(xy_wg) + η_(x_bg) * η_(y_bg) * r_(xy_bg)    (1)

where r_xy is the normal correlation, which may be decomposed into a within group and a between group correlation, r_(xy_wg) and r_(xy_bg), and η (eta) is the correlation of the data with the within group values or the group means.
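As a sketch of this decomposition applied to the sat.act data grouped by education (the element names rwg, rbg, etawg, and etabg follow the statsBy help page and should be checked there):

> sb <- statsBy(sat.act, group="education", cors=TRUE)   #within and between group statistics
> lowerMat(sb$rwg)     #the pooled within group correlations
> lowerMat(sb$rbg)     #the correlations of the group means
> round(sb$etawg,2)    #eta of the data with the group centered (within group) values
> round(sb$etabg,2)    #eta of the data with the group means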

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
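One quick way to inspect this structure (a sketch; the graphic shown in the help file is more elaborate):

> data(withinBetween)
> round(cor(withinBetween),2)   #the raw correlations mix the within and between group structure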

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences     Vocabulary SentCompletion   FirstLetters
           3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
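In sketch form, the call that produces the following output (and the object plotted in Figure 16) is something like the line below, where sobel is the small data frame defined in the example section of the mediate help page (quoting the variable names is an assumption to be checked against that help page):

> preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)   #bootstrapped mediation
> print(preacher)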

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only, and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 shows the "Mediation model" diagram: THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), and THERAPY → SATIS (c = 0.76, c' = 0.43).]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 shows the "Regression Models" diagram: THERAPY (0.43) and ATTRIB (0.4) predicting SATIS, with a value of 0.21 between THERAPY and ATTRIB.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 - prod_{i=1}^{n} (1 - λ_i)

where λ_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_xx^{-1} R_xy R_xx^{-1} R_xy^{-1}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on SATQ removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on SATQ removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 shows the "Moderation model" diagram: ACT, gender, and ACTXgndr predicting SATQ through education, with a paths 0.16, 0.09, and -0.01, total effects c = 0.58, -0.14, and 0, direct effects c' = 0.59, -0.14, and 0, and b path -0.04.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
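A quick check of this symmetry, reusing the covariance matrix C from above (a sketch): reversing which variables are treated as the x set and which as the y set leaves the reported Set Correlation R2 unchanged.

> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   #as above
> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #sets reversed: same set correlation R2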

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
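A sketch of how Table 2 style output can be produced (the heading argument name follows the fa2latex help page and should be treated as an assumption):

> f3 <- fa(Thurstone,3)    #a three factor solution of the 9 Thurstone variables
> fa2latex(f3, heading="A factor analysis table from the psych package in R")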

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; a brief sketch of a few of them in use follows the list. Look at the Index for psych for a list of all of the functions.

block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean  also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa  are typically used to find the reliability for raters.

headtail  combines the head and tail functions to show the first and last lines of a data set or output.

topBottom  Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep  finds the probability of replication for an F, t, or r and estimated effect size.

partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range

reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
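A brief sketch of a few of these helpers in use (argument values are illustrative only):

> headtail(sat.act)             #the first and last few lines of the data frame
> fisherz(0.5)                  #the Fisher z transformation of r = 0.5
> geometric.mean(c(1,2,4,8))    #see also harmonic.mean for the harmonic mean
> mardia(sat.act)               #multivariate skew and kurtosis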

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone  Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi  25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act  Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi  A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq  14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton  Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer  Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous  cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
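From within R, installation of the development version from that repository can be sketched as (assuming the repository layout described above):

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")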

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: macOS Sierra 10.12.4

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 22: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 8: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+                xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+                cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+                ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png( 'bibars.png' )
> with(bfi,bibars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
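A brief sketch of this usage follows; the method and adjust choices here are illustrative, not values used in the original example.

> # Spearman correlations with a Bonferroni adjustment can be requested directly;
> # printing with short=FALSE also shows the confidence intervals of each correlation.
> ct <- corr.test(sat.act, method="spearman", adjust="bonferroni")
> print(ct, short=FALSE)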

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option.

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89   with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix based upon a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
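A minimal sketch of these ideas, not taken from the original examples, is shown below. The simulated latent correlation, cut point, and seed are illustrative assumptions.

> # Dichotomize continuous bivariate normal data (latent r about .5) and compare the
> # Pearson phi on the 0/1 data with the tetrachoric estimate of the latent correlation.
> set.seed(42)
> x  <- rnorm(1000)
> y  <- 0.5 * x + sqrt(1 - 0.25) * rnorm(1000)
> xy <- data.frame(x = as.numeric(x > 0), y = as.numeric(y > 0))   # cut at the mean
> cor(xy)            # the phi coefficient underestimates the latent correlation
> tetrachoric(xy)    # estimates the latent bivariate normal correlation
> cor.smooth(burt)   # smooth the non-positive-definite burt correlation matrix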

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))


Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \eta_{y_{wg}} r_{xy_{wg}} + \eta_{x_{bg}} \eta_{y_{bg}} r_{xy_{bg}}        (1)

where r_{xy} is the normal correlation, which may be decomposed into the within group and between group correlations r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or with the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
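A minimal sketch of such a call is given below; the choice of grouping variable and the output elements examined are illustrative assumptions rather than output reproduced from the text.

> # Decompose the sat.act correlations into within and between education-level parts.
> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> sb$rwg     # pooled within group correlations
> sb$rbg     # correlations of the group means (between groups)
> sb$etawg   # correlations of the data with the within group values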

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30

multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26

Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31 2.50 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

  mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

> mediate.diagram(preacher)


Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)


Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
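As a small illustration of this formula (not part of the original text; the partition of the sat.act variables into predictor and criterion sets simply anticipates the example below), the eigenvalues λ_i are the squared canonical correlations:

> # Compute Cohen's set correlation R2 for (gender, education, age) vs (ACT, SATV, SATQ).
> R   <- lowerCor(sat.act)       # lowerCor returns the full correlation matrix invisibly
> Rxx <- R[1:3,1:3]; Ryy <- R[4:6,4:6]; Rxy <- R[1:3,4:6]
> lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
> 1 - prod(1 - lambda)           # approximately .09, matching the setCor output below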

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0


Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
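A quick way to see this symmetry, offered here as an illustrative check rather than output from the original document, is to reverse the roles of the two sets and compare the reported Set Correlation R2:

> # Reversing the predictor and criterion sets leaves the set correlation unchanged.
> setCor(c(1:3), c(4:6), C, n.obs=700)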

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
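A sketch of the sort of calls involved is shown below; the particular arguments are illustrative, and Table 2 reflects a call of the first form.

> # Convert psych output to LaTeX source for APA style tables.
> f3 <- fa(Thurstone, 3)         # the three factor solution shown in Table 2
> fa2latex(f3)                   # factor loadings, communalities, and factor correlations
> cor2latex(Thurstone)           # lower diagonal correlation table
> df2latex(describe(sat.act))    # any data frame, e.g., descriptive statistics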

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

                  MR1    MR2    MR3
MR1               1.00   0.59   0.54
MR2               0.59   1.00   0.52
MR3               0.54   0.52   1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
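A brief illustration of a few of these helpers (the input values are arbitrary):

> fisherz(0.5)                    # Fisher r to z transformation, about 0.55
> geometric.mean(c(1, 2, 4, 8))   # about 2.83
> harmonic.mean(c(1, 2, 4, 8))    # about 2.13
> headtail(sat.act)               # first and last lines of the data frame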

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
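For example, calls of roughly the following form will install the package (the source-repository call is an assumption based on the repository location given above; the first call installs the released version from CRAN):

> install.packages("psych")
> install.packages("psych", repos="http://personality-project.org/r", type="source")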

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book: An introduction to Psychometric Theory with Applications in R (Revelle, in prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 23: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure 9 shows the resulting two panel plot: "Movies effect on arousal" (Energetic Arousal by Tense Arousal) and "Movies effect on affect" (Positive Affect by Negative Affect), with circles labeled Sad, Horror, Neutral, and Happy.]

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bibars.png')
> with(bfi,bibars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device 
          1 

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
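To see what the Holm correction does to a set of raw probabilities, p.adjust can be called directly; a minimal sketch (the p values here are arbitrary illustrations):

> p.raw <- c(0.002, 0.02, 0.33, 0.58, 0.62)  #illustrative raw probabilities
> p.adjust(p.raw, method = "holm")           #Holm adjusted values, in the order entered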

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device 
          1 

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device 
          1 

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device 
          1 

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option.


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests 
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation 
 t value 2.18    with probability < 0.034 
 and confidence interval 0.02   0.53 

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests 
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations 
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests 
Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations 
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B 
Correlation tests 
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, 
    r24 = 0.8)
Test of difference between two dependent correlations 
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices 
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix made up of a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
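A minimal sketch of typical calls (the dichotomization of the bfi items is artificial and purely for illustration):

> poly <- polychoric(bfi[1:10])        #polychoric correlations of polytomous personality items
> di <- ifelse(bfi[1:10] > 3, 1, 0)    #artificially dichotomize the same items
> tet <- tetrachoric(di)               #tetrachoric correlations of the dichotomized items
> smoothed <- cor.smooth(tet$rho)      #smooth the matrix if it is not positive semi-definite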

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 shows the resulting plot: a bivariate normal distribution with rho = 0.5 and phi = 0.33, cut at τ on X and Τ on Y into the four cells X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ, together with the marginal normal densities.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 shows the resulting surface plot of the bivariate density ("Bivariate density, rho = 0.5") over x and y.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} * \eta_{y_{wg}} * r_{xy_{wg}} + \eta_{x_{bg}} * \eta_{y_{bg}} * r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
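A minimal sketch of the first of these analyses (the element names rwg and rbg for the pooled within group and between group correlations are assumed from the statsBy help page; check str(sb) if they differ):

> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> round(sb$rwg, 2)   #pooled within group correlations
> round(sb$rbg, 2)   #correlations of the group means (between groups)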

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9,x=1:4,data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input 

Beta weights 
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.69            0.63            0.50            0.58            0.48

multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
     Sentences     Vocabulary SentCompletion   FirstLetters 
          3.69           3.88           3.00           1.35

Unweighted multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.34            0.34            0.24            0.33            0.20

 Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.6280 0.1478 0.0076 0.0049

 Average squared canonical correlation =  0.2
 Cohen's Set Correlation R2  =  0.69
 Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input 

Beta weights 
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.58            0.46            0.21            0.18            0.30

multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
SentCompletion   FirstLetters 
          1.02           1.02

Unweighted multiple R 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2 
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup 
           0.19            0.12            0.03            0.02            0.07

 Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.405 0.023

 Average squared canonical correlation =  0.21
 Cohen's Set Correlation R2  =  0.42
 Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
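The output below corresponds to a call of roughly this form (reconstructed from the Call line; the assignment to an object named preacher is an assumption, chosen to match the later mediate.diagram(preacher) call):

> preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)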

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32  with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

 Full output 

 Total effect estimates (c) 
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c') 
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

 'a' effect estimates 
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

 'b' effect estimates 
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

 'ab' effect estimates 
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 shows the resulting "Mediation model" path diagram: THERAPY to SATIS with c = 0.76 and c' = 0.43, THERAPY to ATTRIB = 0.82, and ATTRIB to SATIS = 0.4.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

[Figure 17 shows the resulting "Regression Models" path diagram: THERAPY to SATIS = 0.43, ATTRIB to SATIS = 0.4, with a THERAPY-ATTRIB correlation of 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
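As a quick numerical check of this formula, the squared canonical correlations reported for the sat.act example later in this section may be plugged in directly:

> lambda <- c(0.050, 0.033, 0.008)   #squared canonical correlations from the setCor output below
> 1 - prod(1 - lambda)               #approximately 0.09, matching the reported Set Correlation R2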

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect (c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on SATQ removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect (c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on SATQ removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01  with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

 Full output 

 Total effect estimates (c) 
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c') 
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

 'a' effect estimates 
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

 'b' effect estimates 
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

 'ab' effect estimates 
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 shows the resulting "Moderation model" path diagram: ACT, gender, and ACTXgndr predict SATQ through education, with a paths of 0.16, 0.09, and -0.01, total effects (c) of 0.58, -0.14, and 0, direct effects (c') of 0.59, -0.14, and 0, and a b path of -0.04.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input 

Beta weights 
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R 
 ACT SATV SATQ 
0.16 0.10 0.19 

multiple R2 
   ACT   SATV   SATQ 
0.0272 0.0096 0.0359 

Multiple Inflation Factor (VIF) = 1/(1-SMC) = 
   gender education       age 
     1.01      1.45      1.44 

Unweighted multiple R 
 ACT SATV SATQ 
0.15 0.05 0.11 

Unweighted multiple R2 
 ACT SATV SATQ 
0.02 0.00 0.01 

 SE of Beta weights 
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

 t of Beta Weights 
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t < 
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2 
   ACT   SATV   SATQ 
0.0230 0.0054 0.0317 

Standard Error of R2 
   ACT   SATV   SATQ 
0.0120 0.0073 0.0137 

F 
 ACT SATV SATQ 
6.49 2.26 8.63 

Probability of F < 
     ACT     SATV     SATQ 
2.48e-04 8.08e-02 1.24e-05 

 degrees of freedom of regression 
[1]   3 696

 Various estimates of between set correlations
Squared Canonical Correlations 
[1] 0.050 0.033 0.008

Chisq of canonical correlations 
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26  9  1681.86
 Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
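A minimal sketch of that symmetry: reversing the roles of the two sets should report the same Set Correlation R2 of about 0.09, even though the individual multiple Rs change.

> setCor(c(1:3), c(4:6), C, n.obs = 700)   #predict gender, education and age from ACT, SATV and SATQ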

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1    MR2    MR3    h2    u2   com
Sentences       0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary      0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion  0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters    0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes        0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees       0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings     2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
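Table 2 could be produced by a command along these lines (the heading and caption strings are assumptions):

> f3 <- fa(Thurstone, 3)
> fa2latex(f3, heading = "A factor analysis table from the psych package in R", caption = "fa2latex")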


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
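A few of these helpers in action (a minimal sketch):

> fisherz(0.5)                  #Fisher z transform of r = .5
> harmonic.mean(c(1, 2, 4))     #harmonic mean of a vector
> headtail(sat.act)             #first and last lines of the data frame
> mardia(sat.act)               #univariate and multivariate skew and kurtosis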

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
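A sketch of a source install from that repository (the repos argument simply mirrors the URL above; consult the repository page for the current instructions):

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")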

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67    
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0        
[9] lattice_0.20-34   


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.


Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 9: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).

> data(bfi)
> png('bibars.png')
> with(bfi,bibars(age,gender,ylab="Age",main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
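A minimal sketch of these options, using the sat.act data shown below (the adjust value is one of the documented choices):

> ct <- corr.test(sat.act)            #Pearson correlations, Holm adjusted probabilities
> print(ct, short=FALSE)              #also displays the confidence intervals
> corr.test(sat.act, adjust="none")   #report only the raw, unadjusted probabilities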

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option

depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18  with probability < 0.034
 and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2  with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
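cortest will also compare two matrices. As a hedged sketch, reusing the male and female subsets formed in section 3.4.7 (the n1 and n2 argument names are taken from the cortest help page; check ?cortest if your version differs):

> R.male <- cor(male[-1], use="pairwise")
> R.female <- cor(female[-1], use="pairwise")
> cortest(R.male, R.female, n1=nrow(male), n2=nrow(female))   #chi square test that the two matrices are equal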

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
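A short sketch of the smoothing step for the burt example just mentioned (the difference display is merely illustrative):

> data(burt)
> burt.smooth <- cor.smooth(burt)     #adjust the offending eigenvalues and rescale
> round(burt.smooth - burt, 3)        #the adjustments are typically very small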

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20,cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}}   (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
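A hedged sketch of the first of these analyses, pulling out the two levels of correlation (the element names rwg and rbg follow the notation of equation 1; check names(sb) if your version labels them differently):

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> lowerMat(sb$rwg)    #the pooled within group correlations
> lowerMat(sb$rbg)    #the between group correlations of the group means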

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9,x=1:4,data=Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

 Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.69              0.63              0.50              0.58              0.48

 multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.48              0.40              0.25              0.34              0.23

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

 Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.59              0.58              0.49              0.58              0.45
 Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.34              0.34              0.24              0.33              0.20

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
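For instance, treating the Thurstone matrix as if it were based on 213 participants (the sample size usually cited for Thurstone and Thurstone (1941); treat that number as an assumption here) adds those inferential statistics to the output:

> setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)   #adds SEs, t and p values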

> sc <- setCor(y = 5:9,x=3:4,data=Thurstone,z=1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

 Multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.58              0.46              0.21              0.18              0.30

 multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
            0.331             0.210             0.043             0.032             0.092

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

 Unweighted multiple R
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.44              0.35              0.17              0.14              0.26
 Unweighted multiple R2
Four.Letter.Words          Suffixes     Letter.Series         Pedigrees      Letter.Group
             0.19              0.12              0.03              0.02              0.07

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual,2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_{1,2,...,i}) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
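As a concrete check with the Preacher and Hayes values reported below: a = .82 and b = .40, so the indirect effect is ab = .82 x .40 ≈ .33, and the direct effect after removing the mediator is c' = c - ab = .76 - .33 = .43.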

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50

> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1,c(2,3),sobel,std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.

for speed. The default number of bootstraps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
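The definition above can be checked directly with base R; this is only a sketch, but the result agrees with the Cohen's Set Correlation R2 of 0.09 reported by setCor below for these variables:

> R   <- cor(sat.act, use="pairwise")
> Rxx <- R[1:3,1:3]
> Ryy <- R[4:6,4:6]
> Rxy <- R[1:3,4:6]
> lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)  #the squared canonical correlations
> 1 - prod(1 - lambda)     #about 0.09 for these data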

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02   Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01   Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

 Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

 Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

 multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

 Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

 Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11
 Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

 SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

 t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

 Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

 Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

 Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

 F
 ACT SATV SATQ
6.49 2.26 8.63

 Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

 degrees of freedom of regression
[1]   3 696

 Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation =  0.03
 Cohen's Set Correlation R2  =  0.09
 Shrunken Set Correlation R2  =  0.08
 F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2; a sketch of the generating call follows the table.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

         MR1    MR2    MR3
MR1     1.00   0.59   0.54
MR2     0.59   1.00   0.52
MR3     0.54   0.52   1.00
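The call that produces a table like Table 2 is roughly the following; the heading argument name is taken from the fa2latex help page, so treat the exact argument list as an assumption:

> f3 <- fa(Thurstone, 3)
> fa2latex(f3, heading="A factor analysis table from the psych package in R")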

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A few of the helpers are demonstrated briefly after the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
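A few of these helpers in action (a sketch; the numerical comments are rounded):

> fisherz(.5)                   #0.55, the Fisher r to z transform
> geometric.mean(c(1,2,4,8))    #2.83
> harmonic.mean(c(1,2,4,8))     #2.13
> headtail(sat.act)             #the first and last few rows of the data frame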

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
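On a Mac, the "other repository" route amounts to something like the following call; treat the repos value as an assumption that simply mirrors the URL above:

> install.packages("psych", repos="http://personality-project.org/r", type="source")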

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0",package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book) An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 10).


> data(bfi)
> png('bi.bars.png')
> with(bfi, bi.bars(age, gender, ylab="Age", main="Age by males and females"))
> dev.off()
null device
          1

Figure 10: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix; lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences, displaying one below the diagonal and the difference of the second from the first above the diagonal:


> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)),


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE,
          main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown; setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
         main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
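
As a minimal illustration of that option (not part of the output above; it assumes the psych package is loaded and uses the same sat.act data):

> ct <- corr.test(sat.act)    # the analysis reported in Table 1
> print(ct, short=FALSE)      # also prints confidence intervals for each correlation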


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
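
A rough sketch of that workflow (not from the original text: the cut point and the choice of five bfi items are arbitrary, and the psych package is assumed to be loaded):

> items <- na.omit(bfi[1:500, 1:5])                        # five personality items, complete cases
> dichot <- apply(items, 2, function(x) as.numeric(x > 3)) # arbitrary dichotomization at mid scale
> tc <- tetrachoric(dichot)                                # tetrachoric correlations and thresholds
> round(tc$rho, 2)                                         # the estimated latent correlations
> smoothed <- cor.smooth(tc$rho)                           # enforce positive semi-definiteness if needed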

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
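
A minimal sketch of the first of these analyses (assuming psych is loaded; rwg and rbg are the within and between group correlation matrices returned when cors=TRUE):

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rwg, 2)    # pooled within-education-level correlations
> round(sb$rbg, 2)    # correlations of the education-level means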

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.69      0.63           0.50       0.58          0.48

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.48      0.40           0.25       0.34          0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.59      0.58           0.49       0.58          0.45

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.34      0.34           0.24       0.33          0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.58      0.46           0.21       0.18          0.30

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
            0.331     0.210          0.043      0.032         0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.44      0.35           0.17       0.14          0.26

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.19      0.12           0.03       0.02          0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77
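
As a hedged aside (this call is not part of the vignette output above), supplying a sample size to the first analysis adds standard errors, t values, and probabilities for the β weights; the value 200 below is purely a placeholder and should be replaced by the actual n for the Thurstone data:

> setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 200)   # n.obs = 200 is hypothetical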

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on SATQ removing education = -0.14   S.E. = 0.03  t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on SATQ removing education = 0   S.E. = 0.03  t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation:  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
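
A brief, hedged illustration of a few of the helpers listed above (psych assumed loaded; the values in the comments are approximate):

> fisherz(.5)                      # Fisher r to z transform, about 0.55
> geometric.mean(c(1, 2, 4, 8))    # about 2.83
> harmonic.mean(c(1, 2, 4, 8))     # about 2.13
> headtail(sat.act)                # first and last rows of the data frame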

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are included as well. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
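
For instance, a minimal sketch of loading and summarizing one of these data sets (psych assumed loaded):

> data(bfi)
> dim(bfi)            # 2800 rows and 28 columns: 25 items plus gender, education, and age
> describe(bfi[1:5])  # descriptive statistics for the first five items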

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
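
A hedged sketch of the corresponding install commands (the repository path is taken from the URL above; the exact call for a given platform may differ):

> install.packages("psych")    # released version from CRAN
> install.packages("psych", repos="http://personality-project.org/r", type="source")   # development version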

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0    foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4   grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 26: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

data(bfi)gt png( bibarspng )

gt with(bfibibars(agegenderylab=Agemain=Age by males and females))

gt devoff()

null device

1

Figure 10 A bar plot of the age distribution for males and females shows the use ofbibars The data are males and females from 2800 cases collected using the SAPAprocedure and are available as part of the bfi data set

26

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

27

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
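As a minimal sketch (the choice of the Bonferroni adjustment here is purely illustrative), both the adjustment method and the display of confidence intervals can be controlled when calling corr.test:

> ct <- corr.test(sat.act, adjust="bonferroni")   # any method accepted by p.adjust may be used instead of the default Holm
> print(ct, short=FALSE)                          # short=FALSE also shows the confidence intervals of each correlation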

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980),

28

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

29

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

30

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

31

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

32

depending upon the input):

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

33

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
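As a minimal sketch of these functions (dichotomizing the first five bfi items at their midpoint is done purely for illustration):

> tet <- tetrachoric(ifelse(bfi[1:5] > 3, 1, 0))   # tetrachoric correlations of artificially dichotomized items
> poly <- polychoric(bfi[1:5])                     # polychoric correlations treat the six response categories as ordered cuts
> smoothed <- cor.smooth(burt)                     # smooth the (improper) burt correlation matrix to be positive semi-definite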

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use

34

> draw.tetra()

[Figure: a bivariate normal distribution with ρ = 0.5, cut at a threshold τ for X and Τ for Y; the four quadrants of the resulting two by two table are labeled, and the marginal normal densities with their cut points are shown. The observed φ = 0.33.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

> draw.cor(expand=20, cuts=c(0,0))

[Figure: the bivariate normal density with ρ = 0.5, cut at (0,0).]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

36

is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}     (1)

where r_{xy} is the normal correlation, which may be decomposed into the within group and between group correlations r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

37

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
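A minimal sketch of the first of these analyses (the grouping variable and the components examined are chosen for illustration):

> sb <- statsBy(sat.act, group="education", cors=TRUE)   # descriptive statistics within and between education levels
> round(sb$rwg, 2)   # pooled within-group correlations
> round(sb$rbg, 2)   # correlations of the group means (between groups)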

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

38

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohens Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030

39

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
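The logic of these paths can be sketched with ordinary regressions; the data frame d and its variables x, m, and y below are hypothetical placeholders, not part of the psych package:

> path.a   <- coef(lm(m ~ x, data=d))["x"]       # a: the effect of x on the mediator m
> path.b   <- coef(lm(y ~ x + m, data=d))["m"]   # b: the effect of m on y, controlling for x
> total.c  <- coef(lm(y ~ x, data=d))["x"]       # c: the total effect of x on y
> direct.c <- coef(lm(y ~ x + m, data=d))["x"]   # c': the direct effect of x with m partialled out
> path.a * path.b                                # ab: the indirect effect; for OLS, c = c' + ab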

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31   t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32   t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50

41

> mediate.diagram(preacher)

[Figure: 'Mediation model' path diagram: THERAPY → ATTRIB (a = 0.82), ATTRIB → SATIS (b = 0.4), with total effect c = 0.76 and direct effect c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: 'Regression Models' path diagram: THERAPY and ATTRIB predicting SATIS, with paths of 0.43 and 0.4 and a value of 0.21 between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

43

for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
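For example, plugging in the squared canonical correlations reported for the Thurstone example in section 5.1 reproduces the set correlation printed there:

> lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations from setCor(y=5:9, x=1:4, data=Thurstone)
> 1 - prod(1 - lambda)                          # 0.687, which rounds to the reported Cohens Set Correlation R2 of 0.69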

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals

44

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03   t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03   t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03   t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03   t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03   t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03   t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

[Figure: 'Moderation model' path diagram: ACT, gender, and the ACT x gender interaction (ACTXgndr) predict SATQ both directly and through education; the path values correspond to the a, b, c, and c' estimates listed above.]

Figure 18: Moderated multiple regression requires the raw data.

45

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002

46

age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient

47

LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
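A minimal sketch of these converters (the 3-factor model for the Thurstone correlations is chosen only for illustration):

> f3 <- fa(Thurstone, 3)          # a 3 factor solution of the 9 Thurstone variables
> fa2latex(f3)                    # LaTeX table of the loadings, communalities, and factor correlations
> cor2latex(Thurstone)            # lower-diagonal correlation table
> df2latex(describe(sat.act))     # any data frame, e.g., the output of describe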

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable         MR1    MR2    MR3    h2    u2   com
Sentences       0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary      0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion  0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters    0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes        0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees       0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings     2.64   1.86   1.5

     MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

49

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
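A few of these helpers in action (the values in the comments are approximate):

> fisherz(0.5)                    # 0.55, the Fisher z transformation of r = 0.5
> geometric.mean(c(1, 2, 4, 8))   # 2.83
> headTail(sat.act)               # the first and last few rows of the data set, separated by an ellipsis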

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
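From within R, the development version can also be installed directly; this sketch assumes the repository address given above:

> install.packages("psych", repos="http://personality-project.org/r", type="source")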

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 27: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

27

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)

28

> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18 with probability < 0.034
 and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)


Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
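cortest will also compare two correlation matrices when given both R1 and R2. A hedged sketch (not run here) for the male and female subsets formed earlier, with the group sizes passed as n1 and n2, would be:

R.male <- lowerCor(male[-1])       #lowerCor returns the correlation matrix
R.female <- lowerCor(female[-1])
cortest(R.male, R.female, n1=nrow(male), n2=nrow(female))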

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
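A minimal sketch of these functions (assuming the psych data sets ability, a matrix of dichotomously scored ability items, and burt, the correlation matrix just mentioned):

tetrachoric(ability)     #tetrachoric correlations of dichotomous items
polychoric(bfi[1:5])     #polychoric correlations of polytomous items
cor.smooth(burt)         #smooth a matrix that is not positive semi-definite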

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 graphic: a bivariate normal distribution with rho = 0.5, dichotomized at x = τ and y = Τ into the four quadrants X < τ or X > τ crossed with Y < Τ or Y > Τ, yielding φ = 0.33.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20,cuts=c(0,0))

[Figure 15 graphic: the bivariate density surface, rho = 0.5, cut at (0,0).]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.
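As a hedged sketch, the pieces of Equation 1 can be pulled from a statsBy analysis of the sat.act data grouped by education; the element names used below are assumptions that should be checked against the statsBy help page:

sb <- statsBy(sat.act, group="education", cors=TRUE)
sb$rwg     #pooled within group correlations (assumed element name)
sb$rbg     #correlations of the group means  (assumed element name)
sb$etawg   #correlation of the data with the within group values (assumed)
sb$etabg   #correlation of the data with the group means (assumed)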

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sentences                    0.09     0.07          0.25      0.21         0.20
Vocabulary                   0.09     0.17          0.09      0.16        -0.02
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.69      0.63           0.50       0.58          0.48

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.48      0.40           0.25       0.34          0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences      Vocabulary Sent.Completion   First.Letters
           3.69            3.88            3.00            1.35

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.59      0.58           0.49       0.58          0.45

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.34      0.34           0.24       0.33          0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in a correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Sent.Completion              0.02     0.05          0.04      0.21         0.08
First.Letters                0.58     0.45          0.21      0.08         0.31

Multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.58      0.46           0.21       0.18          0.30

multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
            0.331     0.210          0.043      0.032         0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
Sent.Completion   First.Letters
           1.02            1.02

Unweighted multiple R
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.44      0.35           0.17       0.14          0.26

Unweighted multiple R2
Four.Letter.Words  Suffixes  Letter.Series  Pedigrees  Letter.Group
             0.19      0.12           0.03       0.02          0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual,2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.52     0.11          0.09      0.06         0.13
Suffixes                       0.11     0.60         -0.01      0.01         0.03
Letter.Series                  0.09    -0.01          0.75      0.28         0.37
Pedigrees                      0.06     0.01          0.28      0.66         0.20
Letter.Group                   0.13     0.03          0.37      0.20         0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
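As a quick check on the logic described above, the indirect effect is just the product of the two component paths reported in the full output: ab = 0.82 * 0.40 ≈ 0.33, which is also the difference between the total and direct effects, c - c' = 0.76 - 0.43 = 0.33.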

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV","SATQ"), x = c("education","age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education","age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 graphic: mediation model with paths THERAPY -> ATTRIB (0.82) and ATTRIB -> SATIS (0.4), total effect c = 0.76, and direct effect c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1,c(2,3),sobel,std=FALSE)

> setCor.diagram(preacher)

[Figure 17 graphic: regression model with paths THERAPY -> SATIS (0.43) and ATTRIB -> SATIS (0.4), and a value of 0.21 linking THERAPY and ATTRIB.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect (c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect (c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 graphic: moderated regression diagram with ACT, gender, and ACTXgndr predicting SATQ through education; a paths to education of 0.16 (ACT), 0.09 (gender), and -0.01 (ACTXgndr); total/direct effects on SATQ of 0.58/0.59 (ACT), -0.14/-0.14 (gender), and 0/0 (ACTXgndr); and a b path from education of -0.04.]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor:

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
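A hedged sketch of that symmetry (not run here) is to reverse the roles of the two sets; the regressions change, but Cohen's Set Correlation R2 should not:

setCor(c(4:6), c(1:3), C, n.obs=700)   #ACT and SATs from gender, education, age (as above)
setCor(c(1:3), c(4:6), C, n.obs=700)   #gender, education, age from ACT and the SATs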

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2; a sketch of the calls that generate such tables follows the table.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
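A sketch of the calls that produce such tables (default arguments assumed from the help pages; the output is LaTeX source written to the console or, with a file argument, to a file):

f3 <- fa(Thurstone, 3)         #the three factor solution shown in Table 2
fa2latex(f3)                   #a LaTeX factor table like Table 2
cor2latex(sat.act)             #an APA style correlation table
df2latex(describe(sat.act))    #any data frame or matrix, here the descriptives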


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a few of these helpers are illustrated after the list). Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
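A few of these helpers in use (a sketch; output omitted):

fisherz(.5)                     #the Fisher z transformation of r = .5
geometric.mean(c(1, 2, 4, 8))   #the appropriate mean for multiplicative data
mardia(sat.act[4:6])            #multivariate skew and kurtosis of ACT, SATV, SATQ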

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
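The development version can usually also be installed directly from that repository; a hedged sketch of the command (the repos value is an assumption based on the repository address above) is:

install.packages("psych", repos="http://personality-project.org/r", type="source")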

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 28: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)

28

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure

29

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

30

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

31

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

32

depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

33

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
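A minimal sketch of these functions (the dichotomization of five bfi items at 3 is arbitrary and just for illustration):

> b <- na.omit(bfi[1:5])             #five bfi items scored 1-6
> bd <- ifelse(b > 3, 1, 0)          #an arbitrary cut to make them dichotomous
> tetrachoric(bd)$rho                #tetrachoric estimates of the latent correlations
> cor(bd)                            #phi (Pearson on the 0/1 data) underestimates them in absolute value
> cor.smooth(tetrachoric(bd)$rho)    #smooth the matrix if it is not positive semi-definite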

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use


> draw.tetra()

[Figure 14 shows a bivariate normal distribution with rho = 0.5 (phi = 0.33) cut at the thresholds τ for X and Τ for Y, yielding the four cells X < τ, Y < Τ; X > τ, Y < Τ; X < τ, Y > Τ; and X > τ, Y > Τ, together with the marginal normal densities.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


> draw.cor(expand=20, cuts=c(0,0))

[Figure 15 shows the bivariate density surface, rho = 0.5.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \tag{1}$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.
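A minimal example of this decomposition, using the sat.act data grouped by education (the object name sb is arbitrary):

> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> round(sb$rwg, 2)   #the pooled within group correlations
> round(sb$rbg, 2)   #the correlations of the group means (between groups)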

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
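A sketch of how to look at this structure with statsBy (this assumes the grouping column of withinBetween is named "Group"; check the help file):

> data(withinBetween)
> wb <- statsBy(withinBetween, group = "Group", cors = TRUE)
> round(wb$rwg, 1)   #the crossed pattern of 1, 0, and -1 within groups
> round(wb$rbg, 1)   #and the 1, 0, -1 pattern between groups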

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
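A minimal sketch (a small number of trials to keep the run time short; the argument names shown here are assumptions and should be checked against the help pages):

> boot.ed <- statsBy.boot(sat.act, group = "education", ntrials = 20, cors = TRUE)
> statsBy.boot.summary(boot.ed, var = "ACT")   #summarize the bootstrapped results for ACT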


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.69            0.63            0.50            0.58
    LetterGroup
           0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.48            0.40            0.25            0.34
    LetterGroup
           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.59            0.58            0.49            0.58
    LetterGroup
           0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.34            0.34            0.24            0.33
    LetterGroup
           0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.58            0.46            0.21            0.18
    LetterGroup
           0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
          0.331           0.210           0.043           0.032
    LetterGroup
          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.44            0.35            0.17            0.14
    LetterGroup
           0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees
           0.19            0.12            0.03            0.02
    LetterGroup
           0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees
FourLetterWords            0.52     0.11         0.09      0.06
Suffixes                   0.11     0.60        -0.01      0.01
LetterSeries               0.09    -0.01         0.75      0.28
Pedigrees                  0.06     0.01         0.28      0.66
LetterGroup                0.13     0.03         0.37      0.20
                LetterGroup
FourLetterWords        0.13
Suffixes               0.03
LetterSeries           0.37
Pedigrees              0.20
LetterGroup            0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5 with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35 with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69
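As a quick arithmetic check of the paths just reported (all numbers are taken from the output above):

> a <- 0.82; b <- 0.40; c.total <- 0.76   #the a and b paths and the total effect c
> a * b                                   #the indirect effect ab = 0.33 (after rounding)
> c.total - a * b                         #the direct effect c' = 0.43, as reported above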

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50


> mediate.diagram(preacher)

[Figure 16 (Mediation model diagram): THERAPY predicts ATTRIB (a = 0.82) and ATTRIB predicts SATIS (b = 0.4); the total effect of THERAPY on SATIS is c = 0.76 and the direct effect is c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes 2004 and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)

> setCor.diagram(preacher)

[Figure 17 (Regression Models diagram): THERAPY and ATTRIB predicting SATIS, with coefficients 0.43 and 0.4 for the two predictors and 0.21 connecting THERAPY and ATTRIB.]

Figure 17: The conventional regression model for the Preacher and Hayes 2004 data set solved using the setCor function. Compare this to the previous figure.


for speed. The default number of boot straps is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
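As a numeric check of this formula, the squared canonical correlations reported for the first Thurstone example above reproduce the set correlation given there:

> rho2 <- c(0.6280, 0.1478, 0.0076, 0.0049)   #squared canonical correlations from the Thurstone example
> 1 - prod(1 - rho2)                          #0.69, the Cohen's Set Correlation R2 reported above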

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 (Moderation model diagram): ACT, gender, and the ACTXgndr interaction predicting SATQ directly and through education, with the path values listed in the output above.]

Figure 18 Moderated multiple regression requires the raw data


Min 1Q Median 3Q Max

-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation 7.26 9 1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable        MR1   MR2   MR3    h2    u2   com
Sentences      0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary     0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion 0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters   0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes       0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries   0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees      0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup   -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings    2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
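A sketch of the kinds of calls that produce such tables (the choice of three factors matches Table 2; the other calls are just illustrations):

> f3 <- fa(Thurstone, nfactors = 3)    #the factor analysis reported in Table 2
> fa2latex(f3)                         #an APA style factor table in LaTeX
> cor2latex(Thurstone)                 #a lower diagonal correlation matrix in LaTeX
> df2latex(describe(sat.act))          #any data frame, here the describe output, as a LaTeX table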


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
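A few of these helpers in action (a minimal sketch):

> fisherz(0.50)                   #0.55, the Fisher z transform of r = .5
> geometric.mean(c(1, 2, 4, 8))   #2.83
> harmonic.mean(c(1, 2, 4, 8))    #2.13
> headtail(sat.act)               #the first and last few rows of the sat.act data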

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
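For those who prefer to install from the R console, a one line sketch (using the repository address given above):

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")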

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition


Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York


Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang


for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 29: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure

29

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

30

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

31

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

32

depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

33

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in termsof dichotomizing the bivariate normal density function using the drawcor function (Fig-ure 15) A simple generalization of this to the case of the multiple cuts is the polychoric

correlation

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular use

34

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 033

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

gt drawcor(expand=20cuts=c(00))

xy

z

Bivariate density rho = 05

Figure 15 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values It isfound (laboriously) by optimizing the fit of the bivariate normal for various values of thecorrelation to the observed cell frequencies

36

is the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler functionto give some of the basic descriptive statistics for two level models

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

37

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

38

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords   Suffixes   LetterSeries   Pedigrees   LetterGroup
           0.58       0.46           0.21        0.18          0.30

multiple R2
FourLetterWords   Suffixes   LetterSeries   Pedigrees   LetterGroup
          0.331      0.210          0.043       0.032         0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords   Suffixes   LetterSeries   Pedigrees   LetterGroup
           0.44       0.35           0.17        0.14          0.26

Unweighted multiple R2
FourLetterWords   Suffixes   LetterSeries   Pedigrees   LetterGroup
           0.19       0.12           0.03        0.02          0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
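The logic of the bootstrap may be sketched directly (this is an illustration of the idea, not the mediate implementation), using the sobel data set analyzed below: resample the rows, re-estimate the a and b paths in each sample, and take quantiles of their product.

set.seed(42)   #for a reproducible illustration
ab.boot <- replicate(1000, {
  i <- sample(nrow(sobel), replace = TRUE)
  a <- coef(lm(ATTRIB ~ THERAPY, data = sobel[i, ]))["THERAPY"]
  b <- coef(lm(SATIS ~ ATTRIB + THERAPY, data = sobel[i, ]))["ATTRIB"]
  a * b })
quantile(ab.boot, c(.025, .975))   #a bootstrap confidence interval for ab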


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, niter=50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation (see the reconstructed call below). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000.
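The call that produced the moderated mediation output accompanying Figure 18 (below) is not repeated there; reconstructed from the Call line of that output, it was of the form

mediate(y = c("SATQ"), x = c("ACT"), m = "education", mod = "gender", data = sat.act, std = TRUE, niter = 50)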


> mediate.diagram(preacher)

[Figure: mediation model path diagram — THERAPY → ATTRIB (0.82), ATTRIB → SATIS (0.4), with total effect c = 0.76 and direct effect c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: regression model path diagram — THERAPY → SATIS (0.43) and ATTRIB → SATIS (0.4), with the two predictors correlated 0.21]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

(The \lambda_i are the squared canonical correlations between the two sets.)
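The formula may be checked against the Thurstone example of section 5.1 (a sketch; the eigen values are the squared canonical correlations reported there):

Rxx <- Thurstone[1:4, 1:4]
Rxy <- Thurstone[1:4, 5:9]
Ryy <- Thurstone[5:9, 5:9]
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
1 - prod(1 - lambda)   #about .69, matching Cohen's Set Correlation R2 above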

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: moderated mediation path diagram — ACT, gender, and the ACTXgndr interaction predicting SATQ, with education as the mediator; the a paths are 0.16, 0.09, and -0.01, the b path is -0.04, and the c/c' pairs are 0.58/0.59, -0.14/-0.14, and 0/0]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
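For example, reversing the roles of the two sets in the analysis above should report the same set correlation (a quick check, reusing the covariance matrix C and n.obs = 700):

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #Cohen's Set Correlation R2 is again 0.09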

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
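A call of the following form produces such a table (a sketch; the heading argument shown here simply mirrors the table heading and is the fa2latex default):

f3 <- fa(Thurstone, 3)   #a three factor solution of the 9 Thurstone variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")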

Table 2: fa2latex — A factor analysis table from the psych package in R

Variable         MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

Factor correlations
     MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems, as in the small illustration below.
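A small illustration of superMatrix (the matrices here are arbitrary and chosen only for the example):

A <- diag(3)            #a 3 x 3 identity matrix
B <- matrix(1, 2, 2)    #a 2 x 2 matrix of 1s
superMatrix(A, B)       #a 5 x 5 matrix with A and B on the diagonal blocks and 0s elsewhere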

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0   tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47


Page 30: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

30

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

31

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

32

depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

33

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in termsof dichotomizing the bivariate normal density function using the drawcor function (Fig-ure 15) A simple generalization of this to the case of the multiple cuts is the polychoric

correlation

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular use

34

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 033

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

gt drawcor(expand=20cuts=c(00))

xy

z

Bivariate density rho = 05

Figure 15 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values It isfound (laboriously) by optimizing the fit of the bivariate normal for various values of thecorrelation to the observed cell frequencies

36

is the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler functionto give some of the basic descriptive statistics for two level models

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

37

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

38

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030

39

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

Call mediate(y = SATIS x = THERAPY m = ATTRIB data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total Direct effect(c) of THERAPY on SATIS = 076 SE = 031 t direct = 25 with probability = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t direct = 135 with probability = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 004 Upper CI = 069

R2 of model = 031

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(y = c( SATV SATQ) x = c(education age ) data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate(y = c( SATV ) x = c(education age ) m= ACT data =satactstd=TRUEniter=50)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It then tests the mediation effect using a bootstrap The particular example is for demonstration purposes only and shows neithermoderation nor mediation The number of iterations for the boot strap was set to 50

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

082

c = 076

c = 043

04

Figure 16 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(1c(23)sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 17 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

for speed The default number of boot straps is 5000

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 S.E. = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 S.E. = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14 S.E. = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 S.E. = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0 S.E. = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 S.E. = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
           SATQ   se     t     Prob
ACT        0.58 0.03 19.25 0.00e+00
gender    -0.14 0.03 -4.78 2.10e-06
ACTXgndr   0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se     t     Prob
ACT        0.59 0.03 19.26 0.00e+00
gender    -0.14 0.03 -4.63 4.37e-06
ACTXgndr   0.00 0.03  0.01 9.92e-01

a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

b effect estimates
            SATQ   se     t  Prob
education  -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

Moderation model (path diagram)

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
  gender education       age
    1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation 7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
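As a quick check of that symmetry (a sketch, not output reproduced from the vignette), exchanging the roles of the two sets in the call above leaves Cohen's set correlation R2 at .09 for these data:

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # predict the ability set from the demographic set
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # reverse the roles: the set correlation R2 is unchanged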

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
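A sketch of the kind of call that generates such a table is shown below; the argument values (nfactors = 3, the caption text) are illustrative assumptions rather than the exact code used to produce Table 2.

f3 <- fa(Thurstone, nfactors = 3)    # three factor solution for the 9 ability variables
fa2latex(f3, caption = "fa2latex")   # LaTeX source for an APA style factor table
cor2latex(Thurstone)                 # lower diagonal correlation table in LaTeX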


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A brief usage sketch of a few of these helpers follows the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
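A brief, hedged sketch of a few of these helpers in use (the particular input values are arbitrary illustrations, not taken from the vignette):

library(psych)
fisherz(0.5)                   # Fisher r-to-z transformation of a correlation of .5
harmonic.mean(c(1, 2, 4))      # harmonic mean of a small vector
geometric.mean(c(1, 2, 4))     # geometric mean of the same vector
headtail(sat.act)              # first and last lines of the sat.act data frame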

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and there are 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton. A short example of loading and describing one of these data sets follows the list.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
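As a minimal illustration of working with one of these bundled data sets (a sketch; any of the data sets above could be substituted):

library(psych)
data(sat.act)        # the data sets are also lazy-loaded once psych is attached
describe(sat.act)    # basic descriptive statistics for the 700 cases
lowerCor(sat.act)    # the correlations, displayed as a lower triangle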

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
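Within R itself, installation can be done along the following lines; this is a sketch using the standard install.packages arguments and the repository address given above, not a command quoted from the vignette.

install.packages("psych")   # the released version from CRAN
install.packages("psych", repos = "http://personality-project.org/r", type = "source")   # development version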

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0      foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4     grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




Figure 17 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

for speed The default number of boot straps is 5000

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

44

Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

mod = gender niter = 50 std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

Moderation model

ACT

gender

ACTXgndr

SATQ

education016 c = 058

c = 059

009 c = minus014

c = minus014

minus001 c = 0

c = 0

minus004

minus004

minus007

002

Figure 18 Moderated multiple regression requires the raw data

45

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002

46

age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenient

47

LATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

49

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in thepsych package These include six data sets showing a hierarchical factor structure (fivecognitive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2and one from health psychology Reise) One of these (Thurstone) is used as an examplein the sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) or 13 personality inventory scores (epibfi) and14 multiple choice iq items (iqitems) The vegetables example has paired comparisonpreferences for 9 vegetables This is an example of Thurstonian scaling used by Guilford(1954) and Nunnally (1967) Other data sets include cubits peas and heights fromGalton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Person-ality Assessment (SAPA) web based personality assessment project The data from1000 subjects are included here as a demonstration set for scoring multiple choiceinventories and doing basic item statistics

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights, and peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
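Each of these is available by name once psych is loaded, for example:

library(psych)
data(bfi)
dim(bfi)            # 2800 subjects, 25 items plus 3 demographic variables
describe(sat.act)   # descriptive statistics for the SAT/ACT demonstration set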

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
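On a machine with the appropriate build tools, the development version can usually be installed directly from that repository with something along the following lines (a sketch using the repository address given above):

install.packages("psych", repos = "http://personality-project.org/r", type = "source")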

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: macOS Sierra 10.12.4

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:

[1] C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] psych_1.7.4.21

loaded via a namespace (and not attached):

[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67

[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0

[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.





58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 33: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

depending upon the input.

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

Correlation tests

Call: r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call: r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call: [1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call: r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call: cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
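cortest can also compare two correlation matrices when given both matrices and their sample sizes. A minimal sketch (the split of sat.act by gender is a hypothetical illustration, not an analysis from the text):

> males   <- subset(sat.act, gender == 1)
> females <- subset(sat.act, gender == 2)
> R.m <- cor(males[3:6], use = "pairwise")      # age, ACT, SATV, SATQ
> R.f <- cor(females[3:6], use = "pairwise")
> cortest(R.m, R.f, n1 = nrow(males), n2 = nrow(females))  # chi square test that the two matrices are equal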

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
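A minimal sketch of these functions using the built-in bfi and burt data sets (the dichotomization of the item responses at 3 is an arbitrary choice, made only for illustration):

> d <- ifelse(bfi[1:5] > 3, 1, 0)    # dichotomize five polytomous items
> tetrachoric(d)                     # tetrachoric correlations of the dichotomized items
> polychoric(bfi[1:5])               # polychoric correlations of the original items
> burt.smoothed <- cor.smooth(burt)  # smooth a matrix that is not positive semi-definite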

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

> draw.tetra()

[Figure 14: the plot drawn by draw.tetra(), a bivariate normal distribution with rho = .5 cut at τ on X and Τ on Y; the resulting two by two table of frequencies yields phi = .33.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand=20, cuts=c(0,0))

[Figure 15: the bivariate density surface (rho = .5) drawn by draw.cor(), dichotomized at the cut points.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.


4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \, \eta_{y_{wg}} \, r_{xy_{wg}} + \eta_{x_{bg}} \, \eta_{y_{bg}} \, r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
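A minimal sketch of the first of these analyses (rwg and rbg are the pooled within group and between group correlation matrices returned by statsBy):

> sb.edu <- statsBy(sat.act, group = "education", cors = TRUE)
> sb.edu$rwg   # pooled within group correlations
> sb.edu$rbg   # correlations of the group means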

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first four.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
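The path algebra can be seen with two ordinary regressions. A minimal sketch (it assumes the sobel example data frame from the mediate help page has been created, and it omits the bootstrapped confidence interval that mediate provides):

> c.total  <- coef(lm(SATIS ~ THERAPY, data = sobel))["THERAPY"]   # total effect c
> a        <- coef(lm(ATTRIB ~ THERAPY, data = sobel))["THERAPY"]  # path a
> cprime.b <- coef(lm(SATIS ~ THERAPY + ATTRIB, data = sobel))     # direct effect c' and path b
> ab       <- a * cprime.b["ATTRIB"]                               # indirect effect ab
> c.total - cprime.b["THERAPY"]                                    # equals ab for linear models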

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5 with probability = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35 with probability = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69

R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000.

> mediate.diagram(preacher)

[Figure 16: the mediation model diagram, THERAPY → ATTRIB (a = 0.82), ATTRIB → SATIS (b = 0.4), with total effect c = 0.76 and direct effect c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 17: the "Regression Models" diagram, with THERAPY and ATTRIB predicting SATIS (weights 0.43 and 0.4) and a correlation of 0.21 between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes, 2004 data set solved using the setCor function. Compare this to the previous figure.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$
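As an illustration of the formula (a sketch only; setCor reports this value directly), the set correlation for the earlier Thurstone example (x = variables 1:4, y = variables 5:9) can be computed from the squared canonical correlations:

> Rxx <- Thurstone[1:4, 1:4]
> Ryy <- Thurstone[5:9, 5:9]
> Rxy <- Thurstone[1:4, 5:9]
> lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)  # squared canonical correlations
> 1 - prod(1 - lambda)   # the set correlation R2, about .69 for these variables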

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. Although the set correlation can then be very high, the degree of relationship between the sets is not as high. In such cases an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
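The moderated regression display that follows belongs to the mediate bullet above; it was produced by a call of roughly this form (a sketch reconstructed from the Call echoed in the output; the ACTXgndr product term is created from ACT and gender):

> mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
+         mod = "gender", n.iter = 50, std = TRUE)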

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25 with probability = 0

Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -0.01

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78 with probability = 2.1e-06

Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63 with probability = 4.4e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02 with probability = 0.99

Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01 with probability = 0.99

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18: the moderation model diagram produced by mediate, showing ACT, gender, and the ACTXgndr product term predicting SATQ directly and through education, with the path coefficients reported in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


Compare the lm results above with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT   SATV   SATQ
gender    -0.05  -0.03  -0.18
education  0.14   0.10   0.10
age        0.03  -0.10  -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation: 7.26  9  1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
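The symmetry is easy to check by reversing the roles of the two sets; the set correlation R2 is unchanged even though the individual multiple Rs differ (a sketch using the covariance matrix created above):

> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)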

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
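A sketch of the kind of call that produces such a table (the heading argument is optional; see the fa2latex help page for the other formatting options):

> f3 <- fa(Thurstone, nfactors = 3)
> fa2latex(f3, heading = "A factor analysis table from the psych package in R")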


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
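A few of these helpers in use (a minimal sketch):

> fisherz(.5)                   # Fisher r to z transformation
> fisherz2r(fisherz(.5))        # and back again
> headTail(sat.act)             # first and last lines of a data frame
> mardia(sat.act[4:6])          # multivariate skew and kurtosis of ACT, SATV and SATQ
> harmonic.mean(c(1, 2, 4, 8))  # compare with the geometric and arithmetic means
> geometric.mean(c(1, 2, 4, 8))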

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 34: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function (Figure 15). A simple generalization of this to the case of multiple cuts is the polychoric correlation.

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
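A minimal sketch of these functions in use (the bfi items that come with psych are assumed here, and the dichotomization at the scale midpoint is purely for illustration):

library(psych)
pc <- polychoric(bfi[1:5])                  # polychoric correlations of five polytomous items
pc$rho                                      # the estimated latent correlations
pc$tau                                      # the estimated thresholds (cut points)
dichotomized <- ifelse(bfi[1:5] > 3, 1, 0)  # artificial dichotomization, for the example only
tc <- tetrachoric(dichotomized)             # tetrachoric correlations of the 0/1 items
smoothed <- cor.smooth(tc$rho)              # smooth the matrix if it is not positive semi-definite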

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations).

> draw.tetra()

[Figure 14 shows a bivariate normal distribution with rho = 0.5 cut at thresholds τ on X and Τ on Y, the four resulting cells (X > τ, Y > Τ, etc.), the corresponding φ = 0.33, and the univariate normal density with its cut point.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

> draw.cor(expand = 20, cuts = c(0, 0))

[Figure 15 shows the bivariate normal density with rho = 0.5, cut at 0 on both x and y.]

Figure 15: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies.

Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.
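A minimal sketch of this decomposition for the sat.act data grouped by education (the element names rwg, rbg, etawg and etabg follow the statsBy help page and should be treated as assumptions):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg     # pooled within group correlations
sb$rbg     # between group correlations (correlations of the group means)
sb$etawg   # correlation of the data with the within group deviations
sb$etabg   # correlation of the data with the group means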

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
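For example (the grouping column name "Film" for the affect data set is an assumption; check colnames(affect) before running):

affect.stats <- statsBy(affect, group = "Film", cors = TRUE)  # within and between the film conditions
affect.stats$rwg                                              # pooled within condition correlations
affect.stats$rbg                                              # correlations of the condition means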

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)    # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first four.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.69        0.63            0.50         0.58           0.48

multiple R2
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.48        0.40            0.25         0.34           0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
      Sentences    Vocabulary    SentCompletion    FirstLetters
           3.69          3.88              3.00            1.35

Unweighted multiple R
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.59        0.58            0.49         0.58           0.45

Unweighted multiple R2
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.34        0.34            0.24         0.33           0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
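For instance, the significance tests appear once n.obs is supplied (the value 213 below is simply an assumed sample size for illustration):

setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)   # adds SEs, t values and probabilities of the beta weights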

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.58        0.46            0.21         0.18           0.30

multiple R2
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
          0.331       0.210           0.043        0.032          0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion    FirstLetters
          1.02            1.02

Unweighted multiple R
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.44        0.35            0.17         0.14           0.26

Unweighted multiple R2
FourLetterWords    Suffixes    LetterSeries    Pedigrees    LetterGroup
           0.19        0.12            0.03         0.02           0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
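A call along the following lines produces the output below and the object plotted in Figure 16 (quoting of the variable names is an assumption based on the Call line that follows):

preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)  # uses the default number of bootstrap iterations
mediate.diagram(preacher)                                                    # the path diagram shown in Figure 16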

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31  2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

'a' effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

'b' effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

'ab' effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, niter = 50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed, as sketched below.
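A sketch of the moderated call that produces the output shown before Figure 18 (quoting of the variable names is an assumption; the arguments mirror the Call line reported in that output):

mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender", data = sat.act, niter = 50, std = TRUE)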

> mediate.diagram(preacher)

[Figure 16 shows the mediation model: THERAPY → ATTRIB (a = 0.82), ATTRIB → SATIS (b = 0.4), and THERAPY → SATIS (c = 0.76, c' = 0.43).]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2:3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 17 shows the regression model: THERAPY → SATIS (0.43), ATTRIB → SATIS (0.4), with a value of 0.21 linking the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

The default number of bootstraps for the mediate function is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
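For example, plugging the squared canonical correlations reported above for the Thurstone regression (0.6280, 0.1478, 0.0076, 0.0049) into this product gives

R^2 = 1 - (1 - 0.6280)(1 - 0.1478)(1 - 0.0076)(1 - 0.0049) \approx 1 - 0.31 = 0.69,

which matches the "Cohen's Set Correlation R2 = 0.69" line of the setCor output in section 5.1.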

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:

Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act, mod = gender, niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT, gender, ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

'a' effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

'b' effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

'ab' effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure 18 shows the moderation model: a path diagram with ACT, gender, and their product ACTXgndr predicting SATQ, with education as the mediating variable, labeled with the a, b, c and c' coefficients reported above.]

Figure 18: Moderated multiple regression requires the raw data.

      Min        1Q    Median        3Q       Max
-25.2458   -3.2133    0.7769    3.5921    9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education  age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
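A quick way to see the symmetry is to exchange the two sets; the set correlation (although not the individual multiple correlations) is unchanged (a sketch):

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # Cohen's set correlation R2 is again 0.09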

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient

LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
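A sketch of the kind of call that generates such a table (the heading and caption argument names follow the fa2latex help page and should be treated as assumptions):

f3 <- fa(Thurstone, nfactors = 3)    # the 3 factor solution summarized in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R", caption = "fa2latex")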

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
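A few of these helpers in action (a quick sketch; the values in the comments follow from the standard formulas):

fisherz(0.5)                 # 0.5 * log((1 + 0.5)/(1 - 0.5)) = 0.55
geometric.mean(c(1, 2, 4))   # (1 * 2 * 4)^(1/3) = 2
harmonic.mean(c(1, 2, 4))    # 3/(1/1 + 1/2 + 1/4) = 1.71
headtail(sat.act)            # first and last few rows of the data frame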

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set of the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
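For a quick look at two of these (a sketch):

dim(bfi)           # 2800 observations on the 25 items plus gender, education and age
describe(sat.act)  # descriptive statistics for the 700 cases of sat.act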

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0   tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 35: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 033

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

gt drawcor(expand=20cuts=c(00))

xy

z

Bivariate density rho = 05

Figure 15 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values It isfound (laboriously) by optimizing the fit of the bivariate normal for various values of thecorrelation to the observed cell frequencies

36

is the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler functionto give some of the basic descriptive statistics for two level models

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

37

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

Sentences 009 007 025 021 020

Vocabulary 009 017 009 016 -002

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

38

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

069 063 050 058

LetterGroup

048

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

048 040 025 034

LetterGroup

023

Multiple Inflation Factor (VIF) = 1(1-SMC) =

Sentences Vocabulary SentCompletion FirstLetters

369 388 300 135

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

059 058 049 058

LetterGroup

045

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

034 034 024 033

LetterGroup

020

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

Beta weights

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

SentCompletion 002 005 004 021 008

FirstLetters 058 045 021 008 031

Multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

058 046 021 018

LetterGroup

030

39

multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

0331 0210 0043 0032

LetterGroup

0092

Multiple Inflation Factor (VIF) = 1(1-SMC) =

SentCompletion FirstLetters

102 102

Unweighted multiple R

FourLetterWords Suffixes LetterSeries Pedigrees

044 035 017 014

LetterGroup

026

Unweighted multiple R2

FourLetterWords Suffixes LetterSeries Pedigrees

019 012 003 002

LetterGroup

007

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 048

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees

FourLetterWords 052 011 009 006

Suffixes 011 060 -001 001

LetterSeries 009 -001 075 028

Pedigrees 006 001 028 066

LetterGroup 013 003 037 020

LetterGroup

FourLetterWords 013

Suffixes 003

LetterSeries 037

Pedigrees 020

LetterGroup 077

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping

40

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

Call mediate(y = SATIS x = THERAPY m = ATTRIB data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total Direct effect(c) of THERAPY on SATIS = 076 SE = 031 t direct = 25 with probability = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t direct = 135 with probability = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 004 Upper CI = 069

R2 of model = 031

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(y = c( SATV SATQ) x = c(education age ) data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate(y = c( SATV ) x = c(education age ) m= ACT data =satactstd=TRUEniter=50)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It then tests the mediation effect using a bootstrap The particular example is for demonstration purposes only and shows neithermoderation nor mediation The number of iterations for the boot strap was set to 50

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

082

c = 076

c = 043

04

Figure 16 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(1c(23)sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 17 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

for speed The default number of boot straps is 5000

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

44

Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

mod = gender niter = 50 std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58 SE = 0.03 t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59 SE = 0.03 t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.02 Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14 SE = 0.03 t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14 SE = 0.03 t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = -0.01 Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 0.03 t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0 SE = 0.03 t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01 Lower CI = 0 Upper CI = 0

R2 of model = 0.37

To see the longer output specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
SATQ se t Prob
ACT 0.58 0.03 19.25 0.00e+00
gender -0.14 0.03 -4.78 2.10e-06
ACTXgndr 0.00 0.03 0.02 9.85e-01

Direct effect estimates (c')
SATQ se t Prob
ACT 0.59 0.03 19.26 0.00e+00
gender -0.14 0.03 -4.63 4.37e-06
ACTXgndr 0.00 0.03 0.01 9.92e-01

a effect estimates
education se t Prob
ACT 0.16 0.04 4.22 2.77e-05
gender 0.09 0.04 2.50 1.28e-02
ACTXgndr -0.01 0.04 -0.15 8.83e-01

b effect estimates
SATQ se t Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
SATQ boot sd lower upper
ACT -0.01 -0.01 0.01 0 0
gender 0.00 0.00 0.00 0 0
ACTXgndr 0.00 0.00 0.00 0 0

[Path diagram omitted: 'Moderation model' in which ACT, gender, and the ACTXgndr interaction predict SATQ through education; the path values match the a, b, c, and c' estimates listed above.]

Figure 18: Moderated multiple regression requires the raw data.

45

Min 1Q Median 3Q Max
-25.2458 -3.2133 0.7769 3.5921 9.2630

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706 0.82140 33.378 < 2e-16 ***
gender -0.48606 0.37984 -1.280 0.20110
education 0.47890 0.15235 3.143 0.00174 **
age 0.01623 0.02278 0.712 0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ
gender -0.05 -0.03 -0.18
education 0.14 0.10 0.10
age 0.03 -0.10 -0.09

Multiple R
ACT SATV SATQ
0.16 0.10 0.19

multiple R2
ACT SATV SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education age
1.01 1.45 1.44

Unweighted multiple R
ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
ACT SATV SATQ
gender 0.18 4.29 4.34
education 0.22 5.13 5.18
age 0.22 5.11 5.16

t of Beta Weights
ACT SATV SATQ
gender -0.27 -0.01 -0.04
education 0.65 0.02 0.02

46

age 0.15 -0.02 -0.02

Probability of t <
ACT SATV SATQ
gender 0.79 0.99 0.97
education 0.51 0.98 0.98
age 0.88 0.98 0.99

Shrunken R2
ACT SATV SATQ
0.0230 0.0054 0.0317

Standard Error of R2
ACT SATV SATQ
0.0120 0.0073 0.0137

F
ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
ACT SATV SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1 5.6

Average squared canonical correlation = 0.03
Cohens Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohens Set Correlation: 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
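A quick way to see this symmetry is to reverse the roles of the two sets in the call used above (a sketch; only the reported set correlation is of interest here):

setCor(c(4:6), c(1:3), C, n.obs = 700)   # y = test scores, x = demographics
setCor(c(1:3), c(4:6), C, n.obs = 700)   # roles reversed: the Set Correlation R2 is again 0.09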

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

47

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable MR1 MR2 MR3 h2 u2 com
Sentences 0.91 -0.04 0.04 0.82 0.18 1.01
Vocabulary 0.89 0.06 -0.03 0.84 0.16 1.01
Sent.Completion 0.83 0.04 0.00 0.73 0.27 1.00
First.Letters 0.00 0.86 0.00 0.73 0.27 1.00
4.Letter.Words -0.01 0.74 0.10 0.63 0.37 1.04
Suffixes 0.18 0.63 -0.08 0.50 0.50 1.20
Letter.Series 0.03 -0.01 0.84 0.72 0.28 1.00
Pedigrees 0.37 -0.05 0.47 0.50 0.50 1.93
Letter.Group -0.06 0.21 0.64 0.53 0.47 1.23

SS loadings 2.64 1.86 1.50

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
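A table like Table 2 can be reproduced with a few lines (a sketch, assuming the Thurstone correlation matrix that ships with psych):

f3 <- fa(Thurstone, 3)    # the three-factor solution summarized in Table 2
fa2latex(f3)              # LaTeX source for a table of loadings, h2, u2, and complexity
cor2latex(Thurstone)      # the correlation matrix, lower diagonal, in APA style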

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a few of these helpers are illustrated in the short sketch at the end of the list); look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

49

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
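A few of these helpers in action (a quick illustrative sketch; the data set is described in the next section):

headTail(sat.act)              # first and last rows, with an ellipsis between
mardia(sat.act[1:4])           # univariate and multivariate skew and kurtosis
fisherz(0.5)                   # Fisher z transformation of r = .5
geometric.mean(c(1, 2, 4, 8))  # 2.83; compare harmonic.mean(c(1, 2, 4, 8))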

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Also included are personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

50

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
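These data sets load with the package; a brief sketch of taking a first look at two of them:

library(psych)
data(sat.act)
describe(sat.act)      # descriptive statistics for the 700 SAPA participants
data(Thurstone)
lowerMat(Thurstone)    # the 9 x 9 ability correlation matrix, lower triangle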

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
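From within R, one way to install from that repository is the following (a sketch; the repos value is assumed from the address above, and the released version installs with a plain install.packages("psych")):

install.packages("psych", repos = "http://personality-project.org/r", type = "source")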

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: macOS Sierra 10.12.4

Matrix products default

BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_1.7.4.21

loaded via a namespace (and not attached)

[1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0 foreign_0.8-67

[5] KernSmooth_2.23-15 nlme_3.1-131 mnormt_1.5-4 grid_3.4.0

[9] lattice_0.20-34

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis (122 pp.). John Wiley & Sons, Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

53

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

54

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

56



bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate(y = c( SATV ) x = c(education age ) m= ACT data =satactstd=TRUEniter=50)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It then tests the mediation effect using a bootstrap The particular example is for demonstration purposes only and shows neithermoderation nor mediation The number of iterations for the boot strap was set to 50

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

082

c = 076

c = 043

04

Figure 16 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(1c(23)sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 17 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

for speed The default number of boot straps is 5000

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

44

Call mediate(y = c(SATQ) x = c(ACT) m = education data = satact

mod = gender niter = 50 std = TRUE)

The DV (Y) was SATQ The IV (X) was ACT gender ACTXgndr The mediating variable(s) = education

Total Direct effect(c) of ACT on SATQ = 058 SE = 003 t direct = 1925 with probability = 0

Direct effect (c) of ACT on SATQ removing education = 059 SE = 003 t direct = 1926 with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -001

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -002 Upper CI = 0

Total Direct effect(c) of gender on SATQ = -014 SE = 003 t direct = -478 with probability = 21e-06

Direct effect (c) of gender on NA removing education = -014 SE = 003 t direct = -463 with probability = 44e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = -001 Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0 SE = 003 t direct = 002 with probability = 099

Direct effect (c) of ACTXgndr on NA removing education = 0 SE = 003 t direct = 001 with probability = 099

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -001 with standard error = 001 Lower CI = 0 Upper CI = 0

R2 of model = 037

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATQ se t Prob

ACT 058 003 1925 000e+00

gender -014 003 -478 210e-06

ACTXgndr 000 003 002 985e-01

Direct effect estimates (c)SATQ se t Prob

ACT 059 003 1926 000e+00

gender -014 003 -463 437e-06

ACTXgndr 000 003 001 992e-01

a effect estimates

education se t Prob

ACT 016 004 422 277e-05

gender 009 004 250 128e-02

ACTXgndr -001 004 -015 883e-01

b effect estimates

SATQ se t Prob

education -004 003 -145 0147

ab effect estimates

SATQ boot sd lower upper

ACT -001 -001 001 0 0

gender 000 000 000 0 0

ACTXgndr 000 000 000 0 0

Moderation model

ACT

gender

ACTXgndr

SATQ

education016 c = 058

c = 059

009 c = minus014

c = minus014

minus001 c = 0

c = 0

minus004

minus004

minus007

002

Figure 18 Moderated multiple regression requires the raw data

45

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002

46

age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenient

47

LATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

49

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in thepsych package These include six data sets showing a hierarchical factor structure (fivecognitive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2and one from health psychology Reise) One of these (Thurstone) is used as an examplein the sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) or 13 personality inventory scores (epibfi) and14 multiple choice iq items (iqitems) The vegetables example has paired comparisonpreferences for 9 vegetables This is an example of Thurstonian scaling used by Guilford(1954) and Nunnally (1967) Other data sets include cubits peas and heights fromGalton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Person-ality Assessment (SAPA) web based personality assessment project The data from1000 subjects are included here as a demonstration set for scoring multiple choiceinventories and doing basic item statistics

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to ntroduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httppersonality-projectorgr That version will have removed the mostrecently discovered bugs (but perhaps introduced other yet to be discovered ones) Todownload that version go to the repository httppersonality-projectorgrsrc

contrib and wander around For a Mac this version can be installed directly using theldquoother repositoryrdquo option in the package installer For a PC the zip file for the most recentrelease has been created using the win-builder facility at CRAN The development releasefor the Mac is usually several weeks ahead of the PC development version

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httppersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt 170package=psych)


Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models.

This follows the decomposition of an observed correlation into the pooled correlation within groups (r_wg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}     (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.
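A minimal sketch of this decomposition, grouping the sat.act data by education (the cors = TRUE option, also used in section 4.3 below, requests the pooled within group and between group correlation matrices; the element names rwg and rbg follow my reading of the statsBy help page and should be checked):

> sb <- statsBy(sat.act, group = "education", cors = TRUE)
> round(sb$rwg, 2)   # pooled within group correlations
> round(sb$rbg, 2)   # correlations of the group means (between groups)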

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
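A rough sketch of that bootstrap workflow (the argument names used here, group, ntrials, and var, follow my reading of the help pages and are assumptions to verify):

> sbb <- statsBy.boot(sat.act, group = "education", ntrials = 100)   # randomize the grouping variable 100 times
> statsBy.boot.summary(sbb, var = "ICC2")                            # summarize one statistic across the bootstrap runs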


Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.
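A minimal sketch of the two input styles (these calls mirror examples shown later in this section and in section 5.3; variables may be specified by name or by column location):

> setCor(y = "ACT", x = c("gender", "education", "age"), data = sat.act)   # raw data input
> R <- lowerCor(sat.act)                            # save (and print) the correlation matrix
> setCor(y = 4:6, x = 1:3, data = R, n.obs = 700)   # matrix input; n.obs enables significance tests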

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                  0.09     0.07         0.25      0.21        0.20
Vocabulary                 0.09     0.17         0.09      0.16       -0.02
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion             0.02     0.05         0.04      0.21        0.08
FirstLetters               0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_{1,2,...,i}) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Call: mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31   t direct = 2.5   with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32   t direct = 1.35   with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.04   Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se    t   Prob
THERAPY  0.76 0.31 2.50 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

  mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, niter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5,000. A sketch of the call appears below.
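That call, reconstructed from the Call: line echoed with the moderation output later in this section (the quoting of the variable names is an assumption; mediate also accepts column numbers):

> mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender", data = sat.act, niter = 50, std = TRUE)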

> mediate.diagram(preacher)

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.
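As a quick check of the formula, the squared canonical correlations reported for the first Thurstone example in section 5.1 reproduce the Cohen set correlation reported there (a minimal sketch; the numbers are simply copied from that output):

> lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations from the earlier setCor output
> 1 - prod(1 - lambda)                          # Cohen's set correlation R2
[1] 0.6869325

which rounds to the 0.69 reported there.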

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act, mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58   S.E. = 0.03   t direct = 19.25   with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03   t direct = 19.26   with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.02   Upper CI = 0

Total Direct effect (c) of gender on SATQ = -0.14   S.E. = 0.03   t direct = -4.78   with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03   t direct = -4.63   with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = -0.01   Upper CI = 0

Total Direct effect (c) of ACTXgndr on SATQ = 0   S.E. = 0.03   t direct = 0.02   with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03   t direct = 0.01   with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01   Lower CI = 0   Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

Figure 18: Moderated multiple regression requires the raw data.

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
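A minimal sketch of the calls involved (the heading argument to fa2latex is an assumption; see the help pages for the full set of options such as captions, labels, and font sizes):

> f3 <- fa(Thurstone, 3)       # a three factor solution of the Thurstone correlation matrix
> fa2latex(f3, heading = "A factor analysis table from the psych package in R")
> cor2latex(Thurstone)         # the lower diagonal of the correlation matrix as a LaTeX table
> df2latex(describe(sat.act))  # any data frame or matrix, e.g., basic descriptive statistics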

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex output: A factor analysis table from the psych package in R

Variable         MR1    MR2    MR3    h2    u2   com
Sentences       0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary      0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion  0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters    0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes        0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees       0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings     2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
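A few of these helpers in action (a minimal sketch; only the fisherz and geometric.mean results are shown):

> fisherz(0.5)                # the Fisher r to z transformation
[1] 0.5493061
> geometric.mean(c(1, 2, 4))  # the appropriate mean for multiplicative data
[1] 2
> headtail(sat.act)           # the first and last lines of the sat.act data frame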

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0     foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4    grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 38: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
Sentences                 0.09     0.07         0.25      0.21        0.20
Vocabulary                0.09     0.17         0.09      0.16       -0.02
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.69            0.63            0.50            0.58            0.48

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.48            0.40            0.25            0.34            0.23

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
     Sentences     Vocabulary SentCompletion   FirstLetters
          3.69           3.88           3.00           1.35

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.59            0.58            0.49            0.58            0.45

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.34            0.34            0.24            0.33            0.20

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

Beta weights
               FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
SentCompletion            0.02     0.05         0.04      0.21        0.08
FirstLetters              0.58     0.45         0.21      0.08        0.31

Multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.58            0.46            0.21            0.18            0.30

multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
          0.331           0.210           0.043           0.032           0.092

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
SentCompletion   FirstLetters
          1.02           1.02

Unweighted multiple R
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.44            0.35            0.17            0.14            0.26

Unweighted multiple R2
FourLetterWords        Suffixes    LetterSeries       Pedigrees     LetterGroup
           0.19            0.12            0.03            0.02            0.07

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.48

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.52     0.11         0.09      0.06        0.13
Suffixes                   0.11     0.60        -0.01      0.01        0.03
LetterSeries               0.09    -0.01         0.75      0.28        0.37
Pedigrees                  0.06     0.01         0.28      0.66        0.20
LetterGroup                0.13     0.03         0.37      0.20        0.77

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
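As a quick check on this algebra, here is a sketch using the path estimates reported below for the Preacher and Hayes example (the numbers are taken from the mediate output that follows; the variable names are for illustration only):

a <- 0.82        # path from THERAPY to ATTRIB
b <- 0.40        # path from ATTRIB to SATIS, controlling for THERAPY
c.total <- 0.76  # total effect (c) of THERAPY on SATIS
c.prime <- 0.43  # direct effect (c') with ATTRIB in the model
a * b            # indirect effect ab, about 0.33
c.prime + a * b  # recovers the total effect of about 0.76 (within rounding)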


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
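A sketch of a call that produces this kind of output (the sobel data frame is assumed to be the one constructed in the mediate help example, e.g. via example(mediate); storing the result as preacher allows the diagrams below to be redrawn):

# Sketch: the Preacher and Hayes (2004) mediation example
preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
preacher                   # print the mediation summary shown below
mediate.diagram(preacher)  # draw the mediation model of Figure 16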

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std=TRUE, n.iter=50)

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation. The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. A sketch of such a call appears below.
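A minimal sketch of such a moderated call (it mirrors the Call shown with the moderation output later in this section; n.iter = 50 keeps the bootstrap short for demonstration only):

# Moderated mediation sketch: ACT and gender (and their interaction) predict SATQ,
# with education as the mediator; 50 bootstrap iterations are for speed only.
mod.med <- mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
                   data = sat.act, n.iter = 50, std = TRUE)
mediate.diagram(mod.med)   # draws the moderation diagram of Figure 18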


> mediate.diagram(preacher)

[Figure: path diagram titled "Mediation model", showing THERAPY → ATTRIB (a = 0.82), ATTRIB → SATIS (b = 0.4), and THERAPY → SATIS with the total effect c = 0.76 and the direct effect c' = 0.43.]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: path diagram titled "Regression Models", showing THERAPY → SATIS (0.43) and ATTRIB → SATIS (0.4), with a value of 0.21 shown between the two predictors.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.



5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
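As a quick numerical check (a sketch, not package output), the squared canonical correlations reported for the Thurstone example in section 5.1 reproduce both of the summary statistics printed there:

lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations from 5.1
1 - prod(1 - lambda)    # Cohen's set correlation R2, about 0.69
mean(lambda)            # average squared canonical correlation, about 0.2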

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act,
    mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect (c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect (c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect (c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: path diagram titled "Moderation model", showing ACT, gender, and the ACTXgndr interaction predicting SATQ with education as the mediator; the path labels correspond to the a, b, c, and c' estimates in the output above.]

Figure 18: Moderated multiple regression requires the raw data.


      Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.050 0.033 0.008
Chisq of canonical correlations
[1] 35.8 23.1  5.6
Average squared canonical correlation =  0.03
Cohen's Set Correlation R2  =  0.09
Shrunken Set Correlation R2  =  0.08
F and df of Cohen's Set Correlation  7.26 9 1681.86
Unweighted correlation between the two sets =  0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
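A minimal sketch of that symmetry, reusing the covariance matrix C and sample size from above: the two calls give different regressions, but the set correlation reported at the end of each output is the same.

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ACT and SATs from demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # reverse the roles of the two sets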

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
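A minimal sketch of these converters (the fa call mirrors the kind of three-factor analysis of the Thurstone correlations behind Table 2; all arguments are left at their defaults and are assumptions, not the exact calls used to build this document):

f3 <- fa(Thurstone, nfactors = 3)   # the kind of solution shown in Table 2
fa2latex(f3)                        # loadings, communalities, and complexities as a LaTeX table
cor2latex(Thurstone)                # lower diagonal of a correlation matrix in APA style
df2latex(describe(sat.act))         # any data frame, e.g. descriptive statistics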

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimate effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
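A brief sketch of a few of the helpers listed above, applied to the bundled sat.act data (calls only, output omitted; the particular input values are illustrative assumptions):

fisherz(0.30)                      # Fisher r-to-z transform of a correlation
geometric.mean(c(1, 2, 4, 8))      # geometric mean of a vector
harmonic.mean(c(1, 2, 4, 8))       # harmonic mean of the same vector
headtail(sat.act)                  # first and last lines of a data frame
mardia(sat.act[1:4])               # univariate and multivariate skew and kurtosis
superMatrix(diag(2), diag(3))      # a 5 x 5 "super matrix" with 0s off the block diagonal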

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
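A quick way to look at any of these bundled data sets (a sketch; the data objects are lazy-loaded with the package, so the explicit data() call is optional):

data(bfi)                       # optional: the data are lazy-loaded with psych
dim(bfi)                        # 2800 subjects; 25 items plus gender, education, and age
describe(sat.act)               # descriptive statistics for the SAT/ACT demonstration set
lowerMat(Thurstone[1:5, 1:5])   # peek at part of a correlation matrix data set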

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
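A minimal sketch of installing from that repository at the R console (this assumes the personality-project site is laid out as a standard CRAN-style source repository, as the directory structure above suggests; Windows users would normally install the CRAN release instead):

# Install the development version from the personality-project repository (assumed layout)
install.packages("psych", repos = "http://personality-project.org/r", type = "source")
library(psych)           # attach the newly installed version
packageVersion("psych")  # confirm which version is now loaded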

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package="psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47


Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62



5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping.
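The path algebra can be illustrated with two ordinary regressions. The following minimal sketch is only an illustration of the definitions above (it is not how mediate computes its bootstrapped tests), and it assumes the sobel data frame with the variables SATIS, THERAPY, and ATTRIB has been constructed as in the example section of ?mediate.

library(psych)                # assumes sobel exists, as built in the ?mediate example
a       <- coef(lm(ATTRIB ~ THERAPY, data = sobel))["THERAPY"]   # path a: x -> m
fit.y   <- lm(SATIS ~ THERAPY + ATTRIB, data = sobel)
b       <- coef(fit.y)["ATTRIB"]                                 # path b: m -> y, controlling for x
c.prime <- coef(fit.y)["THERAPY"]                                # direct effect c'
c.total <- coef(lm(SATIS ~ THERAPY, data = sobel))["THERAPY"]    # total effect c
ab      <- a * b                      # indirect effect; for OLS, ab = c.total - c.prime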


Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
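The output below was produced by a call of roughly the following form; the assignment to preacher and the quoting of the variable names are assumptions (the echoed Call line shows the arguments actually used).

library(psych)
preacher <- mediate(y = "SATIS", x = "THERAPY", m = "ATTRIB", data = sobel)
preacher    # printing the result gives the summary shown below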

Call: mediate(y = SATIS, x = THERAPY, m = ATTRIB, data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total Direct effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t direct = 2.5  with probability = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t direct = 1.35  with probability = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.04  Upper CI = 0.69
R2 of model = 0.31

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
        SATIS   se   t   Prob
THERAPY  0.76 0.31 2.5 0.0186

Direct effect estimates (c')
        SATIS   se    t  Prob
THERAPY  0.43 0.32 1.35 0.190
ATTRIB   0.40 0.18 2.23 0.034

a effect estimates
       THERAPY  se    t   Prob
ATTRIB    0.82 0.3 2.74 0.0106

b effect estimates
       SATIS   se    t  Prob
ATTRIB   0.4 0.18 2.23 0.034

ab effect estimates
        SATIS boot   sd lower upper
THERAPY  0.33 0.32 0.17  0.04  0.69

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(y = c("SATV", "SATQ"), x = c("education", "age"), data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(y = c("SATV"), x = c("education", "age"), m = "ACT", data = sat.act, std = TRUE, n.iter = 50)

• mediate will take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. The particular example is for demonstration purposes only and shows neither moderation nor mediation; the number of iterations for the bootstrap was set to 50 for speed (see the call sketched below).
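A sketch of the moderated call just described; the argument spellings mirror the Call echoed in the output that follows and should be treated as assumptions.

library(psych)
mod.med <- mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
                   data = sat.act, std = TRUE, n.iter = 50)
mod.med    # prints the moderated (ACT x gender) mediation summary shown below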


> mediate.diagram(preacher)

[Mediation model path diagram: THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, total effect c = 0.76, direct effect c' = 0.43]

Figure 16: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(1, c(2,3), sobel, std = FALSE)
> setCor.diagram(preacher)

[Regression Models path diagram: THERAPY and ATTRIB predicting SATIS, with weights 0.43 and 0.4, and a value of 0.21 between the two predictors]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


The default number of bootstrap iterations in mediate is 5000.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
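The \lambda_i may be computed directly from the blocks of a correlation matrix. The following sketch is an illustration of the formula above (not the setCor code), assuming the first three columns of sat.act (gender, education, age) form the x set and the last three (ACT, SATV, SATQ) the y set.

library(psych)
R   <- cor(sat.act, use = "pairwise")
Rxx <- R[1:3, 1:3]                        # predictor block
Ryy <- R[4:6, 4:6]                        # criterion block
Rxy <- R[1:3, 4:6]
M      <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)
lambda <- Re(eigen(M)$values)             # the squared canonical correlations
R2.set <- 1 - prod(1 - lambda)            # Cohen's set correlation R^2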

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = c(SATQ), x = c(ACT), m = education, data = sat.act, mod = gender, n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  SE = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59  SE = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0
Total Direct effect(c) of gender on SATQ = -0.14  SE = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14  SE = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0
Total Direct effect(c) of ACTXgndr on SATQ = 0  SE = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0  SE = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

[Moderation model path diagram: ACT, gender, and the ACTXgndr product predicting SATQ through education; a paths 0.16, 0.09, and -0.01; total effects c = 0.58, -0.14, and 0; direct effects c' = 0.59, -0.14, and 0; b path -0.04]

Figure 18: Moderated multiple regression requires the raw data.


     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
gender education  age
  1.01      1.45 1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation: 7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
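This symmetry is easy to verify by reversing the roles of the two sets in the call used above; both runs report the same Cohen's set correlation R2 (the multiple correlations of the individual variables will, of course, differ).

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ACT, SATV, SATQ from the demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # the sets reversed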

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
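A call along the following lines will produce a table like Table 2 from the 9 variable Thurstone correlation matrix; the heading argument (and the rounding defaults) are assumptions, so check ?fa2latex for the exact options.

library(psych)
f3 <- fa(Thurstone, nfactors = 3)   # three factor solution of the 9 ability variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(Thurstone)                # the correlation matrix itself, formatted for LaTeX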


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score (see the short example following this list).

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
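A few of these helpers in action; a minimal sketch with arbitrary input values.

library(psych)
fisherz(0.50)                    # Fisher r to z transformation (about 0.55)
geometric.mean(c(1, 2, 4, 8))    # 2.83
harmonic.mean(c(1, 2, 4, 8))     # 2.13
headtail(sat.act)                # first and last few rows of the data frame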

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iq.items). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq.items 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
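All of these data sets are lazy-loaded with the package and may be examined directly; a brief sketch:

library(psych)
data(sat.act)         # optional, since the data are lazy-loaded
describe(sat.act)     # basic descriptive statistics for the six variables
dim(bfi)              # 2800 x 28: the 25 items plus gender, education, and age
lowerMat(Thurstone)   # lower triangle of the 9 x 9 ability correlation matrix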

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
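From within R, the development version can also be installed directly from that repository; this sketch assumes a source build, so a compiler tool chain may be needed on some platforms.

install.packages("psych", repos = "http://personality-project.org/r",
                 type = "source")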

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book, An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0  tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131    mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Page 41: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

Call mediate(y = SATIS x = THERAPY m = ATTRIB data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total Direct effect(c) of THERAPY on SATIS = 076 SE = 031 t direct = 25 with probability = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t direct = 135 with probability = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 004 Upper CI = 069

R2 of model = 031

To see the longer output specify short = FALSE in the print statement

Full output

Total effect estimates (c)

SATIS se t Prob

THERAPY 076 031 25 00186

Direct effect estimates (c)SATIS se t Prob

THERAPY 043 032 135 0190

ATTRIB 040 018 223 0034

a effect estimates

THERAPY se t Prob

ATTRIB 082 03 274 00106

b effect estimates

SATIS se t Prob

ATTRIB 04 018 223 0034

ab effect estimates

SATIS boot sd lower upper

THERAPY 033 032 017 004 069

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(y = c( SATV SATQ) x = c(education age ) data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate(y = c( SATV ) x = c(education age ) m= ACT data =satactstd=TRUEniter=50)

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It then tests the mediation effect using a bootstrap The particular example is for demonstration purposes only and shows neithermoderation nor mediation The number of iterations for the boot strap was set to 50

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

082

c = 076

c = 043

04

Figure 16 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(1c(23)sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 17 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

for speed The default number of boot straps is 5000

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals


Call: mediate(y = c("SATQ"), x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58  S.E. = 0.03  t direct = 19.25  with probability = 0

Direct effect (c') of ACT on SATQ removing education = 0.59  S.E. = 0.03  t direct = 19.26  with probability = 0

Indirect effect (ab) of ACT on SATQ through education = -0.01

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14  S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06

Direct effect (c') of gender on NA removing education = -0.14  S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06

Indirect effect (ab) of gender on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0  S.E. = 0.03  t direct = 0.02  with probability = 0.99

Direct effect (c') of ACTXgndr on NA removing education = 0  S.E. = 0.03  t direct = 0.01  with probability = 0.99

Indirect effect (ab) of ACTXgndr on SATQ through education = 0

Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
           SATQ   se     t     Prob
ACT        0.58 0.03 19.25 0.00e+00
gender    -0.14 0.03 -4.78 2.10e-06
ACTXgndr   0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se     t     Prob
ACT        0.59 0.03 19.26 0.00e+00
gender    -0.14 0.03 -4.63 4.37e-06
ACTXgndr   0.00 0.03  0.01 9.92e-01

a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

[Figure 18 shows the "Moderation model" path diagram: ACT, gender, and ACTXgndr predict SATQ both directly and through education. The paths to education are 0.16, 0.09, and -0.01, the path from education to SATQ is -0.04, the total effects are c = 0.58, -0.14, and 0, and the direct effects are c' = 0.59, -0.14, and 0.]

Figure 18: Moderated multiple regression requires the raw data.


      Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education      age
     1.01      1.45     1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03

Cohen's Set Correlation R2 = 0.09

Shrunken Set Correlation R2 = 0.08

F and df of Cohen's Set Correlation: 7.26  9  1681.86

Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
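A quick way to check this symmetry is to reverse the roles of the two sets; the following sketch reuses the covariance matrix C computed above, and both calls report the same set correlation R2.

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ability scores from demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics from ability scores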

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient


LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
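A minimal sketch of these helpers, using the Thurstone correlation matrix supplied with psych (the factor analysis is the one shown in Table 2); by default each function prints the LaTeX source to the console.

f3 <- fa(Thurstone, nfactors = 3)   # the three factor solution shown in Table 2
fa2latex(f3)                        # loadings, communalities, and complexities
cor2latex(Thurstone)                # lower diagonal of the correlation matrix
df2latex(describe(sat.act))         # any data frame or matrix, here describe output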

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

      MR1  MR2  MR3
MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; a short usage sketch of a few of these helpers follows the list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
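A short usage sketch for a few of these helpers (the values in the comments are approximate):

headtail(sat.act)              # the first and last few rows of a data frame
geometric.mean(c(1, 2, 4, 8))  # about 2.83; compare harmonic.mean and mean
fisherz(0.5)                   # about 0.55, the Fisher z of r = .5
superMatrix(diag(2), diag(3))  # a 5 x 5 block matrix with 0s off the blocks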

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton. A short example of accessing these data sets follows the descriptions below.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.


iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
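The data sets are lazy-loaded with the package, so after library(psych) they can be used directly; a brief sketch:

dim(bfi)           # 2800 participants on 25 items plus gender, education, and age
describe(sat.act)  # descriptive statistics for the 700 SAPA participants
lowerCor(epi.bfi)  # correlations among the EPI, Big 5, depression, and anxiety scales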

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
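From within R, one way to install the development version from that repository is the following (a sketch; it assumes your system can build source packages):

install.packages("psych", repos = "http://personality-project.org/r", type = "source")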

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0   tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.


Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.


Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang


for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47


                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 43: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

> preacher <- setCor(1, c(2,3), sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" path diagram, with THERAPY and ATTRIB predicting SATIS; the coefficients shown are 0.43, 0.40, and 0.21.]

Figure 17: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

... for speed. The default number of bootstraps is 5000.
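For reference, the niter = 50 used in these examples is only there to keep the vignette fast. A hedged sketch of the same moderated mediation model run with the default number of bootstrap resamples (the argument names follow the call echoed below; the quoting of the variable names and the object name mod.full are assumptions):

> mod.full <- mediate(y = "SATQ", x = "ACT", m = "education", mod = "gender",
+                     data = sat.act, niter = 5000, std = TRUE)   # default 5000 resamples; slower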

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
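A hedged numerical sketch of this identity, using the sat.act data analyzed below (the split into x = columns 1:3 and y = columns 4:6 follows that example). The eigenvalues are the squared canonical correlations, and the product reproduces the Cohen set correlation R2 of about .09 reported later in this section:

> R.all <- cor(sat.act, use = "pairwise")    # correlations of the six sat.act variables
> Rxx <- R.all[1:3, 1:3]; Rxy <- R.all[1:3, 4:6]; Ryy <- R.all[4:6, 4:6]
> lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)   # squared canonical correlations
> 1 - prod(1 - lambda)                       # Cohen's set correlation R2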

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.
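Strictly speaking, the C computed in the next block is a covariance matrix. If an explicit correlation matrix is preferred, base R's cov2cor will convert it; a one-line sketch (the name R.sat is arbitrary):

> R.sat <- cov2cor(C)    # correlation matrix corresponding to the pairwise covariance matrix C

setCor standardizes by default (std = TRUE), which is presumably why standardized β weights are reported below in either case.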

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:


Call: mediate(y = "SATQ", x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", niter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   SE = 0.03  t direct = 19.25 with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   SE = 0.03  t direct = 19.26 with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   SE = 0.03  t direct = -4.78 with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   SE = 0.03  t direct = -4.63 with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   SE = 0.03  t direct = 0.02 with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   SE = 0.03  t direct = 0.01 with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.

Full output

Total effect estimates (c)
          SATQ   se     t     Prob
ACT       0.58 0.03 19.25 0.00e+00
gender   -0.14 0.03 -4.78 2.10e-06
ACTXgndr  0.00 0.03  0.02 9.85e-01

Direct effect estimates (c')
          SATQ   se     t     Prob
ACT       0.59 0.03 19.26 0.00e+00
gender   -0.14 0.03 -4.63 4.37e-06
ACTXgndr  0.00 0.03  0.01 9.92e-01

a effect estimates
         education   se     t     Prob
ACT           0.16 0.04  4.22 2.77e-05
gender        0.09 0.04  2.50 1.28e-02
ACTXgndr     -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
          SATQ  boot   sd lower upper
ACT      -0.01 -0.01 0.01     0     0
gender    0.00  0.00 0.00     0     0
ACTXgndr  0.00  0.00 0.00     0     0

[Figure: "Moderation model" path diagram. ACT, gender, and ACTXgndr predict SATQ, with education as the mediator; the paths shown include a = 0.16, 0.09, -0.01, b = -0.04, total effects c = 0.58, -0.14, 0, direct effects c' = 0.59, -0.14, 0, and predictor correlations -0.04, -0.07, 0.02.]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
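Because setCor reports standardized β weights when given a correlation matrix, while lm reports raw coefficients, one quick check is to refit the lm model on standardized variables; a minimal sketch (the model name is arbitrary) whose coefficients should be close to the ACT column of the Beta weights reported below:

> model1.z <- lm(scale(ACT) ~ scale(gender) + scale(education) + scale(age), data = sat.act)
> round(coef(model1.z), 2)    # roughly -0.05, 0.14, and 0.03 for gender, education, and age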

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01
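As a check on the set correlation formula given earlier, the squared canonical correlations reported here reproduce Cohen's R2: 1 - (1 - 0.050)(1 - 0.033)(1 - 0.008) ≈ 0.09.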

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: that is, the R2 is the same independent of the direction of the relationship.
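The relationship between two sets while controlling for a third set, mentioned at the start of this section, uses the z argument of setCor. A minimal hedged sketch reusing the matrix C and the same column order as above (so that age, column 3, is partialled out of both sets):

> setCor(y = c(4:6), x = c(1:2), z = 3, data = C, n.obs = 700)   # partial age out of both sets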

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.50

      MR1  MR2  MR3
MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00
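A hedged sketch of the kind of call that produces such a table; that Table 2 comes from a three-factor solution of the Thurstone correlation matrix is an assumption based on the variable names shown:

> f3 <- fa(Thurstone, 3)     # three-factor solution of the nine Thurstone ability variables
> fa2latex(f3, heading = "A factor analysis table from the psych package in R")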

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems (see the short sketch after this list).
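Two of these helpers in action; a minimal sketch in which the matrices A and B are arbitrary identity matrices made up purely for illustration:

> A <- diag(2)
> B <- diag(3)
> superMatrix(A, B)     # a 5 x 5 block matrix with A top left, B lower right, 0s elsewhere
> headtail(sat.act)     # first and last few rows of the sat.act data frame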

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iq.items). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
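A hedged sketch of one way to install the development version from that repository from within R (the repos URL is the one given above; the exact options may differ by platform):

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")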

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0    tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131      mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50


ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50


densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26


biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34


polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package


ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47


58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 45: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Call: mediate(y = "SATQ", x = c("ACT"), m = "education", data = sat.act,
    mod = "gender", n.iter = 50, std = TRUE)

The DV (Y) was SATQ. The IV (X) was ACT gender ACTXgndr. The mediating variable(s) = education.

Total Direct effect(c) of ACT on SATQ = 0.58   S.E. = 0.03  t direct = 19.25  with probability = 0
Direct effect (c') of ACT on SATQ removing education = 0.59   S.E. = 0.03  t direct = 19.26  with probability = 0
Indirect effect (ab) of ACT on SATQ through education = -0.01
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.02  Upper CI = 0

Total Direct effect(c) of gender on SATQ = -0.14   S.E. = 0.03  t direct = -4.78  with probability = 2.1e-06
Direct effect (c') of gender on NA removing education = -0.14   S.E. = 0.03  t direct = -4.63  with probability = 4.4e-06
Indirect effect (ab) of gender on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = -0.01  Upper CI = 0

Total Direct effect(c) of ACTXgndr on SATQ = 0   S.E. = 0.03  t direct = 0.02  with probability = 0.99
Direct effect (c') of ACTXgndr on NA removing education = 0   S.E. = 0.03  t direct = 0.01  with probability = 0.99
Indirect effect (ab) of ACTXgndr on SATQ through education = 0
Mean bootstrapped indirect effect = -0.01 with standard error = 0.01  Lower CI = 0  Upper CI = 0

R2 of model = 0.37

To see the longer output, specify short = FALSE in the print statement.
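To re-create this object and ask for the full output directly, a call of the following form should work (a sketch only: the object name mod.sat is mine, and the arguments simply mirror the Call echoed above).

> mod.sat <- mediate(y = "SATQ", x = c("ACT"), m = "education", data = sat.act,
+                    mod = "gender", n.iter = 50, std = TRUE)
> print(mod.sat, short = FALSE)    # the longer output, as shown below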

Full output

Total effect estimates (c)
           SATQ   se      t     Prob
ACT        0.58 0.03  19.25 0.00e+00
gender    -0.14 0.03  -4.78 2.10e-06
ACTXgndr   0.00 0.03   0.02 9.85e-01

Direct effect estimates (c')
           SATQ   se      t     Prob
ACT        0.59 0.03  19.26 0.00e+00
gender    -0.14 0.03  -4.63 4.37e-06
ACTXgndr   0.00 0.03   0.01 9.92e-01

a effect estimates
          education   se     t     Prob
ACT            0.16 0.04  4.22 2.77e-05
gender         0.09 0.04  2.50 1.28e-02
ACTXgndr      -0.01 0.04 -0.15 8.83e-01

b effect estimates
           SATQ   se     t  Prob
education -0.04 0.03 -1.45 0.147

ab effect estimates
           SATQ  boot   sd lower upper
ACT       -0.01 -0.01 0.01     0     0
gender     0.00  0.00 0.00     0     0
ACTXgndr   0.00  0.00 0.00     0     0

[Figure 18 (path diagram): ACT, gender, and ACTXgndr predict SATQ with education as the mediating variable; the path coefficients correspond to the a, b, c, and c' estimates listed above.]

Figure 18: Moderated multiple regression requires the raw data.

     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476
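This summary has the shape of an ordinary lm() fit of ACT on the three demographic variables; a reconstruction along the following lines (not quoted from the vignette) matches the degrees of freedom and R2 shown above.

> # reconstruction of the presumed call, for reference
> summary(lm(ACT ~ gender + education + age, data = sat.act))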

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from matrix input

Beta weights
            ACT  SATV  SATQ
gender    -0.05 -0.03 -0.18
education  0.14  0.10  0.10
age        0.03 -0.10 -0.09

Multiple R
 ACT SATV SATQ
0.16 0.10 0.19

multiple R2
   ACT   SATV   SATQ
0.0272 0.0096 0.0359

Multiple Inflation Factor (VIF) = 1/(1-SMC) =
   gender education       age
     1.01      1.45      1.44

Unweighted multiple R
 ACT SATV SATQ
0.15 0.05 0.11

Unweighted multiple R2
 ACT SATV SATQ
0.02 0.00 0.01

SE of Beta weights
           ACT SATV SATQ
gender    0.18 4.29 4.34
education 0.22 5.13 5.18
age       0.22 5.11 5.16

t of Beta Weights
            ACT  SATV  SATQ
gender    -0.27 -0.01 -0.04
education  0.65  0.02  0.02
age        0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

 Average squared canonical correlation = 0.03
 Cohen's Set Correlation R2 = 0.09
 Shrunken Set Correlation R2 = 0.08
 F and df of Cohen's Set Correlation  7.26  9  1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
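One way to see this symmetry (a sketch that assumes C is the covariance matrix of sat.act formed earlier, e.g. with cov) is to exchange the two sets and compare the reported set correlation R2.

> C <- cov(sat.act, use = "pairwise")
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)  # ACT, SATV, SATQ from gender, education, age
> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)  # reverse the roles: Cohen's set R2 is unchanged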

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex, a factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
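A sketch of how a table like Table 2 can be generated (the three-factor fa call and the object name are illustrative assumptions, not quoted from the vignette):

> f3 <- fa(Thurstone, nfactors = 3)   # three oblique factors of the nine ability variables
> fa2latex(f3, heading = "A factor analysis table from the psych package in R",
+          caption = "fa2latex")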

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. (A brief sketch of a few of these helpers in use appears after the list.)

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
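As mentioned above, a brief illustrative sketch of a few of these helpers (the particular calls and the quoted values are examples of mine, not taken from the vignette):

> headtail(sat.act)                      # first and last rows of the sat.act data set
> fisherz(0.50)                          # Fisher r to z: about 0.55
> harmonic.mean(c(1, 2, 4))              # about 1.71
> mardia(sat.act[, 1:4], plot = FALSE)   # univariate and multivariate skew and kurtosis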

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables; this is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
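A quick illustrative look at some of these data sets (the particular calls are examples, not from the vignette):

> dim(bfi)                        # 2800 rows: 25 items plus gender, education, and age
> describe(sat.act)               # descriptive statistics for the 700 sat.act cases
> lowerMat(Thurstone[1:4, 1:4])   # a corner of the 9 x 9 Thurstone correlation matrix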

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version > "1.7.0", package = "psych")
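From the R command line, the development version can presumably be installed directly from that repository; the repos value below is inferred from the URL above rather than quoted from the vignette.

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")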

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book: An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0   tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England (122 pp.).

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect, 14, 24; alpha, 5, 6

Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26; bifactor, 6; biserial, 13, 34; block.random, 49; burt, 34

char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; circular statistics, 49; cities, 51; cohen.kappa, 49; cor, 27; cor.smooth, 34; cor.test, 28; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; ctv, 7; cubits, 50

densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; diagram, 8; draw.cor, 34; draw.tetra, 34; dummy.code, 13; dynamite plot, 19

edit, 3; epi.bfi, 50; error bars, 19; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23

fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor analysis, 6; factor.minres, 7; factor.pa, 7; factor.wls, 7; file.choose, 8; fisherz, 49

galton, 51; generalized least squares, 6; geometric.mean, 49; GPArotation, 7; guttman, 6

harmonic.mean, 49; head, 49; headtail, 49; heights, 50; het.diagram, 7; Hmisc, 28; Holzinger, 50

ICC, 6, 49; iclust, 6; iclust.diagram, 7; Index, 49; introduction to psychometric theory with applications in R, 7; iqitems, 50; irt.fa, 6, 47; irt2latex, 47, 49

KnitR, 47

lavaan, 38; library, 8; lm, 38; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; lowess, 14

make.keys, 14; MAP, 6; mardia, 49; maximum likelihood, 6; mediate, 4, 41, 42; mediate.diagram, 41; minimum residual, 6; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel, 37; multilevel.reliability, 6; multiple regression, 38

nfactors, 6; nlme, 37

omega, 6, 7; outlier, 3, 11, 12

p.adjust, 28; p.rep, 49; pairs, 14; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34; polyserial, 34; principal, 5–7; principal axis, 6; psych, 3, 5–8, 28, 47, 49–52

r.test, 28; rangeCorrection, 49; rcorr, 28; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; read.table, 9; Reise, 50; reverse.code, 49; Rgraphviz, 8

SAPA, 26, 50, 51; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; sem, 7, 50; set correlation, 44; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37; spider, 13; stars, 14; stats, 28; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; Sweave, 47

table, 19; tail, 49; tetrachoric, 6, 34; Thurstone, 28, 38, 50; Thurstone.33, 50; topBottom, 49

vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6

weighted least squares, 6; withinBetween, 37

xtable, 47

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 46: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from matrix input

Beta weights

ACT SATV SATQ

gender -005 -003 -018

education 014 010 010

age 003 -010 -009

Multiple R

ACT SATV SATQ

016 010 019

multiple R2

ACT SATV SATQ

00272 00096 00359

Multiple Inflation Factor (VIF) = 1(1-SMC) =

gender education age

101 145 144

Unweighted multiple R

ACT SATV SATQ

015 005 011

Unweighted multiple R2

ACT SATV SATQ

002 000 001

SE of Beta weights

ACT SATV SATQ

gender 018 429 434

education 022 513 518

age 022 511 516

t of Beta Weights

ACT SATV SATQ

gender -027 -001 -004

education 065 002 002

46

age 015 -002 -002

Probability of t lt

ACT SATV SATQ

gender 079 099 097

education 051 098 098

age 088 098 099

Shrunken R2

ACT SATV SATQ

00230 00054 00317

Standard Error of R2

ACT SATV SATQ

00120 00073 00137

F

ACT SATV SATQ

649 226 863

Probability of F lt

ACT SATV SATQ

248e-04 808e-02 124e-05

degrees of freedom of regression

[1] 3 696

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0050 0033 0008

Chisq of canonical correlations

[1] 358 231 56

Average squared canonical correlation = 003

Cohens Set Correlation R2 = 009

Shrunken Set Correlation R2 = 008

F and df of Cohens Set Correlation 726 9 168186

Unweighted correlation between the two sets = 001

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenient

47

LATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

49

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in thepsych package These include six data sets showing a hierarchical factor structure (fivecognitive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2and one from health psychology Reise) One of these (Thurstone) is used as an examplein the sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) or 13 personality inventory scores (epibfi) and14 multiple choice iq items (iqitems) The vegetables example has paired comparisonpreferences for 9 vegetables This is an example of Thurstonian scaling used by Guilford(1954) and Nunnally (1967) Other data sets include cubits peas and heights fromGalton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Person-ality Assessment (SAPA) web based personality assessment project The data from1000 subjects are included here as a demonstration set for scoring multiple choiceinventories and doing basic item statistics

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to ntroduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httppersonality-projectorgr That version will have removed the mostrecently discovered bugs (but perhaps introduced other yet to be discovered ones) Todownload that version go to the repository httppersonality-projectorgrsrc

contrib and wander around For a Mac this version can be installed directly using theldquoother repositoryrdquo option in the package installer For a PC the zip file for the most recentrelease has been created using the win-builder facility at CRAN The development releasefor the Mac is usually several weeks ahead of the PC development version

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httppersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt 170package=psych)

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many ofthe functions were developed to supplement a book (httppersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep)More information about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttppersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317




age          0.15 -0.02 -0.02

Probability of t <
           ACT SATV SATQ
gender    0.79 0.99 0.97
education 0.51 0.98 0.98
age       0.88 0.98 0.99

Shrunken R2
   ACT   SATV   SATQ
0.0230 0.0054 0.0317

Standard Error of R2
   ACT   SATV   SATQ
0.0120 0.0073 0.0137

F
 ACT SATV SATQ
6.49 2.26 8.63

Probability of F <
     ACT     SATV     SATQ
2.48e-04 8.08e-02 1.24e-05

degrees of freedom of regression
[1]   3 696

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.050 0.033 0.008

Chisq of canonical correlations
[1] 35.8 23.1  5.6

Average squared canonical correlation = 0.03
Cohen's Set Correlation R2 = 0.09
Shrunken Set Correlation R2 = 0.08
F and df of Cohen's Set Correlation 7.26 9 1681.86
Unweighted correlation between the two sets = 0.01

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
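To see this symmetry directly, one can exchange the roles of the two sets and compare the reported set correlations. A minimal sketch, assuming the sat.act data set used above (columns 1:3 are gender, education and age, columns 4:6 are ACT, SATV and SATQ); the call that actually produced the output above may have differed:

> sc1 <- setCor(y = 4:6, x = 1:3, data = sat.act)  # predict ACT, SATV, SATQ from the demographics
> sc2 <- setCor(y = 1:3, x = 4:6, data = sat.act)  # exchange the predictor and criterion sets
> sc1  # reports Cohen's Set Correlation R2 = 0.09
> sc2  # reports the same Set Correlation R2, although the individual regressions differ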

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
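A table such as Table 2 can be generated with a call along the following lines (a minimal sketch; the heading, caption and rounding of the table are controlled by arguments to fa2latex that are documented on its help page):

> f3 <- fa(Thurstone, 3)  # three factor solution for the 9 Thurstone ability variables
> fa2latex(f3)            # write the loadings, communalities and factor correlations as a LaTeX table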


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also set.cor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
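A quick sketch of a few of these helpers in use (this assumes the sat.act data set described in the next section; output is omitted):

> fisherz(c(0.2, 0.5, 0.8))  # Fisher z transformation of three correlations
> headtail(sat.act)          # the first and last few rows of the data frame
> mardia(sat.act)            # univariate and multivariate (Mardia's test) skew and kurtosis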

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi) or 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems), are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
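All of these data sets are installed with the package and may be examined with the descriptive functions discussed earlier, e.g. (a sketch, output omitted):

> data(bfi)
> describe(bfi[1:5])  # descriptive statistics for the first five bfi items
> dim(sat.act)        # 700 observations on 6 variables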

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
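For example, the development (source) version can usually be installed from within R along these lines (a sketch; this assumes the personality-project repository is reachable and that the tools needed to build source packages are installed):

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")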

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 48: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

LATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

48

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

49

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in thepsych package These include six data sets showing a hierarchical factor structure (fivecognitive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2and one from health psychology Reise) One of these (Thurstone) is used as an examplein the sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) or 13 personality inventory scores (epibfi) and14 multiple choice iq items (iqitems) The vegetables example has paired comparisonpreferences for 9 vegetables This is an example of Thurstonian scaling used by Guilford(1954) and Nunnally (1967) Other data sets include cubits peas and heights fromGalton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrixfrom their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

50

iq 14 multiple choice ability items were included as part of the Synthetic Aperture Person-ality Assessment (SAPA) web based personality assessment project The data from1000 subjects are included here as a demonstration set for scoring multiple choiceinventories and doing basic item statistics

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to ntroduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httppersonality-projectorgr That version will have removed the mostrecently discovered bugs (but perhaps introduced other yet to be discovered ones) Todownload that version go to the repository httppersonality-projectorgrsrc

contrib and wander around For a Mac this version can be installed directly using theldquoother repositoryrdquo option in the package installer For a PC the zip file for the most recentrelease has been created using the win-builder facility at CRAN The development releasefor the Mac is usually several weeks ahead of the PC development version

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httppersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt 170package=psych)

51

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many ofthe functions were developed to supplement a book (httppersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep)More information about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttppersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R Under development (unstable) (2017-03-05 r72309)

Platform x86_64-apple-darwin1340 (64-bit)

Running under macOS Sierra 10124

Matrix products default

BLAS LibraryFrameworksRframeworkVersions34ResourcesliblibRblas0dylib

LAPACK LibraryFrameworksRframeworkVersions34ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psych_17421

loaded via a namespace (and not attached)

[1] compiler_340 parallel_340 tools_340 foreign_08-67

[5] KernSmooth_223-15 nlme_31-131 mnormt_15-4 grid_340

[9] lattice_020-34

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package


7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.
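
As a minimal sketch (the factor names drug and dose and the sample size are invented for illustration), 16 subjects might be assigned to a 2 x 2 design in randomized blocks of 4:

> library(psych)
> br <- block.random(n = 16, c(drug = 2, dose = 2))  # 4 blocks, each containing every drug x dose cell once
> head(br)                                           # inspect the first few assignments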

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.
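
For example, assuming psych is loaded, the output of describe for the sat.act data set and the Thurstone correlation matrix can be turned into LaTeX source:

> library(psych)
> df2latex(describe(sat.act))   # LaTeX code for a table of descriptive statistics
> cor2latex(Thurstone)          # an APA style LaTeX correlation table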

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.
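
For example, fisherz2r performs the inverse transformation:

> library(psych)
> fisherz(.5)              # about 0.55, the Fisher z for r = .5
> fisherz2r(fisherz(.5))   # recovers the original correlation of .5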

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.
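
A small invented example shows how the three means differ for positively skewed data:

> library(psych)
> x <- c(1, 2, 4, 8, 16)
> mean(x)             # arithmetic mean = 6.2
> geometric.mean(x)   # 4
> harmonic.mean(x)    # about 2.58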

ICC and cohen.kappa are typically used to find the reliability for raters.
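
As an illustration, the classic Shrout and Fleiss (1979) example of six targets rated by four judges yields the usual intraclass correlations; the two columns of categorical ratings passed to cohen.kappa below are invented:

> library(psych)
> sf <- matrix(c(9, 2, 5, 8,
+                6, 1, 3, 2,
+                8, 4, 6, 8,
+                7, 1, 2, 6,
+               10, 5, 6, 9,
+                6, 2, 4, 7), ncol = 4, byrow = TRUE)  # 6 targets, 4 judges
> ICC(sf)                                  # the six Shrout-Fleiss intraclass correlations
> rater1 <- c(1, 2, 3, 3, 2, 1, 1, 2)      # hypothetical categorical ratings by two raters
> rater2 <- c(1, 2, 3, 2, 2, 1, 2, 2)
> cohen.kappa(cbind(rater1, rater2))       # weighted and unweighted kappa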

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.
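
For example (in more recent releases of psych the function is also spelled headTail):

> library(psych)
> headtail(sat.act)    # first and last few rows, separated by a row of dots
> topBottom(sat.act)   # the same rows, with no ellipsis row between them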

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.
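
For example, applied to the three ability scores in the sat.act data set:

> library(psych)
> mardia(sat.act[, 4:6])   # univariate and Mardia's multivariate skew and kurtosis for ACT, SATV, SATQ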

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)
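
For example, to correlate the ACT and SAT scores in sat.act while partialling out age, gender and education:

> library(psych)
> partial.r(sat.act, c(4:6), c(1:3))   # correlate columns 4-6, partialling out columns 1-3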

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.
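
Minimal sketches of rangeCorrection and reverse.code (the item matrix and the standard deviations below are invented for illustration):

> library(psych)
> rangeCorrection(.33, 15, 10)   # observed r = .33, unrestricted sd = 15, restricted sd = 10
> items <- matrix(sample(1:6, 30, replace = TRUE), ncol = 3)           # three hypothetical 1-6 Likert items
> reverse.code(keys = c(1, -1, 1), items = items, mini = 1, maxi = 6)  # reverse score the second item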


superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
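
For example, with two small arbitrary matrices:

> library(psych)
> A <- matrix(1, 2, 2)
> B <- matrix(2, 3, 3)
> superMatrix(A, B)   # a 5 x 5 matrix with A top left, B bottom right, 0s elsewhere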

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 13 personality inventory scores (epi.bfi), and 14 multiple choice iq items (iqitems). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.


iqitems 14 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1000 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
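
All of these data sets are loaded with the psych package and may be explored with the usual descriptive functions, for example:

> library(psych)
> data(bfi)            # 25 personality items plus gender, education and age
> describe(bfi[1:5])   # descriptive statistics for the first five items
> lowerCor(sat.act)    # correlations among the sat.act variables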

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at http://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For a Mac, this version can be installed directly using the "other repository" option in the package installer. For a PC, the zip file for the most recent release has been created using the win-builder facility at CRAN. The development release for the Mac is usually several weeks ahead of the PC development version.
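
Assuming the repository layout described above and that the necessary build tools are installed, one way to install the source version from within R is:

> install.packages("psych", repos = "http://personality-project.org/r", type = "source")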

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at http://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version > "1.7.0", package = "psych")


10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0     tools_3.4.0        foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131       mnormt_1.5-4       grid_3.4.0
[9] lattice_0.20-34


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 52: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (http://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult http://personality-project.org/r/r.guide.html, A short guide to R.
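
The same material can also be reached from within R using the standard help utilities. The sketch below assumes the vignette is installed under the name "intro"; the vignette name may differ across psych versions.

> library(psych)                        # attach the package
> help(package = "psych")               # index of all functions and data sets
> vignette(package = "psych")           # list the vignettes supplied with psych
> vignette("intro", package = "psych")  # open this vignette (name assumed)
> ?fa                                   # help page for a single function, e.g., fa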

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R Under development (unstable) (2017-03-05 r72309)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psych_1.7.4.21

loaded via a namespace (and not attached):
[1] compiler_3.4.0     parallel_3.4.0   tools_3.4.0   foreign_0.8-67
[5] KernSmooth_2.23-15 nlme_3.1-131     mnormt_1.5-4  grid_3.4.0
[9] lattice_0.20-34

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2015). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.5.8.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

affect, 14, 24; alpha, 5, 6

Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26; bifactor, 6; biserial, 13, 34; block.random, 49; burt, 34

char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; circular statistics, 49; cities, 51; cohen.kappa, 49; cor, 27; cor.smooth, 34; cor.test, 28; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; ctv, 7; cubits, 50

densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; diagram, 8; draw.cor, 34; draw.tetra, 34; dummy.code, 13; dynamite plot, 19

edit, 3; epi.bfi, 50; error bars, 19; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23

fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor analysis, 6; factor.minres, 7; factor.pa, 7; factor.wls, 7; file.choose, 8; fisherz, 49

galton, 51; generalized least squares, 6; geometric.mean, 49; GPArotation, 7; guttman, 6

harmonic.mean, 49; head, 49; headtail, 49; heights, 50; het.diagram, 7; Hmisc, 28; Holzinger, 50

ICC, 6, 49; iclust, 6; iclust.diagram, 7; Index, 49; introduction to psychometric theory with applications in R, 7; iqitems, 50; irt.fa, 6, 47; irt2latex, 47, 49

KnitR, 47

lavaan, 38; library, 8; lm, 38; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; lowess, 14

make.keys, 14; MAP, 6; mardia, 49; maximum likelihood, 6; mediate, 4, 41, 42; mediate.diagram, 41; minimum residual, 6; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel, 37; multilevel.reliability, 6; multiple regression, 38

nfactors, 6; nlme, 37

omega, 6, 7; outlier, 3, 11, 12

p.adjust, 28; p.rep, 49; pairs, 14; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34; polyserial, 34; principal, 5–7; principal axis, 6; psych, 3, 5–8, 28, 47, 49–52

r.test, 28; rangeCorrection, 49; rcorr, 28; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; read.table, 9; Reise, 50; reverse.code, 49; Rgraphviz, 8

SAPA, 26, 50, 51; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; sem, 7, 50; set correlation, 44; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37; spider, 13; stars, 14; stats, 28; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; Sweave, 47

table, 19; tail, 49; tetrachoric, 6, 34; Thurstone, 28, 38, 50; Thurstone.33, 50; topBottom, 49

vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6

weighted least squares, 6; withinBetween, 37

xtable, 47

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 53: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

53

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 54: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

54

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2015) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 158

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook ofIndividual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang

55

for the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

56

Index

affect 14 24alpha 5 6

Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26bifactor 6biserial 13 34blockrandom 49burt 34

char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49circular statistics 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49ctv 7cubits 50

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49diagram 8drawcor 34drawtetra 34dummycode 13

dynamite plot 19

edit 3epibfi 50error bars 19errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23

fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factor analysis 6factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49

galton 51generalized least squares 6geometricmean 49GPArotation 7guttman 6

harmonicmean 49head 49headtail 49heights 50hetdiagram 7Hmisc 28Holzinger 50

57

ICC 6 49iclust 6iclustdiagram 7Index 49introduction to psychometric theory with ap-

plications in R 7iqitems 50irtfa 6 47irt2latex 47 49

KnitR 47

lavaan 38library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27lowess 14

makekeys 14MAP 6mardia 49maximum likelihood 6mediate 4 41 42mediatediagram 41minimum residual 6mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevel 37multilevelreliability 6multiple regression 38

nfactors 6nlme 37

omega 6 7outlier 3 11 12

padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7principal axis 6psych 3 5ndash8 28 47 49ndash52

R functionaffect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49cor 27corsmooth 34cortest 28cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50

58

densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13edit 3epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7filechoose 8fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49head 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47

irt2latex 47 49library 8lm 38lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12padjust 28prep 49pairs 14pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34polyserial 34principal 5ndash7psych 51psych package

affect 14alpha 5 6Bechtoldt1 50Bechtoldt2 50bfi 26 50bibars 6 25 26

59

biserial 13 34blockrandom 49burt 34char2numeric 13circadiancor 49circadianlinearcor 49circadianmean 49cities 51cohenkappa 49corsmooth 34cor2latex 47 49corPlot 7corrp 28 32corrtest 28 32cortest 33cosinor 49cubits 50densityBy 14describe 6 10 49describeBy 3 6 10 11df2latex 48 49drawcor 34drawtetra 34dummycode 13epibfi 50errorbars 6 13 19errorbarsby 10 13 19 20errorbarstab 19errorcrosses 19errorCircles 24errorCrosses 23fa 6 7 48fadiagram 7faextension 51famulti 6faparallel 5 6fa2latex 47 49faBy 38factorminres 7factorpa 7factorwls 7

fisherz 49galton 51geometricmean 49guttman 6harmonicmean 49headtail 49heights 50hetdiagram 7Holzinger 50ICC 6 49iclust 6iclustdiagram 7iqitems 50irtfa 6 47irt2latex 47 49lowerCor 4 27lowerMat 27lowerUpper 27makekeys 14MAP 6mardia 49mediate 4 41 42mediatediagram 41mixedcor 34mlArrange 7mlPlot 7mlr 6msq 14multihist 6multilevelreliability 6nfactors 6omega 6 7outlier 3 11 12prep 49pairspanels 3 6 7 12ndash17partialr 49pca 6peas 50 51plotirt 7plotpoly 7polychoric 6 34

60

polyserial 34principal 5ndash7psych 51rtest 28rangeCorrection 49readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

rtest 28

rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43setcor 49setCor 4 38 41 44 46 47simmultilevel 37spider 13stars 14StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50table 19tail 49tetrachoric 6 34Thurstone 28 50Thurstone33 50topBottom 49vegetables 50 51violinBy 14 18vss 5 6withinBetween 37

R package

61

ctv 7GPArotation 7Hmisc 28KnitR 47lavaan 38multilevel 37nlme 37psych 3 5ndash8 28 47 49ndash52Rgraphviz 8sem 7 50stats 28Sweave 47xtable 47

rtest 28rangeCorrection 49rcorr 28readclipboard 3 6 8 9readclipboardcsv 9readclipboardfwf 9readclipboardlower 9readclipboardtab 3 9readclipboardupper 9readfile 3 6 8readtable 9Reise 50reversecode 49Rgraphviz 8

SAPA 26 50 51satact 10 33 44scatterhist 6schmid 6 7scoremultiplechoice 6scoreItems 5 6 14scrub 3 11sector 43sem 7 50set correlation 44setcor 49setCor 4 38 41 44 46 47simmultilevel 37

spider 13stars 14stats 28StatsBy 6statsBy 6 37 38statsByboot 37statsBybootsummary 37structurediagram 7superMatrix 50Sweave 47

table 19tail 49tetrachoric 6 34Thurstone 28 38 50Thurstone33 50topBottom 49

vegetables 50 51violinBy 14 18vss 5 6

weighted least squares 6withinBetween 37

xtable 47

62

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 55: An introduction to the psych package: Part I: data entry ...data frames to long data frames suitable for multilevel modeling. Graphical displays include Scatter Plot Matrix (SPLOM)


Index

affect, 14, 24; alpha, 5, 6

Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26; bifactor, 6; biserial, 13, 34; block.random, 49; burt, 34

char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; circular statistics, 49; cities, 51; cohen.kappa, 49; cor, 27; cor.smooth, 34; cor.test, 28; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; ctv, 7; cubits, 50

densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; diagram, 8; draw.cor, 34; draw.tetra, 34; dummy.code, 13; dynamite plot, 19

edit, 3; epi.bfi, 50; error bars, 19; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23

fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor analysis, 6; factor.minres, 7; factor.pa, 7; factor.wls, 7; file.choose, 8; fisherz, 49

galton, 51; generalized least squares, 6; geometric.mean, 49; GPArotation, 7; guttman, 6

harmonic.mean, 49; head, 49; headtail, 49; heights, 50; het.diagram, 7; Hmisc, 28; Holzinger, 50

ICC, 6, 49; iclust, 6; iclust.diagram, 7; Index, 49; introduction to psychometric theory with applications in R, 7; iqitems, 50; irt.fa, 6, 47; irt2latex, 47, 49

KnitR, 47

lavaan, 38; library, 8; lm, 38; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; lowess, 14

make.keys, 14; MAP, 6; mardia, 49; maximum likelihood, 6; mediate, 4, 41, 42; mediate.diagram, 41; minimum residual, 6; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel, 37; multilevel.reliability, 6; multiple regression, 38

nfactors, 6; nlme, 37

omega, 6, 7; outlier, 3, 11, 12

p.adjust, 28; p.rep, 49; pairs, 14; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34; polyserial, 34; principal, 5–7; principal axis, 6; psych, 3, 5–8, 28, 47, 49–52

R function
    affect, 14; alpha, 5, 6; Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26; biserial, 13, 34; block.random, 49; burt, 34; char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; cities, 51; cohen.kappa, 49; cor, 27; cor.smooth, 34; cor.test, 28; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; cubits, 50
    densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; draw.cor, 34; draw.tetra, 34; dummy.code, 13; edit, 3; epi.bfi, 50; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23; fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor.minres, 7; factor.pa, 7; factor.wls, 7; file.choose, 8; fisherz, 49; galton, 51; geometric.mean, 49; guttman, 6; harmonic.mean, 49; head, 49; headtail, 49; heights, 50; het.diagram, 7; Holzinger, 50; ICC, 6, 49; iclust, 6; iclust.diagram, 7; iqitems, 50; irt.fa, 6, 47
    irt2latex, 47, 49; library, 8; lm, 38; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; make.keys, 14; MAP, 6; mardia, 49; mediate, 4, 41, 42; mediate.diagram, 41; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel.reliability, 6; nfactors, 6; omega, 6, 7; outlier, 3, 11, 12; p.adjust, 28; p.rep, 49; pairs, 14; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34; polyserial, 34; principal, 5–7; psych, 51
    r.test, 28; rangeCorrection, 49; rcorr, 28; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; read.table, 9; Reise, 50; reverse.code, 49; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37; spider, 13; stars, 14; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; table, 19; tail, 49; tetrachoric, 6, 34; Thurstone, 28, 50; Thurstone.33, 50; topBottom, 49; vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6; withinBetween, 37

psych package
    affect, 14; alpha, 5, 6; Bechtoldt.1, 50; Bechtoldt.2, 50; bfi, 26, 50; bi.bars, 6, 25, 26
    biserial, 13, 34; block.random, 49; burt, 34; char2numeric, 13; circadian.cor, 49; circadian.linear.cor, 49; circadian.mean, 49; cities, 51; cohen.kappa, 49; cor.smooth, 34; cor2latex, 47, 49; corPlot, 7; corr.p, 28, 32; corr.test, 28, 32; cortest, 33; cosinor, 49; cubits, 50; densityBy, 14; describe, 6, 10, 49; describeBy, 3, 6, 10, 11; df2latex, 48, 49; draw.cor, 34; draw.tetra, 34; dummy.code, 13; epi.bfi, 50; error.bars, 6, 13, 19; error.bars.by, 10, 13, 19, 20; error.bars.tab, 19; error.crosses, 19; errorCircles, 24; errorCrosses, 23; fa, 6, 7, 48; fa.diagram, 7; fa.extension, 51; fa.multi, 6; fa.parallel, 5, 6; fa2latex, 47, 49; faBy, 38; factor.minres, 7; factor.pa, 7; factor.wls, 7
    fisherz, 49; galton, 51; geometric.mean, 49; guttman, 6; harmonic.mean, 49; headtail, 49; heights, 50; het.diagram, 7; Holzinger, 50; ICC, 6, 49; iclust, 6; iclust.diagram, 7; iqitems, 50; irt.fa, 6, 47; irt2latex, 47, 49; lowerCor, 4, 27; lowerMat, 27; lowerUpper, 27; make.keys, 14; MAP, 6; mardia, 49; mediate, 4, 41, 42; mediate.diagram, 41; mixed.cor, 34; mlArrange, 7; mlPlot, 7; mlr, 6; msq, 14; multi.hist, 6; multilevel.reliability, 6; nfactors, 6; omega, 6, 7; outlier, 3, 11, 12; p.rep, 49; pairs.panels, 3, 6, 7, 12–17; partial.r, 49; pca, 6; peas, 50, 51; plot.irt, 7; plot.poly, 7; polychoric, 6, 34
    polyserial, 34; principal, 5–7; psych, 51; r.test, 28; rangeCorrection, 49; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; Reise, 50; reverse.code, 49; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37; spider, 13; stars, 14; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; tetrachoric, 6, 34; Thurstone, 28, 50; Thurstone.33, 50; topBottom, 49; vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6; withinBetween, 37

R package
    ctv, 7; GPArotation, 7; Hmisc, 28; KnitR, 47; lavaan, 38; multilevel, 37; nlme, 37; psych, 3, 5–8, 28, 47, 49–52; Rgraphviz, 8; sem, 7, 50; stats, 28; Sweave, 47; xtable, 47

r.test, 28; rangeCorrection, 49; rcorr, 28; read.clipboard, 3, 6, 8, 9; read.clipboard.csv, 9; read.clipboard.fwf, 9; read.clipboard.lower, 9; read.clipboard.tab, 3, 9; read.clipboard.upper, 9; read.file, 3, 6, 8; read.table, 9; Reise, 50; reverse.code, 49; Rgraphviz, 8

SAPA, 26, 50, 51; sat.act, 10, 33, 44; scatter.hist, 6; schmid, 6, 7; score.multiple.choice, 6; scoreItems, 5, 6, 14; scrub, 3, 11; sector, 43; sem, 7, 50; set correlation, 44; set.cor, 49; setCor, 4, 38, 41, 44, 46, 47; sim.multilevel, 37

spider, 13; stars, 14; stats, 28; StatsBy, 6; statsBy, 6, 37, 38; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 7; superMatrix, 50; Sweave, 47

table, 19; tail, 49; tetrachoric, 6, 34; Thurstone, 28, 38, 50; Thurstone.33, 50; topBottom, 49

vegetables, 50, 51; violinBy, 14, 18; vss, 5, 6

weighted least squares, 6; withinBetween, 37

xtable, 47
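The index above lists the functions and data sets discussed in this vignette, with the pages on which they appear. As a quick orientation, a minimal sketch of how a few of the indexed functions fit together is shown below; the choice of functions and of the bundled sat.act data set is purely illustrative.

> library(psych)          #make the psych functions available
> data(sat.act)           #an example data set listed in the index (lazy loaded, so this line is optional)
> describe(sat.act)       #basic descriptive statistics
> pairs.panels(sat.act)   #SPLOM with histograms and correlations

The page numbers beside each index entry point to the sections where these and the other functions are discussed in detail.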
